[petsc-dev] Fwd: related to compiling your source code

Mark Adams mfadams at lbl.gov
Tue Apr 14 07:41:31 CDT 2015


PETSc's design of looking for RC files in the user's home directory really
sucks.  I complained about this a few years ago and am going to do so
again.

A perfectly reasonable apps person had a .petscrc file in his home
directory with a "%-pc_type hypre" line in it.  This gave an error but he
could not figure out where PETSc got this thing.  (The error message was
garbled for some reason, which slowed things down. As soon as I saw
"%-pc_type hypre" I knew what the problem was.)  As you can see below he
spent a day doing this.
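
For reference, a rough way to spot such a line before running a job. The
sketch below prints every line of an options file whose first token is not
an option. It assumes that option lines start with "-", that "#" starts a
comment, and that "alias" lines are allowed; the function name
flag_suspect_lines is made up for illustration and is not a PETSc tool.

```shell
#!/bin/sh
# Print every line of an options file whose first token is not an option.
# Assumptions (not taken from this thread): option lines start with "-",
# "#" starts a comment, and "alias" lines are allowed.
# flag_suspect_lines is a hypothetical helper name, not a PETSc tool.
flag_suspect_lines() {
    awk '{
        line = $0
        sub(/#.*/, "", line)           # strip a trailing "#" comment
        n = split(line, tok, " ")      # split on runs of whitespace
        if (n == 0) next               # blank or comment-only line
        if (tok[1] !~ /^-/ && tok[1] != "alias")
            printf "%s:%d: %s\n", FILENAME, NR, $0
    }' "$1"
}

# Check the home-directory file only if it exists.
if [ -f "$HOME/.petscrc" ]; then
    flag_suspect_lines "$HOME/.petscrc"
fi
```

A file whose second line is "%-pc_type hypre" would be reported as
"<file>:2: %-pc_type hypre".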

I spent a day with another apps person on this same project a few years ago
with this same problem.  This is an error-prone construct and it does not
show up until you have used PETSc for a few years and have forgotten that
you have a .petscrc file in your home directory. Very bad.

Also, this code specified the RC file name as "petsc.rc".  It looks like
PETSc is still picking up a .petscrc file anyway!!!  At the very least we
should scrub ".petscrc" if the user supplies another name.
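
A quick way to see where stray options could come from, as a sketch only.
It assumes the default locations described in the PetscInitialize()
documentation: ~/.petscrc, plus .petscrc and petscrc in the working
directory, plus the PETSC_OPTIONS environment variable, all in addition to
the file name passed by the code. The helper name petsc_rc_candidates is
hypothetical.

```shell
#!/bin/sh
# List default option sources that exist for the current user and directory.
# Assumed defaults: ~/.petscrc, ./.petscrc, ./petscrc and PETSC_OPTIONS.
# petsc_rc_candidates is a hypothetical helper name, not a PETSc tool.
petsc_rc_candidates() {
    for f in "$HOME/.petscrc" ./.petscrc ./petscrc; do
        if [ -f "$f" ]; then
            echo "found: $f"
        fi
    done
    # The environment variable is a separate source of options.
    if [ -n "${PETSC_OPTIONS:-}" ]; then
        echo "PETSC_OPTIONS is set: $PETSC_OPTIONS"
    fi
}

# Run it from the directory where the job is submitted.
petsc_rc_candidates
```

If the documentation is read correctly, putting -skip_petscrc in the
code-specific file (here petsc.rc) or on the command line makes PETSc skip
these default files; treat that as something to verify against your PETSc
version.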

I think we should dump this design and stop looking at home directories and
just tell users to change, as we always do when we improve the design.

Thanks,
Mark


---------- Forwarded message ----------
From: Yoon, Eisung <yoone at rpi.edu>
Date: Mon, Apr 13, 2015 at 11:00 PM
Subject: RE: related to compiling your source code
To: Mark Adams <mfadams at lbl.gov>, Robert Hager <rhager at pppl.gov>
Cc: Choong-Seock Chang <cschang at pppl.gov>, Mark Shephard <shephard at rpi.edu>,
Seung-Hoe Ku <sku at pppl.gov>


 Dear Mark and Robert,

 Wow! Thank you so much for all your comments and help. After spending a
whole day on this, I was about to give up on resolving the issue.

 I confirmed that the file containing %-pc_type hypre is located in my home
directory! The file that caused the problem is not petsc.rc but
.petscrc, which I guess I copied there a long time ago. The source code
directory where the XGC executable is located has a petsc.rc which does not
contain %-pc_type hypre, and there is no .petscrc file in that directory.

 In summary, three directories were involved in running XGC: the source code
directory where the XGC executable is located, the working directory where
the job is submitted, and my home directory. The problem came from my
home directory.

 In addition, it is quite interesting that the petscinitialize call in the
XGC source code passes "./petsc.rc", while PETSc still tried to read a
DEFAULT file ".petscrc" in my HOME directory!!!

 I have now removed .petscrc and submitted the job to see if XGC
runs.

 Best,
Eisung Yoon

 ------------------------------
*From:* Mark Adams [mfadams at lbl.gov]
*Sent:* Monday, April 13, 2015 10:19 PM
*To:* Robert Hager
*Cc:* Yoon, Eisung; Choong-Seock Chang; Mark Shephard; Seung-Hoe Ku

*Subject:* Re: related to compiling your source code

   Good try Robert :)

 I'll bet Eisung has a petsc.rc file in his home directory.  Let me know.
I will use this as another data point to support my opinion that looking in
your home directory is a bad idea.

 BTW, Seung-Hoe (cc'ed) and I had this same problem a few years ago and it
took us hours to figure it out.

 Mark

On Mon, Apr 13, 2015 at 9:21 PM, Robert Hager <rhager at pppl.gov> wrote:

>   It seems PETSc is looking in some other directory, but I cannot tell
> which one.
>
>
>  This may be a clue. I always copy the executable to my run directory and
> call something like
>
>  aprun ...  ./xgca
>
>  In one of your earlier e-mails, I saw that you call
>
>  aprun ... {PATH_TO_XGCa_SOURCE}/xgca
>
>  If PETSc looks for petsc.rc in the directory of the executable, it will
> try to read a very old petsc.rc file that certainly does not work. Could
> you try copying the executable to your run directory?
>
>  Best
>
>  Robert
>
>   On Apr 13, 2015, at 8:58 PM, Yoon, Eisung wrote:
>
>     I attach the requested files.
>
>  I tried PETSc with the petsc.rc file from the XGC1 example suggested by
> Mark as well as the original input files in xgca_chang-hinton_test.tar.
> Also I checked my language options, which are the same as yours, and tried
> the sed command, but everything failed with almost the same messages.
>
>  There was one interesting error message, which may be a clue to
> resolving this issue. The error message showed "Unknown statement in options
> file: (%-pc_type hypre )" even though my petsc.rc doesn't have that line. I
> checked the petsc.rc files in the XGC source directory as well as the
> working (run) directory, but that line doesn't exist in either. Also the
> default .petscrc doesn't exist in either directory. It seems PETSc is
> looking in some other directory, but I cannot tell which one.
>
>  Best,
> Eisung Yoon
>
>
>
>  ------------------------------
> *From:* Robert Hager [rhager at pppl.gov]
> *Sent:* Monday, April 13, 2015 5:31 PM
> *To:* Yoon, Eisung
> *Cc:* Choong-Seock Chang; Mark Adams; Mark Shephard
> *Subject:* Re: related to compiling your source code
>
>  That looks ok.
>
>  I unpacked the tar-file I gave you and ran a diff with the petsc.rc that
> is still working for me and found that they are identical.
>
>  Did you edit any of the files (possibly in a Microsoft environment)? Or
> maybe your shell misinterprets characters. Did you specify any language in
> your shell setup?
>
>  In case something added any control characters to the petsc.rc file, you
> can run
>
>   sed -e 's/[^[:print:]]//g'
>
>  to remove them.
>
>  My language settings are
>
>   *rhager at edison02:~/w/xgca_chang-hinton_test3> locale*
>  LANG=
>  LC_CTYPE="POSIX"
>  LC_NUMERIC="POSIX"
>  LC_TIME="POSIX"
>  LC_COLLATE="POSIX"
>  LC_MONETARY="POSIX"
>  LC_MESSAGES="POSIX"
>  LC_PAPER="POSIX"
>  LC_NAME="POSIX"
>  LC_ADDRESS="POSIX"
>  LC_TELEPHONE="POSIX"
>  LC_MEASUREMENT="POSIX"
>  LC_IDENTIFICATION="POSIX"
>  LC_ALL=
>
>
>   Could you send your makefile, defs.mk and rules.mk (possibly
> rules_edison.mk) anyway, please?
>
>  Best regards
>
>  Robert
>
>
>
>  On Apr 13, 2015, at 5:01 PM, Yoon, Eisung wrote:
>
>   Hi Robert,
>
>  I added below to .cshrc.ext  as you recommended
>
>    module load cray-petsc
>   module load cray-hdf5-parallel
>   module load pspline
>   module load adios/1.6.0
>
>  and got
>
>  Currently Loaded Modulefiles:
>   1) modules/3.2.10.2
>   2) nsg/1.2.0
>   3) eswrap/1.1.0-1.020200.1130.0
>   4) switch/1.0-1.0502.54233.2.96.ari
>   5) craype-network-aries
>   6) craype/2.2.1
>   7) intel/15.0.1.133
>   8) cray-libsci/13.0.1
>   9) udreg/2.3.2-1.0502.9275.1.12.ari
>  10) ugni/5.0-1.0502.9685.4.24.ari
>  11) pmi/5.0.6-1.0000.10439.140.2.ari
>  12) dmapp/7.0.1-1.0502.9501.5.219.ari
>  13) gni-headers/3.0-1.0502.9684.5.2.ari
>  14) xpmem/0.1-2.0502.55507.3.2.ari
>  15) dvs/2.5_0.9.0-1.0502.1873.1.145.ari
>  16) alps/5.2.1-2.0502.9041.11.6.ari
>  17) rca/1.0.0-2.0502.53711.3.127.ari
>  18) atp/1.7.5
>  19) PrgEnv-intel/5.2.40
>  20) craype-ivybridge
>  21) cray-shmem/7.1.1
>  22) cray-mpich/7.1.1
>  23) torque/5.0.1
>  24) moab/8.0.1-2014110616-5c7a394-sles11
>  25) altd/2.0
>  26) darshan/2.3.0
>  27) usg-default-modules/1.1
>  28) cray-petsc/3.5.2.1
>  29) cray-hdf5-parallel/1.8.13
>  30) pspline/nersc1.0
>  31) adios/1.6.0
>
>  I copied Makefile.edison to Makefile, and had no problem with compiling
> and linking. I will try to figure out the problem with the petsc.rc file. Thank you!
>
>  Best,
> Eisung Yoon
>  ------------------------------
> *From:* Robert Hager [rhager at pppl.gov]
> *Sent:* Monday, April 13, 2015 4:55 PM
> *To:* Choong-Seock Chang
> *Cc:* Yoon, Eisung; Mark Adams; Mark Shephard
> *Subject:* Re: related to compiling your source code
>
>  Hi Eisung,
>
>  I used this file with XGCa on Edison today. Which modules do you use and
> which set of makefiles?
>
>  Best
>
>  Robert
>
>  On Apr 13, 2015, at 4:44 PM, Choong-Seock Chang wrote:
>
>  Please include Mark Adams in the PETSc related e-mails.
> He is in charge of PETSc in our project.  He needs to be aware of all the
> conversations.
> Thanks,
> CS
>
>  On Apr 13, 2015, at 4:42 PM, Yoon, Eisung <yoone at rpi.edu> wrote:
>
>  Hi Robert,
>
>  I tried to run XGC on Greene and Edison. Greene still has a problem with
> PETSc. Even on Edison, XGCa shows an error related to the petsc.rc file, as
> shown below. Considering the "Invalid argument" in the message, I guess the
> petsc.rc included in xgca_chang-hinton_test.tar doesn't work. Unfortunately,
> the characters of the unknown option shown in the message are garbled. Do
> you have a working petsc.rc?
>
>  Thank you!
> ES
>
>  (t_initf) Read in prof_inparam namelist from: input
>  PERF_SETOPTS: PAPI library not linked in. Request to enable PAPI ignored.
>  (t_initf) Using profile_disable= F  profile_timer=           2
>  (t_initf)  profile_depth_limit=       99999  profile_detail_limit=
>     1
>  (t_initf)  profile_barrier= F  profile_outpe_num=           1
>  (t_initf)  profile_outpe_stride=           1  profile_single_file= F
>  (t_initf)  profile_global_stats= T  profile_papi_enable= F
>  call petsc_init
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Unknown statement in options file: (�~A'^D)
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
> [0]PETSC ERROR: /global/u2/e/eyoon/branch/dev_rhager_esyoon/epsi/XGCa/xgca
> on a sandybridge named nid05677 by eyoon Mon Apr 13 13:31:32 2015
> [0]PETSC ERROR: Configure options --known-mpi-int64_t=0
> --known-bits-per-byte=8 --known-level1-dcache-assoc=0
> --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768
> --known-memcmp-ok=1 --known-mpi-c-double-complex=1
> --known-mpi-long-double=1 --known-mpi-shared-libraries=0
> --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1
> --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4
> --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2
> --known-sizeof-size_t=8 --known-sizeof-void-p=8 --with-ar=ar --with-batch=1
> --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC
> --with-cxxlib-autodetect=0 --with-debugging=0 --with-dependencies=0
> --with-fc=ftn --with-fortran-datatypes=0 --with-fortran-interfaces=0
> --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real
> --with-shared-ld=ar --with-etags=0 --with-dependencies=0
> --with-dependencies=0
> --with-mpi-dir=/opt/cray/mpt/7.0.0/gni/mpich2-intel/140 --with-superlu=1
> --with-superlu-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-superlu-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libsuperlu.a
> --with-superlu_dist=1
> --with-superlu_dist-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-superlu_dist-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libsuperlu_dist.a
> --with-parmetis=1
> --with-parmetis-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-parmetis-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libparmetis.a
> --with-metis=1
> --with-metis-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-metis-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libmetis.a
> --with-ptscotch=1
> --with-ptscotch-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-ptscotch-lib="-L/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib
> -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1
> --with-scalapack-include=/opt/cray/libsci/13.0.0/INTEL/140/sandybridge/include
> --with-scalapack-lib="-L/opt/cray/libsci/13.0.0/INTEL/140/sandybridge/lib
> -lsci_intel_mpi_mp -lsci_intel_mp" --with-mumps=1
> --with-mumps-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-mumps-lib="-L/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib -lcmumps
> -ldmumps -lesmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lpord"
> --CFLAGS="-xavx -openmp -O3 " --CXXFLAGS="-xavx -openmp -O3  "
> --FFLAGS="-xavx -openmp -O3  " --LIBS=-lstdc++ --CXX_LINKER_FLAGS=
> --PETSC_ARCH=sandybridge --prefix=/opt/cray/petsc/
> 3.5.2.1/real/INTEL/140/sandybridge --with-hypre=1
> --with-hypre-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-hypre-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libHYPRE.a
> --with-sundials=1
> --with-sundials-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include
> --with-sundials-lib="-L/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib
> -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas
> -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
> [0]PETSC ERROR: #1 PetscOptionsInsertFile() line 534 in
> /b/cray-petsc/.cray-build/INTEL/140/sandybridge/cray-petsc-base-dynamic/petsc-3.5.2/src/sys/objects/options.c
> [0]PETSC ERROR: #2 PetscOptionsInsert() line 716 in
> /b/cray-petsc/.cray-build/INTEL/140/sandybridge/cray-petsc-base-dynamic/petsc-3.5.2/src/sys/objects/options.c
> [0]PETSC ERROR: PetscInitialize:Creating options database
> PETSC ERROR: Logging has not been enabled.
> You might have forgotten to call PetscInitialize().
> Rank 0 [Mon Apr 13 13:31:32 2015] [c5-3c1s11n1] application called
> MPI_Abort(MPI_COMM_WORLD, 56) - process 0
> forrtl: error (76): Abort trap signal
> Image              PC                Routine            Line        Source
> xgca               0000000003363F21  Unknown               Unknown  Unknown
> xgca               0000000003362677  Unknown               Unknown  Unknown
> xgca               000000000331A2F4  Unknown               Unknown  Unknown
> xgca               000000000331A106  Unknown               Unknown  Unknown
> xgca               00000000032AE434  Unknown               Unknown  Unknown
> xgca               00000000032B53B1  Unknown               Unknown  Unknown
> xgca               0000000002F64B60  Unknown               Unknown  Unknown
> xgca               0000000002F64B1B  Unknown               Unknown  Unknown
> xgca               0000000003371B11  Unknown               Unknown  Unknown
> xgca               0000000003131922  Unknown               Unknown  Unknown
> xgca               0000000003100063  Unknown               Unknown  Unknown
> xgca               00000000008BD7F0  Unknown               Unknown  Unknown
> xgca               00000000008B2241  Unknown               Unknown  Unknown
> xgca               00000000008C1B41  Unknown               Unknown  Unknown
> xgca               000000000042554B  perf_monitor_mp_p        1875
>  module.F90
> xgca               000000000051F3BE  MAIN__                     95
>  main.F90
> xgca               0000000000405DEE  Unknown               Unknown  Unknown
> xgca               000000000336B6C1  Unknown               Unknown  Unknown
> xgca               0000000000405CD1  Unknown               Unknown  Unknown
> _pmiu_daemon(SIGCHLD): [NID 05677] [c5-3c1s11n1] [Mon Apr 13 13:31:32
> 2015] PE RANK 0 exit signal Aborted
> [NID 05677] 2015-04-13 13:31:32 Apid 11750871: initiated application
> termination
> Application 11750871 exit codes: 134
> Application 11750871 exit signals: Killed
> Application 11750871 resources: utime ~60s, stime ~12s, Rss ~29844,
> inblocks ~3174405, outblocks ~8270892
>  ------------------------------
> *From:* Robert Hager [rhager at pppl.gov]
> *Sent:* Monday, April 13, 2015 2:16 PM
> *To:* Yoon, Eisung
> *Cc:* shephard at rpi.edu; cschang at pppl.gov
> *Subject:* Re: related to compiling your source code
>
>  Hi Eisung,
>
>  you can use the input in
>
>  /project/projectdirs/m499/rhager/xgca_chang-hinton_test.tar
>
>  Let me know if you have trouble reading the file.
>
>  Best regards
>
>  Robert
>
>  On Apr 13, 2015, at 1:46 PM, Yoon, Eisung wrote:
>
>  Hi Robert,
>
>  Thank you for the information and explanation. I attach a text file
> which contains issues of source code with TRIGRID and variable collision
> time.
>
>  I'm sorry for not telling you previously that I was compiling the source
> code on a PPPL server. I'm not ready to use XGC on Edison yet, but I'm
> going to work on getting it ready right now.
>
>  Could you send me an input file of XGCa for a collision test in Edison?
>
>  Thanks a lot!!!
> ES
>
>
>  ------------------------------
> *From:* Robert Hager [rhager at pppl.gov]
> *Sent:* Monday, April 13, 2015 10:34 AM
> *To:* Yoon, Eisung
> *Cc:* shephard at rpi.edu; cschang at pppl.gov
> *Subject:* Re: related to compiling your source code
>
>  Hi Eisung,
>
>  the TRIGRID directive should not cause any errors. Can I see the error
> message?
>
>  I looked at Makefile.edison in your branch. It looks fine. You might
> have to change defs.mk though. There is one include statement to import
> some PETSc variable definitions. Depending on whether you use PETSc 3.5 or
> 3.6, you have to use the first or the second line, respectively.
>
>  On Edison, I load the following modules in addition to the default:
>
>     module load cray-petsc
>    module load cray-hdf5-parallel
>    module load pspline
>
>  The output of module list is
>
>  Currently Loaded Modulefiles:
>   1) modules/3.2.10.2
>   2) nsg/1.2.0
>   3) eswrap/1.1.0-1.020200.1130.0
>   4) switch/1.0-1.0502.54233.2.96.ari
>   5) craype-network-aries
>   6) craype/2.2.1
>   7) intel/15.0.1.133
>   8) cray-libsci/13.0.1
>   9) udreg/2.3.2-1.0502.9275.1.12.ari
>  10) ugni/5.0-1.0502.9685.4.24.ari
>  11) pmi/5.0.6-1.0000.10439.140.2.ari
>  12) dmapp/7.0.1-1.0502.9501.5.219.ari
>  13) gni-headers/3.0-1.0502.9684.5.2.ari
>  14) xpmem/0.1-2.0502.55507.3.2.ari
>  15) dvs/2.5_0.9.0-1.0502.1873.1.145.ari
>  16) alps/5.2.1-2.0502.9041.11.6.ari
>  17) rca/1.0.0-2.0502.53711.3.127.ari
>  18) atp/1.7.5
>  19) PrgEnv-intel/5.2.40
>  20) craype-ivybridge
>  21) cray-shmem/7.1.1
>  22) cray-mpich/7.1.1
>  23) torque/5.0.1
>  24) moab/8.0.1-2014110616-5c7a394-sles11
>  25) cray-petsc/3.5.2.1
>  26) cray-hdf5-parallel/1.8.13
>  27) pspline/nersc1.0
>  28) allineatools/5.0.1
>  29) idl/8.2
>  30) gv/3.7.3
>  31) latex/2012
>  32) altd/2.0
>  33) darshan/2.3.0
>  34) usg-default-modules/1.1
>
>
>  Last time I tried, the code compiled with these settings. It also ran a
> couple of time steps. But there are still some bugs in the code. Making the
> collision time step variable is a bit complicated because the collision
> operation is usually run together with all other sources like heating, etc.
> Therefore, the distribution function is evaluated only every
> sml_f_source_period time steps. If a collision operation is supposed to run
> at a different time step, f will not be available with the current code.
> However, in order to test whether it is worth pursuing this approach, I
> wanted to implement variable collision time steps in the simplest possible
> way, i.e. sml_f_source_period=0 and all sources except the collision
> operation deactivated. The collision interval must have an upper limit
> which I set to 10 time steps in my test. The interval for load-balancing
> should be a multiple of this upper limit in order to be efficient. If this
> approach helps to improve performance, we can think about how to implement
> variable collision intervals in a cleaner way.
>
>  Let me know if you have any further problems.
>
>  Best
>
>  Robert
>
>
>  On Apr 12, 2015, at 2:38 PM, Yoon, Eisung wrote:
>
>  Hi Robert,
>
>  Thank you for the performance test data. I really appreciate your work.
>
>  As for variable collision time,  I've made a branch "dev_rhager_esyoon"
> as a copy of your source code, "dev_rhager". I've read your modification
> for variable collision time in the XGCa folder.
>
>  Before I can run the code, I need to compile it, and I am currently
> having trouble with that. It appears the preprocessor directive -DTRIGRID
> causes the error. Could you send me your Makefile so I can see working
> compile options?
>
>  Thank you.
>
>  Best,
> ES
>
>
>
>
>
>
>    <code_reading.txt>
>
>
>      <defs.mk><Makefile><rules.mk>
>
>
>