[petsc-dev] related to compiling your source code
Barry Smith
bsmith at mcs.anl.gov
Tue Apr 14 15:20:02 CDT 2015
Mark,
Ok, you convinced me.
Satish,
Can you please rip out all the checking for .petscrc in the home directory?
Thanks
Barry
Theoretically it is a great feature but no one uses it and those who do use it forget that they used it.
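
For anyone chasing a similar stray option in the meantime, here is a minimal Python sketch, not part of PETSc, that reports which options files contain a given string. The helper name `find_option_sources` is hypothetical, and the candidate list only reflects the locations mentioned in this thread (the home directory, the working directory, and the explicitly named petsc.rc); it is an assumption, not PETSc's actual search order.

```python
from pathlib import Path


def find_option_sources(needle, candidates):
    """Return (path, line_number, line) for every line containing `needle`."""
    hits = []
    for path in candidates:
        path = Path(path).expanduser()
        if not path.is_file():
            continue  # skip locations that have no options file
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Locations discussed in this thread (assumed); adjust for your own setup.
    candidates = ["~/.petscrc", ".petscrc", "petsc.rc"]
    for path, number, line in find_option_sources("pc_type", candidates):
        print(f"{path}:{number}: {line}")
```

Run from the job's working directory, this would presumably have pointed straight at the forgotten file in the home directory.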
> On Apr 14, 2015, at 7:41 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
> PETSc's design of looking for RC files in the user's home directory really sucks. I complained about this a few years ago and am going to again.
>
> A perfectly reasonable apps person had a .petscrc file in his home directory with "%-pc_type_hypre" in it. This gave an error, but he could not figure out where PETSc got this thing. (The error message was garbled for some reason, which slowed things down. As soon as I saw "%-pc_type_hypre" I knew what the problem was.) As you can see below, he spent a day on this.
>
> I spent a day with another apps person on this same project a few years ago with this same problem. This is an error-prone construct, and it does not show up until you have used PETSc for a few years and have forgotten that you have a .petscrc file in your home directory. Very bad.
>
> Also, this code specified the RC file name as "petsc.rc". It looks like PETSc is still picking up a .petscrc file anyway!!! At the very least we should ignore ".petscrc" if the user supplies another name.
>
> I think we should dump this design and stop looking at home directories and just tell users to change, as we always do when we improve the design.
>
> Thanks,
> Mark
>
>
> ---------- Forwarded message ----------
> From: Yoon, Eisung <yoone at rpi.edu>
> Date: Mon, Apr 13, 2015 at 11:00 PM
> Subject: RE: related to compiling your source code
> To: Mark Adams <mfadams at lbl.gov>, Robert Hager <rhager at pppl.gov>
> Cc: Choong-Seock Chang <cschang at pppl.gov>, Mark Shephard <shephard at rpi.edu>, Seung-Hoe Ku <sku at pppl.gov>
>
>
> Dear Mark and Robert,
>
> Wow! Thank you so much for all your comments and help. After spending a whole day on it, I was about to give up on resolving this issue.
>
> I confirmed that the file which has %-pc_type hypre is located in my home directory! But the file that caused the problem is not petsc.rc but .petscrc, which I guess I copied a long time ago. The source code directory where the XGC executable is located has a petsc.rc which does not contain %-pc_type hypre, and there is no .petscrc file in that directory.
>
> In summary, three directories were involved in running XGC: the source code directory where the XGC executable is located, the working directory where the job is submitted, and my home directory. The problem came from my home directory.
>
> In addition, it is quite interesting that the XGC source code passes "./petsc.rc" to the petscinitialize subroutine, while PETSc still tried to find a DEFAULT file ".petscrc" in my HOME directory!!!
>
> I have now removed .petscrc and submitted the job to see if XGC runs.
>
> Best,
> Eisung Yoon
>
> From: Mark Adams [mfadams at lbl.gov]
> Sent: Monday, April 13, 2015 10:19 PM
> To: Robert Hager
> Cc: Yoon, Eisung; Choong-Seock Chang; Mark Shephard; Seung-Hoe Ku
>
> Subject: Re: related to compiling your source code
>
> Good try Robert :)
>
> I'll bet Eisung has a petsc.rc file in his home directory. Let me know. I will use this as another data point to support my opinion that looking in your home directory is a bad idea.
>
> BTW, Seung-Hoe (cc'ed) and I had this same problem a few years ago and it took us hours to figure it out.
>
> Mark
>
> On Mon, Apr 13, 2015 at 9:21 PM, Robert Hager <rhager at pppl.gov> wrote:
>> It seems Petsc is looking at a certain directory, but cannot check where it is.
>
> This may be a clue. I always copy the executable to my run directory and call something like
>
> aprun ... ./xgca
>
> In one of your earlier e-mails, I saw that you call
>
> aprun ... {PATH_TO_XGCa_SOURCE}/xgca
>
> If PETSc looks for petsc.rc in the directory of the executable, it will try to read a very old petsc.rc file that certainly does not work. Could you try copying the executable to your run directory?
>
> Best
>
> Robert
>
> On Apr 13, 2015, at 8:58 PM, Yoon, Eisung wrote:
>
>> I attach the requested files.
>>
>> I tried the PETSc and petsc.rc files in the XGC1 example suggested by Mark as well as the original input files in xgc_chang-hinton_test.tar. I also checked my language options, which are the same as yours, and tried the sed command, but all attempts failed with almost the same messages.
>>
>> There was a rather interesting error message, which may be a clue to resolving this issue. It showed "Unknown statement in options file: (%-pc_type hypre )" even though my petsc.rc doesn't have that line. I checked the petsc.rc files in the XGC source directory as well as the working (running) directory, but that line doesn't exist in either. Also, the default .petscrc doesn't exist in either directory. It seems Petsc is looking at a certain directory, but cannot check where it is.
>>
>> Best,
>> Eisung Yoon
>>
>>
>>
>> From: Robert Hager [rhager at pppl.gov]
>> Sent: Monday, April 13, 2015 5:31 PM
>> To: Yoon, Eisung
>> Cc: Choong-Seock Chang; Mark Adams; Mark Shephard
>> Subject: Re: related to compiling your source code
>>
>> That looks ok.
>>
>> I unpacked the tar-file I gave you and ran a diff with the petsc.rc that is still working for me and found that they are identical.
>>
>> Did you edit any of the files (possibly in a Microsoft environment)? Or maybe your shell misinterprets characters. Did you specify any language in your shell setup?
>>
>> In case something added any control characters to the petsc.rc file, you can run
>>
>> sed -e 's/[^[:print:]]//g'
>>
>> to remove them.
>>
>> My language settings are
>>
>> rhager at edison02:~/w/xgca_chang-hinton_test3> locale
>> LANG=
>> LC_CTYPE="POSIX"
>> LC_NUMERIC="POSIX"
>> LC_TIME="POSIX"
>> LC_COLLATE="POSIX"
>> LC_MONETARY="POSIX"
>> LC_MESSAGES="POSIX"
>> LC_PAPER="POSIX"
>> LC_NAME="POSIX"
>> LC_ADDRESS="POSIX"
>> LC_TELEPHONE="POSIX"
>> LC_MEASUREMENT="POSIX"
>> LC_IDENTIFICATION="POSIX"
>> LC_ALL=
>>
>> Could you send your makefile, defs.mk and rules.mk (possibly rules_edison.mk) anyway, please?
>>
>> Best regards
>>
>> Robert
>>
>>
>>
>> On Apr 13, 2015, at 5:01 PM, Yoon, Eisung wrote:
>>
>>> Hi Robert,
>>>
>>> I added the following to .cshrc.ext as you recommended
>>>
>>> module load cray-petsc
>>> module load cray-hdf5-parallel
>>> module load pspline
>>> module load adios/1.6.0
>>>
>>> and got
>>>
>>> Currently Loaded Modulefiles:
>>> 1) modules/3.2.10.2 7) intel/15.0.1.133 13) gni-headers/3.0-1.0502.9684.5.2.ari 19) PrgEnv-intel/5.2.40 25) altd/2.0 31) adios/1.6.0
>>> 2) nsg/1.2.0 8) cray-libsci/13.0.1 14) xpmem/0.1-2.0502.55507.3.2.ari 20) craype-ivybridge 26) darshan/2.3.0
>>> 3) eswrap/1.1.0-1.020200.1130.0 9) udreg/2.3.2-1.0502.9275.1.12.ari 15) dvs/2.5_0.9.0-1.0502.1873.1.145.ari 21) cray-shmem/7.1.1 27) usg-default-modules/1.1
>>> 4) switch/1.0-1.0502.54233.2.96.ari 10) ugni/5.0-1.0502.9685.4.24.ari 16) alps/5.2.1-2.0502.9041.11.6.ari 22) cray-mpich/7.1.1 28) cray-petsc/3.5.2.1
>>> 5) craype-network-aries 11) pmi/5.0.6-1.0000.10439.140.2.ari 17) rca/1.0.0-2.0502.53711.3.127.ari 23) torque/5.0.1 29) cray-hdf5-parallel/1.8.13
>>> 6) craype/2.2.1 12) dmapp/7.0.1-1.0502.9501.5.219.ari 18) atp/1.7.5 24) moab/8.0.1-2014110616-5c7a394-sles11 30) pspline/nersc1.0
>>>
>>> I copied Makefile.edison to Makefile, and had no problem with compiling and linking. I will try to figure out the problem with the petsc.rc file. Thank you!
>>>
>>> Best,
>>> Eisung Yoon
>>> From: Robert Hager [rhager at pppl.gov]
>>> Sent: Monday, April 13, 2015 4:55 PM
>>> To: Choong-Seock Chang
>>> Cc: Yoon, Eisung; Mark Adams; Mark Shephard
>>> Subject: Re: related to compiling your source code
>>>
>>> Hi Eisung,
>>>
>>> I used this file with XGCa on Edison today. Which modules do you use and which set of makefiles?
>>>
>>> Best
>>>
>>> Robert
>>>
>>> On Apr 13, 2015, at 4:44 PM, Choong-Seock Chang wrote:
>>>
>>>> Please include Mark Adams in the PETSc related e-mails.
>>>> He is in charge of PETSc in our project. He needs to be aware of all the conversations.
>>>> Thanks,
>>>> CS
>>>>
>>>> On Apr 13, 2015, at 4:42 PM, Yoon, Eisung <yoone at rpi.edu> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> I tried to run XGC on Greene and Edison. Greene still has a problem with PETSc. Even on Edison, XGCa shows an error related to the petsc.rc file, as below. Considering the "Invalid argument" in the message, I guess the petsc.rc included in xgca_chang-hinton_test.tar doesn't work. Unfortunately, the characters for the unknown option shown in the message are garbled. Do you have a working petsc.rc?
>>>>>
>>>>> Thank you!
>>>>> ES
>>>>>
>>>>> (t_initf) Read in prof_inparam namelist from: input
>>>>> PERF_SETOPTS: PAPI library not linked in. Request to enable PAPI ignored.
>>>>> (t_initf) Using profile_disable= F profile_timer= 2
>>>>> (t_initf) profile_depth_limit= 99999 profile_detail_limit= 1
>>>>> (t_initf) profile_barrier= F profile_outpe_num= 1
>>>>> (t_initf) profile_outpe_stride= 1 profile_single_file= F
>>>>> (t_initf) profile_global_stats= T profile_papi_enable= F
>>>>> call petsc_init
>>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>> [0]PETSC ERROR: Invalid argument
>>>>> [0]PETSC ERROR: Unknown statement in options file: (�~A'^D)
>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>> [0]PETSC ERROR: /global/u2/e/eyoon/branch/dev_rhager_esyoon/epsi/XGCa/xgca on a sandybridge named nid05677 by eyoon Mon Apr 13 13:31:32 2015
>>>>> [0]PETSC ERROR: Configure options --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-dependencies=0 --with-fc=ftn --with-fortran-datatypes=0 --with-fortran-interfaces=0 --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-dependencies=0 --with-mpi-dir=/opt/cray/mpt/7.0.0/gni/mpich2-intel/140 --with-superlu=1 --with-superlu-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-superlu-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libsuperlu.a --with-superlu_dist=1 --with-superlu_dist-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-superlu_dist-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libsuperlu_dist.a --with-parmetis=1 --with-parmetis-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-parmetis-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libparmetis.a --with-metis=1 --with-metis-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-metis-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libmetis.a --with-ptscotch=1 --with-ptscotch-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-ptscotch-lib="-L/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1 --with-scalapack-include=/opt/cray/libsci/13.0.0/INTEL/140/sandybridge/include 
--with-scalapack-lib="-L/opt/cray/libsci/13.0.0/INTEL/140/sandybridge/lib -lsci_intel_mpi_mp -lsci_intel_mp" --with-mumps=1 --with-mumps-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-mumps-lib="-L/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib -lcmumps -ldmumps -lesmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lpord" --CFLAGS="-xavx -openmp -O3 " --CXXFLAGS="-xavx -openmp -O3 " --FFLAGS="-xavx -openmp -O3 " --LIBS=-lstdc++ --CXX_LINKER_FLAGS= --PETSC_ARCH=sandybridge --prefix=/opt/cray/petsc/3.5.2.1/real/INTEL/140/sandybridge --with-hypre=1 --with-hypre-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-hypre-lib=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib/libHYPRE.a --with-sundials=1 --with-sundials-include=/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/include --with-sundials-lib="-L/opt/cray/tpsl/1.4.3/INTEL/140/sandybridge/lib -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
>>>>> [0]PETSC ERROR: #1 PetscOptionsInsertFile() line 534 in /b/cray-petsc/.cray-build/INTEL/140/sandybridge/cray-petsc-base-dynamic/petsc-3.5.2/src/sys/objects/options.c
>>>>> [0]PETSC ERROR: #2 PetscOptionsInsert() line 716 in /b/cray-petsc/.cray-build/INTEL/140/sandybridge/cray-petsc-base-dynamic/petsc-3.5.2/src/sys/objects/options.c
>>>>> [0]PETSC ERROR: PetscInitialize:Creating options database
>>>>> PETSC ERROR: Logging has not been enabled.
>>>>> You might have forgotten to call PetscInitialize().
>>>>> Rank 0 [Mon Apr 13 13:31:32 2015] [c5-3c1s11n1] application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>>> forrtl: error (76): Abort trap signal
>>>>> Image PC Routine Line Source
>>>>> xgca 0000000003363F21 Unknown Unknown Unknown
>>>>> xgca 0000000003362677 Unknown Unknown Unknown
>>>>> xgca 000000000331A2F4 Unknown Unknown Unknown
>>>>> xgca 000000000331A106 Unknown Unknown Unknown
>>>>> xgca 00000000032AE434 Unknown Unknown Unknown
>>>>> xgca 00000000032B53B1 Unknown Unknown Unknown
>>>>> xgca 0000000002F64B60 Unknown Unknown Unknown
>>>>> xgca 0000000002F64B1B Unknown Unknown Unknown
>>>>> xgca 0000000003371B11 Unknown Unknown Unknown
>>>>> xgca 0000000003131922 Unknown Unknown Unknown
>>>>> xgca 0000000003100063 Unknown Unknown Unknown
>>>>> xgca 00000000008BD7F0 Unknown Unknown Unknown
>>>>> xgca 00000000008B2241 Unknown Unknown Unknown
>>>>> xgca 00000000008C1B41 Unknown Unknown Unknown
>>>>> xgca 000000000042554B perf_monitor_mp_p 1875 module.F90
>>>>> xgca 000000000051F3BE MAIN__ 95 main.F90
>>>>> xgca 0000000000405DEE Unknown Unknown Unknown
>>>>> xgca 000000000336B6C1 Unknown Unknown Unknown
>>>>> xgca 0000000000405CD1 Unknown Unknown Unknown
>>>>> _pmiu_daemon(SIGCHLD): [NID 05677] [c5-3c1s11n1] [Mon Apr 13 13:31:32 2015] PE RANK 0 exit signal Aborted
>>>>> [NID 05677] 2015-04-13 13:31:32 Apid 11750871: initiated application termination
>>>>> Application 11750871 exit codes: 134
>>>>> Application 11750871 exit signals: Killed
>>>>> Application 11750871 resources: utime ~60s, stime ~12s, Rss ~29844, inblocks ~3174405, outblocks ~8270892
>>>>> From: Robert Hager [rhager at pppl.gov]
>>>>> Sent: Monday, April 13, 2015 2:16 PM
>>>>> To: Yoon, Eisung
>>>>> Cc: shephard at rpi.edu; cschang at pppl.gov
>>>>> Subject: Re: related to compiling your source code
>>>>>
>>>>> Hi Eisung,
>>>>>
>>>>> you can use the input in
>>>>>
>>>>> /project/projectdirs/m499/rhager/xgca_chang-hinton_test.tar
>>>>>
>>>>> Let me know if you have trouble reading the file.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Robert
>>>>>
>>>>> On Apr 13, 2015, at 1:46 PM, Yoon, Eisung wrote:
>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> Thank you for the information and explanation. I attach a text file which contains issues of source code with TRIGRID and variable collision time.
>>>>>>
>>>>>> I'm sorry for not telling you previously that I was compiling the source code on a PPPL server. I'm not ready to use XGC on Edison yet, but I'm going to work on getting it ready right now.
>>>>>>
>>>>>> Could you send me an input file of XGCa for a collision test on Edison?
>>>>>>
>>>>>> Thanks a lot!!!
>>>>>> ES
>>>>>>
>>>>>>
>>>>>> From: Robert Hager [rhager at pppl.gov]
>>>>>> Sent: Monday, April 13, 2015 10:34 AM
>>>>>> To: Yoon, Eisung
>>>>>> Cc: shephard at rpi.edu; cschang at pppl.gov
>>>>>> Subject: Re: related to compiling your source code
>>>>>>
>>>>>> Hi Eisung,
>>>>>>
>>>>>> the TRIGRID directive should not cause any errors. Can I see the error message?
>>>>>>
>>>>>> I looked at Makefile.edison in your branch. It looks fine. You might have to change defs.mk though. There is one include statement to import some PETSc variable definitions. Depending on whether you use PETSc 3.5 or 3.6, you have to use the first or the second line, respectively.
>>>>>>
>>>>>> On Edison, I load the following modules in addition to the default:
>>>>>>
>>>>>> module load cray-petsc
>>>>>> module load cray-hdf5-parallel
>>>>>> module load pspline
>>>>>>
>>>>>> The output of module list is
>>>>>>
>>>>>> Currently Loaded Modulefiles:
>>>>>> 1) modules/3.2.10.2 13) gni-headers/3.0-1.0502.9684.5.2.ari 25) cray-petsc/3.5.2.1
>>>>>> 2) nsg/1.2.0 14) xpmem/0.1-2.0502.55507.3.2.ari 26) cray-hdf5-parallel/1.8.13
>>>>>> 3) eswrap/1.1.0-1.020200.1130.0 15) dvs/2.5_0.9.0-1.0502.1873.1.145.ari 27) pspline/nersc1.0
>>>>>> 4) switch/1.0-1.0502.54233.2.96.ari 16) alps/5.2.1-2.0502.9041.11.6.ari 28) allineatools/5.0.1
>>>>>> 5) craype-network-aries 17) rca/1.0.0-2.0502.53711.3.127.ari 29) idl/8.2
>>>>>> 6) craype/2.2.1 18) atp/1.7.5 30) gv/3.7.3
>>>>>> 7) intel/15.0.1.133 19) PrgEnv-intel/5.2.40 31) latex/2012
>>>>>> 8) cray-libsci/13.0.1 20) craype-ivybridge 32) altd/2.0
>>>>>> 9) udreg/2.3.2-1.0502.9275.1.12.ari 21) cray-shmem/7.1.1 33) darshan/2.3.0
>>>>>> 10) ugni/5.0-1.0502.9685.4.24.ari 22) cray-mpich/7.1.1 34) usg-default-modules/1.1
>>>>>> 11) pmi/5.0.6-1.0000.10439.140.2.ari 23) torque/5.0.1
>>>>>> 12) dmapp/7.0.1-1.0502.9501.5.219.ari 24) moab/8.0.1-2014110616-5c7a394-sles11
>>>>>>
>>>>>> Last time I tried, the code compiled with these settings. It also ran a couple of time steps. But there are still some bugs in the code. Making the collision time step variable is a bit complicated because the collision operation is usually run together with all other sources like heating, etc. Therefore, the distribution function is evaluated only every sml_f_source_period time steps. If a collision operation is supposed to run at a different time step, f will not be available with the current code. However, in order to test whether it is worth pursuing this approach, I wanted to implement variable collision time steps in the simplest possible way, i.e. sml_f_source_period=0 and all sources except the collision operation deactivated. The collision interval must have an upper limit, which I set to 10 time steps in my test. The interval for load-balancing should be a multiple of this upper limit in order to be efficient. If this approach helps to improve performance, we can think about how to implement variable collision intervals in a cleaner way.
>>>>>>
>>>>>> Let me know if you have any further problems.
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> Robert
>>>>>>
>>>>>>
>>>>>> On Apr 12, 2015, at 2:38 PM, Yoon, Eisung wrote:
>>>>>>
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> Thank you for the performance test data. I really appreciate your work.
>>>>>>>
>>>>>>> As for variable collision time, I've made a branch "dev_rhager_esyoon" as a copy of your source code, "dev_rhager". I've read your modification for variable collision time in the XGCa folder.
>>>>>>>
>>>>>>> I currently have trouble compiling the source code, so I cannot run it yet. It appears the preprocessing directive -DTRIGRID causes the error. Could you send me your Makefile so I can see working compile options?
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Best,
>>>>>>> ES
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> <code_reading.txt>
>>
>> <defs.mk><Makefile><rules.mk>
>
>
>
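
The sed command suggested earlier in the thread strips non-printable characters from an options file. As a complement, here is a minimal Python sketch that only reports where such characters are, without modifying anything. The helper name `find_nonprintable` is hypothetical, and treating every byte outside printable ASCII (other than tab) as suspicious is an assumption made for illustration.

```python
def find_nonprintable(data: bytes):
    """Return (line_number, column, byte_value) for each suspicious byte."""
    findings = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        for column, byte in enumerate(line, start=1):
            printable = 0x20 <= byte <= 0x7E
            if not printable and byte != 0x09:  # tabs are allowed
                findings.append((line_number, column, byte))
    return findings


if __name__ == "__main__":
    # Example input with a Windows line ending and a stray control byte.
    sample = b"-pc_type hypre\r\n-ksp\x04x\n"
    for line_number, column, byte in find_nonprintable(sample):
        print(f"line {line_number}, column {column}: byte 0x{byte:02x}")
```

A carriage return at the end of every line would suggest the file was edited in a Microsoft environment, which was one of the possibilities raised above.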