[petsc-users] killed 9 signal after upgrade from petsc 3.9.4 to 3.12.2

Dave May dave.mayhem23 at gmail.com
Sat Jan 11 01:34:04 CST 2020


On Sat 11. Jan 2020 at 00:04, Santiago Andres Triana <repepo at gmail.com>
wrote:

> Hi Barry, petsc-users:
>
> Just updated to petsc-3.12.3 and the memory usage is about the same as with
> 3.12.2, i.e. about 2x that of petsc-3.9.4
>
>
> petsc-3.12.3 (uses superlu_dist-6.2.0)
>
> Summary of Memory Usage in PETSc
> Maximum (over computational time) process memory:        total 2.9368e+10
> max 1.2922e+09 min 1.1784e+09
> Current process memory:                                  total 2.8192e+10
> max 1.2263e+09 min 1.1456e+09
> Maximum (over computational time) space PetscMalloc()ed: total 2.7619e+09
> max 1.4339e+08 min 8.6494e+07
> Current space PetscMalloc()ed:                           total 3.6127e+06
> max 1.5053e+05 min 1.5053e+05
>
>
> petsc-3.9.4
>
> Summary of Memory Usage in PETSc
> Maximum (over computational time) process memory:        total 1.5695e+10
> max 7.1985e+08 min 6.0131e+08
> Current process memory:                                  total 1.3186e+10
> max 6.9240e+08 min 4.2821e+08
> Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09
> max 1.5869e+08 min 1.0179e+08
> Current space PetscMalloc()ed:                           total 1.8808e+06
> max 7.8368e+04 min 7.8368e+04
>
>
> However, it seems that the culprit is superlu_dist: I recompiled the current
> petsc/slepc with superlu_dist-5.4.0 (using the option
> --download-superlu_dist=/home/spin/superlu_dist-5.4.0.tar.gz) and the
> result is this:
>
> petsc-3.12.3 with superlu_dist-5.4.0:
>
> Summary of Memory Usage in PETSc
> Maximum (over computational time) process memory:        total 1.5636e+10
> max 7.1217e+08 min 5.9963e+08
> Current process memory:                                  total 1.3401e+10
> max 6.5498e+08 min 4.2626e+08
> Maximum (over computational time) space PetscMalloc()ed: total 2.7619e+09
> max 1.4339e+08 min 8.6494e+07
> Current space PetscMalloc()ed:                           total 3.6127e+06
> max 1.5053e+05 min 1.5053e+05
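>
> (For reference, the reconfigure was essentially my usual configure line,
> quoted further down this thread, with --download-superlu_dist pointed at the
> local tarball instead of letting configure download it; something like:
>
> ./configure --with-scalar-type=complex --download-mumps --download-parmetis
> --download-metis --download-scalapack=1 --download-fblaslapack=1
> --with-debugging=0 --download-superlu_dist=/home/spin/superlu_dist-5.4.0.tar.gz
> --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3
> -march=native' COPTFLAGS='-O3 -march=native' )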
>
> I could not compile petsc-3.12.3 with the exact superlu_dist version that
> petsc-3.9.4 uses (5.3.0), but I will try newer versions to see how they
> perform ... I guess I should address this issue to the superlu maintainers?
>

Yes.



> Thanks!
> Santiago
>
> On Fri, Jan 10, 2020 at 9:19 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
>
>>
>>   Can you please try v3.12.3? There was some funky business mistakenly
>> added related to partitioning that has been fixed in 3.12.3.
>>
>>    Barry
>>
>>
>> > On Jan 10, 2020, at 1:57 PM, Santiago Andres Triana <repepo at gmail.com>
>> wrote:
>> >
>> > Dear all,
>> >
>> > I ran the program with valgrind --tool=massif; the results are cryptic
>> to me ... not sure what the memory hog is! The logs are attached.
>> >
>> > The command I used is:
>> > mpiexec -n 24 valgrind --tool=massif --num-callers=20
>> --log-file=valgrind.log.%p ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 $opts
>> -eps_target -4.008e-3+1.57142i -eps_target_magnitude -eps_tol 1e-14
>> >
>> > Is there any possibility to install a version of superlu_dist (or
>> mumps) different from what the petsc version automatically downloads?
>> >
>> > Thanks!
>> > Santiago
>> >
>> >
>> > On Thu, Jan 9, 2020 at 10:04 PM Dave May <dave.mayhem23 at gmail.com>
>> wrote:
>> > This kind of issue is difficult to untangle because you have
>> potentially three pieces of software which might have changed between v3.9
>> and v3.12, namely
>> > PETSc, SLEPc and SuperLU_DIST.
>> > You need to isolate which software component is responsible for the 2x
>> increase in memory.
>> >
>> > When I look at the memory usage in the log files, things look very very
>> similar for the raw PETSc objects.
>> >
>> > [v3.9]
>> > --- Event Stage 0: Main Stage
>> >
>> >               Viewer     4              3         2520     0.
>> >               Matrix    15             15    125236536     0.
>> >               Vector    22             22     19713856     0.
>> >            Index Set    10             10       995280     0.
>> >          Vec Scatter     4              4         4928     0.
>> >           EPS Solver     1              1         2276     0.
>> >   Spectral Transform     1              1          848     0.
>> >        Basis Vectors     1              1         2168     0.
>> >          PetscRandom     1              1          662     0.
>> >               Region     1              1          672     0.
>> >        Direct Solver     1              1        17440     0.
>> >        Krylov Solver     1              1         1176     0.
>> >       Preconditioner     1              1         1000     0.
>> >
>> > versus
>> >
>> > [v3.12]
>> > --- Event Stage 0: Main Stage
>> >
>> >               Viewer     4              3         2520     0.
>> >               Matrix    15             15    125237144     0.
>> >               Vector    22             22     19714528     0.
>> >            Index Set    10             10       995096     0.
>> >          Vec Scatter     4              4         3168     0.
>> >    Star Forest Graph     4              4         3936     0.
>> >           EPS Solver     1              1         2292     0.
>> >   Spectral Transform     1              1          848     0.
>> >        Basis Vectors     1              1         2184     0.
>> >          PetscRandom     1              1          662     0.
>> >               Region     1              1          672     0.
>> >        Direct Solver     1              1        17456     0.
>> >        Krylov Solver     1              1         1400     0.
>> >       Preconditioner     1              1         1000     0.
>> >
>> > Certainly there is no apparent factor 2x increase in memory usage in
>> the underlying petsc objects themselves.
>> > Furthermore, the counts of creations of petsc objects in toobig.log and
>> justfine.log match, indicating that none of the implementations used in
>> either PETSc or SLEPc have fundamentally changed wrt the usage of the
>> native petsc objects.
>> >
>> > It is also curious that VecNorm is called 3 times in "justfine.log" and
>> 19 times in "toobig.log" - although I don't see how that could be related
>> to your problem...
>> >
>> > The above at least gives me the impression that the memory increase is
>> likely not coming from PETSc.
>> > I just read Barry's useful email which is even more compelling and also
>> indicates SLEPc is not the likely culprit either as it uses PetscMalloc()
>> internally.
>> >
>> > Some options to identify the problem:
>> >
>> > 1/ Eliminate SLEPc as a possible culprit by not calling EPSSolve() and
>> rather just calling KSPSolve() with some RHS vector (see the sketch after 2/).
>> > * If you still see a 2x increase, switch the preconditioner to using
>> -pc_type bjacobi -ksp_max_it 10 rather than superlu_dist.
>> > If the memory usage is good, you can be pretty certain the issue arises
>> internally to superlu_dist.
>> >
>> > 2/ Leave your code as is and perform your profiling using mumps rather
>> than superlu_dist.
>> > This is a less reliable test than 1/ since the mumps implementation
>> used with v3.9 and v3.12 may differ...
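>> >
>> > A minimal sketch of test 1/ might look like the following (illustrative
>> > only: it assumes a square matrix in PETSc binary format loaded via -f1,
>> > as in ex7; adjust names as needed):
>> >
>> > #include <petscksp.h>
>> >
>> > int main(int argc,char **argv)
>> > {
>> >   Mat            A;
>> >   Vec            x,b;
>> >   KSP            ksp;
>> >   PetscViewer    viewer;
>> >   char           file[PETSC_MAX_PATH_LEN];
>> >   PetscErrorCode ierr;
>> >
>> >   ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
>> >   /* load the matrix the same way ex7 does */
>> >   ierr = PetscOptionsGetString(NULL,NULL,"-f1",file,sizeof(file),NULL);CHKERRQ(ierr);
>> >   ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
>> >   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>> >   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file,FILE_MODE_READ,&viewer);CHKERRQ(ierr);
>> >   ierr = MatLoad(A,viewer);CHKERRQ(ierr);
>> >   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
>> >
>> >   /* any nonzero right-hand side will do for this memory test */
>> >   ierr = MatCreateVecs(A,&x,&b);CHKERRQ(ierr);
>> >   ierr = VecSet(b,1.0);CHKERRQ(ierr);
>> >
>> >   ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>> >   ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
>> >   ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>> >   ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
>> >
>> >   ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>> >   ierr = VecDestroy(&x);CHKERRQ(ierr);
>> >   ierr = VecDestroy(&b);CHKERRQ(ierr);
>> >   ierr = MatDestroy(&A);CHKERRQ(ierr);
>> >   ierr = PetscFinalize();
>> >   return ierr;
>> > }
>> >
>> > Run it once with -pc_type lu -pc_factor_mat_solver_type superlu_dist and
>> > once with -pc_type bjacobi -ksp_max_it 10, adding -memory_view, and compare
>> > the footprints under 3.9 and 3.12.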
>> >
>> > Thanks
>> > Dave
>> >
>> > On Thu, 9 Jan 2020 at 20:17, Santiago Andres Triana <repepo at gmail.com>
>> wrote:
>> > Dear all,
>> >
>> > I think parmetis is not involved since I still run out of memory if I
>> use the following options:
>> > export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu
>> -st_pc_factor_mat_solver_type superlu_dist -eps_true_residual 1'
>> > and  issuing:
>> > mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target
>> -4.008e-3+1.57142i $opts -eps_target_magnitude -eps_tol 1e-14 -memory_view
>> >
>> > Bottom line is that the memory usage of petsc-3.9.4 / slepc-3.9.2 is
>> much lower than that of the current version. I can only solve relatively
>> small problems using the 3.12 series :(
>> > I have an example with smaller matrices that will likely fail on a 32
>> GB RAM machine with petsc-3.12 but runs just fine with petsc-3.9. The
>> -memory_view output is:
>> >
>> > with petsc-3.9.4: (log 'justfine.log' attached)
>> >
>> > Summary of Memory Usage in PETSc
>> > Maximum (over computational time) process memory:        total
>> 1.6665e+10 max 7.5674e+08 min 6.4215e+08
>> > Current process memory:                                  total
>> 1.5841e+10 max 7.2881e+08 min 6.0905e+08
>> > Maximum (over computational time) space PetscMalloc()ed: total
>> 3.1290e+09 max 1.5868e+08 min 1.0179e+08
>> > Current space PetscMalloc()ed:                           total
>> 1.8808e+06 max 7.8368e+04 min 7.8368e+04
>> >
>> >
>> > with petsc-3.12.2: (log 'toobig.log' attached)
>> >
>> > Summary of Memory Usage in PETSc
>> > Maximum (over computational time) process memory:        total
>> 3.1564e+10 max 1.3662e+09 min 1.2604e+09
>> > Current process memory:                                  total
>> 3.0355e+10 max 1.3082e+09 min 1.2254e+09
>> > Maximum (over computational time) space PetscMalloc()ed: total
>> 2.7618e+09 max 1.4339e+08 min 8.6493e+07
>> > Current space PetscMalloc()ed:                           total
>> 3.6127e+06 max 1.5053e+05 min 1.5053e+05
>> >
>> > Strangely, monitoring with 'top' I can see *appreciably higher* peak
>> memory use, usually twice what -memory_view ends up reporting, both for
>> petsc-3.9.4 and the current version. The program usually fails at this peak
>> if not enough RAM is available.
>> >
>> > The matrices for the example quoted above can be downloaded here (I use
>> slepc's tutorial ex7.c to solve the problem):
>> > https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0  (about 600 MB)
>> > https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0  (about 210 MB)
>> >
>> > I haven't been able to use a debugger successfully since I am using a
>> compute node without the possibility of an xterm ... note that I have no
>> experience using a debugger so any help on that will also be appreciated!
>> > Hope I can switch to the current petsc/slepc version for my production
>> runs soon...
>> >
>> > Thanks again!
>> > Santiago
>> >
>> >
>> >
>> > On Thu, Jan 9, 2020 at 4:25 PM Stefano Zampini <
>> stefano.zampini at gmail.com> wrote:
>> > Can you reproduce the issue with smaller matrices? Or with a debug
>> build (i.e. using --with-debugging=1 and compilation flags -O2 -g)?
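>> > For example, your existing configure line with --with-debugging=1 and
>> > '-O2 -g' in place of the optimization flags (a sketch; keep the rest as is):
>> >
>> > ./configure --with-scalar-type=complex --download-mumps --download-parmetis
>> > --download-metis --download-scalapack=1 --download-fblaslapack=1
>> > --download-superlu_dist=1 --download-ptscotch=1 --with-debugging=1
>> > COPTFLAGS='-O2 -g' CXXOPTFLAGS='-O2 -g' FOPTFLAGS='-O2 -g'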
>> >
>> > The only changes in parmetis between the two PETSc releases are these
>> below, but I don’t see how they could cause issues
>> >
>> > kl-18448:pkg-parmetis szampini$ git log -2
>> > commit ab4fedc6db1f2e3b506be136e3710fcf89ce16ea (HEAD -> master, tag:
>> v4.0.3-p5, origin/master, origin/dalcinl/random, origin/HEAD)
>> > Author: Lisandro Dalcin <dalcinl at gmail.com>
>> > Date:   Thu May 9 18:44:10 2019 +0300
>> >
>> >     GKLib: Make FPRFX##randInRange() portable for 32bit/64bit indices
>> >
>> > commit 2b4afc79a79ef063f369c43da2617fdb64746dd7
>> > Author: Lisandro Dalcin <dalcinl at gmail.com>
>> > Date:   Sat May 4 17:22:19 2019 +0300
>> >
>> >     GKlib: Use gk_randint32() to define the RandomInRange() macro
>> >
>> >
>> >
>> >> On Jan 9, 2020, at 4:31 AM, Smith, Barry F. via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >>
>> >>
>> >>  This is extremely worrisome:
>> >>
>> >> ==23361== Use of uninitialised value of size 8
>> >> ==23361==    at 0x847E939: gk_randint64 (random.c:99)
>> >> ==23361==    by 0x847EF88: gk_randint32 (random.c:128)
>> >> ==23361==    by 0x81EBF0B: libparmetis__Match_Global (in
>> /space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib/libparmetis.so)
>> >>
>> >> Do you get that with PETSc-3.9.4 or only with 3.12.3?
>> >>
>> >>   This may result in Parmetis using non-random numbers and then giving
>> back an inappropriate ordering that requires more memory for SuperLU_DIST.
>> >>
>> >>  Suggest looking at the code, or running in the debugger to see what
>> is going on there. We use parmetis all the time and don't see this.
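>> >>
>> >>   With valgrind, adding --track-origins=yes should report where the
>> >> uninitialised value comes from (slower, but often worth it), e.g. something
>> >> like your earlier command:
>> >>
>> >>   mpiexec -n 24 valgrind --tool=memcheck -q --track-origins=yes
>> >> --num-callers=20 --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc
>> >> -f2 B.petsc -eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude
>> >> -eps_tol 1e-14 $opts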
>> >>
>> >>  Barry
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>> On Jan 8, 2020, at 4:34 PM, Santiago Andres Triana <repepo at gmail.com>
>> wrote:
>> >>>
>> >>> Dear Matt, petsc-users:
>> >>>
>> >>> Finally back after the holidays to try to solve this issue, thanks
>> for your patience!
>> >>> I compiled the latest petsc (3.12.3) with debugging enabled, and the same
>> problem appears: relatively large matrices result in out-of-memory errors.
>> This is not the case for petsc-3.9.4; all is fine there.
>> >>> This is a non-hermitian, generalized eigenvalue problem; I generate
>> the A and B matrices myself and then use example 7 (from the slepc
>> tutorials at $SLEPC_DIR/src/eps/examples/tutorials/ex7.c) to solve the
>> problem:
>> >>>
>> >>> mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20
>> --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc
>> -eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude -eps_tol
>> 1e-14 $opts
>> >>>
>> >>> where the $opts variable is:
>> >>> export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu
>> -eps_error_relative ::ascii_info_detail -st_pc_factor_mat_solver_type
>> superlu_dist -mat_superlu_dist_iterrefine 1 -mat_superlu_dist_colperm
>> PARMETIS -mat_superlu_dist_parsymbfact 1 -eps_converged_reason
>> -eps_conv_rel -eps_monitor_conv -eps_true_residual 1'
>> >>>
>> >>> The output from valgrind (a sample from one processor) and from the
>> program are attached.
>> >>> If it's of any use, the matrices are here (might need at least 180 GB
>> of RAM to solve the problem successfully under petsc-3.9.4):
>> >>>
>> >>> https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0
>> >>> https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0
>> >>>
>> >>> With petsc-3.9.4 and slepc-3.9.2 I can use matrices up to 10 GB (with
>> 240 GB RAM), but only up to 3 GB with the latest petsc/slepc.
>> >>> Any suggestions, comments or any other help are very much appreciated!
>> >>>
>> >>> Cheers,
>> >>> Santiago
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Dec 23, 2019 at 11:19 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>> >>> On Mon, Dec 23, 2019 at 3:14 PM Santiago Andres Triana <
>> repepo at gmail.com> wrote:
>> >>> Dear all,
>> >>>
>> >>> After upgrading to petsc 3.12.2 my solver program crashes
>> consistently. Before the upgrade I was using petsc 3.9.4 with no problems.
>> >>>
>> >>> My application deals with a complex-valued, generalized eigenvalue
>> problem. The matrices involved are relatively large, typically 2 to 10 GB
>> in size, which is no problem for petsc 3.9.4.
>> >>>
>> >>> Are you sure that your indices do not exceed 4B? If they do, you need to
>> configure using
>> >>>
>> >>>  --with-64-bit-indices
>> >>>
>> >>> Also, it would be nice if you ran with the debugger so we can get a
>> stack trace for the SEGV.
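>> >>> If you cannot open an xterm on the compute node, one thing to try is
>> >>> attaching the debugger without a terminal, e.g. something like
>> >>>
>> >>>   mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 $opts -on_error_attach_debugger noxterm
>> >>>
>> >>> which attempts to attach gdb without opening an xterm when the error occurs.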
>> >>>
>> >>>  Thanks,
>> >>>
>> >>>    Matt
>> >>>
>> >>> However, after the upgrade I can only obtain solutions when the
>> matrices are small; the solver crashes when the matrices' size exceeds about
>> 1.5 GB:
>> >>>
>> >>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> >>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or
>> the batch system) has told this process to end
>> >>> [0]PETSC ERROR: Try option -start_in_debugger or
>> -on_error_attach_debugger
>> >>> [0]PETSC ERROR: or see
>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple
>> Mac OS X to find memory corruption errors
>> >>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile,
>> link, and run
>> >>> [0]PETSC ERROR: to get more information on the crash.
>> >>>
>> >>> and so on for each CPU.
>> >>>
>> >>>
>> >>> I tried using valgrind and this is the typical output:
>> >>>
>> >>> ==2874== Conditional jump or move depends on uninitialised value(s)
>> >>> ==2874==    at 0x4018178: index (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x400752D: expand_dynamic_string_token (in /lib64/
>> ld-2.22.so)
>> >>> ==2874==    by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x40011F7: ??? (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x12: ???
>> >>> ==2874==
>> >>>
>> >>>
>> >>> These are my configuration options. Identical for both petsc 3.9.4
>> and 3.12.2:
>> >>>
>> >>> ./configure --with-scalar-type=complex --download-mumps
>> --download-parmetis --download-metis --download-scalapack=1
>> --download-fblaslapack=1 --with-debugging=0 --download-superlu_dist=1
>> --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3
>> -march=native' COPTFLAGS='-O3 -march=native'
>> >>>
>> >>>
>> >>> Thanks in advance for any comments or ideas!
>> >>>
>> >>> Cheers,
>> >>> Santiago
>> >>>
>> >>>
>> >>> --
>> >>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> >>> -- Norbert Wiener
>> >>>
>> >>> https://www.cse.buffalo.edu/~knepley/
>> >>> <test1.e6034496><valgrind.log.23361>
>> >>
>> >
>> > <massif.out.petsc-3.9><massif.out.petsc-3.12>
>>
>>