[petsc-users] killed 9 signal after upgrade from petsc 3.9.4 to 3.12.2

Stefano Zampini stefano.zampini at gmail.com
Thu Jan 9 09:25:35 CST 2020


Can you reproduce the issue with smaller matrices? Or with a debug build (i.e. configuring with --with-debugging=1 and the compilation flags -O2 -g)? 

The only changes in parmetis between the two PETSc releases are these below, but I don’t see how they could cause issues

kl-18448:pkg-parmetis szampini$ git log -2
commit ab4fedc6db1f2e3b506be136e3710fcf89ce16ea (HEAD -> master, tag: v4.0.3-p5, origin/master, origin/dalcinl/random, origin/HEAD)
Author: Lisandro Dalcin <dalcinl at gmail.com>
Date:   Thu May 9 18:44:10 2019 +0300

    GKLib: Make FPRFX##randInRange() portable for 32bit/64bit indices

commit 2b4afc79a79ef063f369c43da2617fdb64746dd7
Author: Lisandro Dalcin <dalcinl at gmail.com>
Date:   Sat May 4 17:22:19 2019 +0300

    GKlib: Use gk_randint32() to define the RandomInRange() macro



> On Jan 9, 2020, at 4:31 AM, Smith, Barry F. via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> 
>  This is extremely worrisome:
> 
> ==23361== Use of uninitialised value of size 8
> ==23361==    at 0x847E939: gk_randint64 (random.c:99)
> ==23361==    by 0x847EF88: gk_randint32 (random.c:128)
> ==23361==    by 0x81EBF0B: libparmetis__Match_Global (in /space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib/libparmetis.so)
> 
> do you get that with PETSc-3.9.4 or only with 3.12.3?  
> 
>   This may result in Parmetis using non-random numbers and then giving back an inappropriate ordering that requires more memory for SuperLU_DIST.
> 
>  Suggest looking at the code, or running in the debugger to see what is going on there. We use parmetis all the time and don't see this.
> 
>  Barry
> 
> 
> 
> 
> 
> 
>> On Jan 8, 2020, at 4:34 PM, Santiago Andres Triana <repepo at gmail.com> wrote:
>> 
>> Dear Matt, petsc-users:
>> 
>> Finally back after the holidays to try to solve this issue, thanks for your patience!
>> I compiled the latest petsc (3.12.3) with debugging enabled, and the same problem appears: relatively large matrices result in out-of-memory errors. This is not the case for petsc-3.9.4; all is fine there.
>> This is a non-Hermitian, generalized eigenvalue problem. I generate the A and B matrices myself and then use example 7 (from the slepc tutorial at $SLEPC_DIR/src/eps/examples/tutorials/ex7.c ) to solve the problem:
>> 
>> mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude -eps_tol 1e-14 $opts
>> 
>> where the $opts variable is:
>> export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu -eps_error_relative ::ascii_info_detail -st_pc_factor_mat_solver_type superlu_dist -mat_superlu_dist_iterrefine 1 -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1 -eps_converged_reason -eps_conv_rel -eps_monitor_conv -eps_true_residual 1'
>> 
>> the output from valgrind (sample from one processor) and from the program are attached.
>> If it's of any use, the matrices are here (might need at least 180 GB of RAM to solve the problem successfully under petsc-3.9.4):
>> 
>> https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0
>> https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0
>> 
>> With petsc-3.9.4 and slepc-3.9.2 I can use matrices up to 10 GB (with 240 GB of RAM), but only up to 3 GB with the latest petsc/slepc.
>> Any suggestions, comments or any other help are very much appreciated!
>> 
>> Cheers,
>> Santiago
>> 
>> 
>> 
>> On Mon, Dec 23, 2019 at 11:19 PM Matthew Knepley <knepley at gmail.com> wrote:
>> On Mon, Dec 23, 2019 at 3:14 PM Santiago Andres Triana <repepo at gmail.com> wrote:
>> Dear all,
>> 
>> After upgrading to petsc 3.12.2 my solver program crashes consistently. Before the upgrade I was using petsc 3.9.4 with no problems.
>> 
>> My application deals with a complex-valued, generalized eigenvalue problem. The matrices involved are relatively large, typically 2 to 10 GB in size, which is no problem for petsc 3.9.4.
>> 
>> Are you sure that your indices do not exceed 4B? If they do, you need to configure using
>> 
>>  --with-64-bit-indices
>> 
>> Also, it would be nice if you ran with the debugger so we can get a stack trace for the SEGV.
>> 
>>  Thanks,
>> 
>>    Matt
>> 
>> However, after the upgrade I can only obtain solutions when the matrices are small; the solver crashes when the matrices' size exceeds about 1.5 GB:
>> 
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
>> [0]PETSC ERROR: to get more information on the crash.
>> 
>> and so on for each cpu.
>> 
>> 
>> I tried using valgrind and this is the typical output:
>> 
>> ==2874== Conditional jump or move depends on uninitialised value(s)
>> ==2874==    at 0x4018178: index (in /lib64/ld-2.22.so)
>> ==2874==    by 0x400752D: expand_dynamic_string_token (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
>> ==2874==    by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
>> ==2874==    by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
>> ==2874==    by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
>> ==2874==    by 0x40011F7: ??? (in /lib64/ld-2.22.so)
>> ==2874==    by 0x12: ???
>> ==2874== 
>> 
>> 
>> These are my configuration options. Identical for both petsc 3.9.4 and 3.12.2:
>> 
>> ./configure --with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack=1 --download-fblaslapack=1 --with-debugging=0 --download-superlu_dist=1 --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3 -march=native' COPTFLAGS='-O3 -march=native'
>> 
>> 
>> Thanks in advance for any comments or ideas!
>> 
>> Cheers,
>> Santiago
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
>> <test1.e6034496><valgrind.log.23361>
> 
