with petsc-3.9.4: (log 'justfine.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 1.6665e+10 max 7.5674e+08 min 6.4215e+08
Current process memory: total 1.5841e+10 max 7.2881e+08 min 6.0905e+08

Below is the space allocated by PETSc:

Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09 max 1.5868e+08 min 1.0179e+08
Current space PetscMalloc()ed: total 1.8808e+06 max 7.8368e+04 min 7.8368e+04


with petsc-3.12.2: (log 'toobig.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 3.1564e+10 max 1.3662e+09 min 1.2604e+09
Current process memory: total 3.0355e+10 max 1.3082e+09 min 1.2254e+09

Below is the space allocated by PETSc. Note that the maximum PetscMalloc()ed total here, 2.7618e+09, is actually a bit lower than the 3.1290e+09 with petsc-3.9.4:

Maximum (over computational time) space PetscMalloc()ed: total 2.7618e+09 max 1.4339e+08 min 8.6493e+07
Current space PetscMalloc()ed: total 3.6127e+06 max 1.5053e+05 min 1.5053e+05

So it is not PETSc that is allocating more memory than before.

Use the Massif option of valgrind to see where the large chunk of memory is actually used in the simulation.
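For example, something along these lines should work (a sketch only; reuse the exact options from your run, the file names below are just the ones from your example):

mpiexec -n 24 valgrind --tool=massif --massif-out-file=massif.out.%p ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 $opts

and then inspect the resulting massif.out.<pid> files with ms_print to see which call paths hold the memory at the peak.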

  Barry


On Jan 9, 2020, at 2:16 PM, Santiago Andres Triana <repepo@gmail.com> wrote:

Dear all,

I think parmetis is not involved since I still run out of memory if I use the following options:
export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type superlu_dist -eps_true_residual 1'
and issue:
mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target -4.008e-3+1.57142i $opts -eps_target_magnitude -eps_tol 1e-14 -memory_view

The bottom line is that the memory usage of petsc-3.9.4 / slepc-3.9.2 is much lower than that of the current version. I can only solve relatively small problems using the 3.12 series :(
I have an example with smaller matrices that will likely fail on a 32 GB RAM machine with petsc-3.12 but runs just fine with petsc-3.9. The -memory_view output is:

with petsc-3.9.4: (log 'justfine.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 1.6665e+10 max 7.5674e+08 min 6.4215e+08
Current process memory: total 1.5841e+10 max 7.2881e+08 min 6.0905e+08
Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09 max 1.5868e+08 min 1.0179e+08
Current space PetscMalloc()ed: total 1.8808e+06 max 7.8368e+04 min 7.8368e+04


with petsc-3.12.2: (log 'toobig.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 3.1564e+10 max 1.3662e+09 min 1.2604e+09
Current process memory: total 3.0355e+10 max 1.3082e+09 min 1.2254e+09
Maximum (over computational time) space PetscMalloc()ed: total 2.7618e+09 max 1.4339e+08 min 8.6493e+07
Current space PetscMalloc()ed: total 3.6127e+06 max 1.5053e+05 min 1.5053e+05

Strangely, monitoring with 'top' I can see *appreciably higher* peak memory use, usually twice what -memory_view ends up reporting, both for petsc-3.9.4 and the current version. The program usually fails at this peak if not enough RAM is available.

The matrices for the example quoted above can be downloaded here (I use slepc's tutorial ex7.c to solve the problem):
https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0 (about 600 MB)
https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0 (about 210 MB)

I haven't been able to use a debugger successfully since I am using a compute node without the possibility of an xterm ... note that I have no experience using a debugger, so any help on that will also be appreciated!
Hope I can switch to the current petsc/slepc version for my production runs soon...

Thanks again!
Santiago


On Thu, Jan 9, 2020 at 4:25 PM Stefano Zampini <stefano.zampini@gmail.com> wrote:
Can you reproduce the issue with smaller matrices? Or with a debug build (i.e. using --with-debugging=1 and the compilation flags -O2 -g)?
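For example, a configure line along these lines should give a usable debug build (just a sketch: it is your earlier configure line with debugging switched on):

./configure --with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack=1 --download-fblaslapack=1 --download-superlu_dist=1 --download-ptscotch=1 --with-debugging=1 COPTFLAGS='-O2 -g' CXXOPTFLAGS='-O2 -g' FOPTFLAGS='-O2 -g'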

The only changes in parmetis between the two PETSc releases are these below, but I don't see how they could cause issues.

kl-18448:pkg-parmetis szampini$ git log -2
commit ab4fedc6db1f2e3b506be136e3710fcf89ce16ea (HEAD -> master, tag: v4.0.3-p5, origin/master, origin/dalcinl/random, origin/HEAD)
Author: Lisandro Dalcin <dalcinl@gmail.com>
Date:   Thu May 9 18:44:10 2019 +0300

    GKLib: Make FPRFX##randInRange() portable for 32bit/64bit indices

commit 2b4afc79a79ef063f369c43da2617fdb64746dd7
Author: Lisandro Dalcin <dalcinl@gmail.com>
Date:   Sat May 4 17:22:19 2019 +0300

    GKlib: Use gk_randint32() to define the RandomInRange() macro



On Jan 9, 2020, at 4:31 AM, Smith, Barry F. via petsc-users <petsc-users@mcs.anl.gov> wrote:


This is extremely worrisome:

==23361== Use of uninitialised value of size 8
==23361==    at 0x847E939: gk_randint64 (random.c:99)
==23361==    by 0x847EF88: gk_randint32 (random.c:128)
==23361==    by 0x81EBF0B: libparmetis__Match_Global (in /space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib/libparmetis.so)

Do you get that with PETSc-3.9.4 or only with 3.12.3?

This may result in Parmetis using non-random numbers and then giving back an inappropriate ordering that requires more memory for SuperLU_DIST.

I suggest looking at the code, or running in the debugger, to see what is going on there. We use parmetis all the time and don't see this.
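For instance (a sketch only, adjust to your actual run; -start_in_debugger and -debugger_nodes are the standard PETSc options, and noxterm keeps the debugger in the current terminal, which is what you need on a compute node without X):

mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 $opts -start_in_debugger noxterm -debugger_nodes 0

attaches gdb only on rank 0; you can then set a breakpoint in gk_randint64 and step through it.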

  Barry


On Jan 8, 2020, at 4:34 PM, Santiago Andres Triana <repepo@gmail.com> wrote:

Dear Matt, petsc-users:

Finally back after the holidays to try to solve this issue, thanks for your patience!
I compiled the latest petsc (3.12.3) with debugging enabled, and the same problem appears: relatively large matrices result in out-of-memory errors. This is not the case for petsc-3.9.4; all is fine there.
This is a non-hermitian, generalized eigenvalue problem. I generate the A and B matrices myself and then use example 7 (from the slepc tutorials at $SLEPC_DIR/src/eps/examples/tutorials/ex7.c) to solve the problem:

mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude -eps_tol 1e-14 $opts

where the $opts variable is:
export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu -eps_error_relative ::ascii_info_detail -st_pc_factor_mat_solver_type superlu_dist -mat_superlu_dist_iterrefine 1 -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1 -eps_converged_reason -eps_conv_rel -eps_monitor_conv -eps_true_residual 1'

The output from valgrind (a sample from one processor) and from the program are attached.
If it's of any use, the matrices are here (you might need at least 180 GB of RAM to solve the problem successfully under petsc-3.9.4):

https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0
https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0

With petsc-3.9.4 and slepc-3.9.2 I can use matrices up to 10 GB (with 240 GB of RAM), but only up to 3 GB with the latest petsc/slepc.
Any suggestions, comments or any other help are very much appreciated!

Cheers,
Santiago


On Mon, Dec 23, 2019 at 11:19 PM Matthew Knepley <knepley@gmail.com> wrote:
On Mon, Dec 23, 2019 at 3:14 PM Santiago Andres Triana <repepo@gmail.com> wrote:
Dear all,

After upgrading to petsc 3.12.2 my solver program crashes consistently. Before the upgrade I was using petsc 3.9.4 with no problems.

My application deals with a complex-valued, generalized eigenvalue problem. The matrices involved are relatively large, typically 2 to 10 GB in size, which is no problem for petsc 3.9.4.

Are you sure that your indices do not exceed 4B? If they do, you need to configure using

--with-64-bit-indices

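For example (a sketch; everything else stays as in your current configure line quoted below):

./configure --with-64-bit-indices [rest of your existing configure options]
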
Also, it would be nice if you ran with the debugger so we can get a stack trace for the SEGV.

Thanks,

Matt

However, after the upgrade I can only obtain solutions when the matrices are small; the solver crashes when the matrices' size exceeds about 1.5 GB:

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.

and so on for each CPU.


I tried using valgrind and this is the typical output:

==2874== Conditional jump or move depends on uninitialised value(s)
==2874==    at 0x4018178: index (in /lib64/ld-2.22.so)
==2874==    by 0x400752D: expand_dynamic_string_token (in /lib64/ld-2.22.so)
==2874==    by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
==2874==    by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
==2874==    by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
==2874==    by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
==2874==    by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
==2874==    by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
==2874==    by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
==2874==    by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
==2874==    by 0x40011F7: ??? (in /lib64/ld-2.22.so)
==2874==    by 0x12: ???
==2874==

These are my configuration options, identical for both petsc 3.9.4 and 3.12.2:

./configure --with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack=1 --download-fblaslapack=1 --with-debugging=0 --download-superlu_dist=1 --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3 -march=native' COPTFLAGS='-O3 -march=native'

Thanks in advance for any comments or ideas!

Cheers,
Santiago


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/