<div dir="ltr"><div dir="ltr">On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest <<a href="mailto:sayosale@hotmail.com">sayosale@hotmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div>
<div id="gmail-m_3253114932700099937appendonsend"></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Dear Matthew and Jose,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Apologies for the delayed reply, I had a couple of unforeseen days off this week.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
to solve linear systems (the code is using a distributed MPI matrix to solve the generalised
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
non-Hermitian complex problem).</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I have tried the gdb debugger as per Matthew's suggestion.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Just to note, in case someone else is following this: at first it didn't work (gdb couldn't 'attach'),</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
but after some googling I found a tip suggesting the command:</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span><code>echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope</code></span><br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
which seemed to get it working.<br>
</div>
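<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
For anyone repeating this, a minimal sketch of checking the setting before changing it and putting it back afterwards. It assumes a Linux kernel with the Yama security module enabled; the function names are illustrative only and are not part of any tool.</div>
<pre><code>
# Path of the Yama ptrace restriction setting (present only if Yama is enabled).
scope_file=/proc/sys/kernel/yama/ptrace_scope

# Print the current value; 0 lets gdb attach to other processes of the same user.
show_ptrace_scope() {
    if [ -r "$scope_file" ]; then
        cat "$scope_file"
    else
        echo "unavailable"
    fi
}

# Relax the restriction until the next reboot (needs root).
relax_ptrace_scope() {
    echo 0 | sudo tee "$scope_file"
}

# Put back the Ubuntu default of 1 once debugging is finished (needs root).
restore_ptrace_scope() {
    echo 1 | sudo tee "$scope_file"
}

show_ptrace_scope
</code></pre>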
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b>I then first ran the debugger on the small matrix case that worked.</b></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
That stopped in gdb almost immediately after starting execution <br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
with a report regarding 'nanosleep.c':<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="color:rgb(12,100,192)">../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.</span></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
However, issuing the 'cont' command again caused the program to run through to the end of the</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
execution w/out any problems, and with correct looking results, so I am guessing this error</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
is not particularly important.</div></div></div></blockquote><div><br></div><div>We do that on purpose when the debugger starts up. Typing 'cont' is correct.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b>I then tried the same debugging procedure on the large matrix case that fails.</b></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The code again stopped almost immediately after the start of execution with <br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
the same nanosleep error as before, and I was able to set the program running <br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
again with 'cont' (see full output below). I was running the code with 4 MPI processes,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
and so had 4 gdb windows appear. Thereafter the code ran for some time until completing the
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
matrix construction, and then one of the gdb process windows printed a <br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="color:rgb(12,100,192)">Program terminated with signal SIGKILL, Killed.</span><br>
<span style="color:rgb(12,100,192)">The program no longer exists.</span><br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
message. I then typed 'where' into this terminal but just received the message</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="color:rgb(12,100,192)">No stack.</span></div></div></div></blockquote><div><br></div><div>I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays</div><div>on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed</div><div>the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you</div><div>have a large structure on the stack?</div><div><br></div><div>Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you</div><div>the prompt and then "where".</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The other gdb windows basically seemed to be left in limbo until I issued the 'quit'</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
command in the SIGKILL window, and then they vanished.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I paste the full output from the gdb window that recorded the SIGKILL below here.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I guess it is necessary to somehow work out where the SIGKILL originates from ?</div>
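<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
One way to start narrowing this down might be the sketch below. It assumes the kill comes either from the kernel's out-of-memory killer or from a resource limit such as the stack size; neither is confirmed here, and the function name is illustrative only.</div>
<pre><code>
# Stack size limit for this shell in kB, or "unlimited" if none is set.
ulimit -s

# Search the kernel log for traces of the out-of-memory killer (may need root).
oom_entries() {
    dmesg | grep -i -E "out of memory|killed process" || echo "no OOM entries visible"
}

oom_entries
</code></pre>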
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks once again,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Dan.<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="color:rgb(23,78,134)">GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2</span>
<div><span style="color:rgb(23,78,134)">Copyright (C) 2020 Free Software Foundation, Inc.</span></div>
<div><span style="color:rgb(23,78,134)">License GPLv3+: GNU GPL version 3 or later <<a href="http://gnu.org/licenses/gpl.html" target="_blank">http://gnu.org/licenses/gpl.html</a>></span></div>
<div><span style="color:rgb(23,78,134)">This is free software: you are free to change and redistribute it.</span></div>
<div><span style="color:rgb(23,78,134)">There is NO WARRANTY, to the extent permitted by law.</span></div>
<div><span style="color:rgb(23,78,134)">Type "show copying" and "show warranty" for details.</span></div>
<div><span style="color:rgb(23,78,134)">This GDB was configured as "x86_64-linux-gnu".</span></div>
<div><span style="color:rgb(23,78,134)">Type "show configuration" for configuration details.</span></div>
<div><span style="color:rgb(23,78,134)">For bug reporting instructions, please see:</span></div>
<div><span style="color:rgb(23,78,134)"><<a href="http://www.gnu.org/software/gdb/bugs/" target="_blank">http://www.gnu.org/software/gdb/bugs/</a>>.</span></div>
<div><span style="color:rgb(23,78,134)">Find the GDB manual and other documentation resources online at:</span></div>
<div><span style="color:rgb(23,78,134)"> <<a href="http://www.gnu.org/software/gdb/documentation/" target="_blank">http://www.gnu.org/software/gdb/documentation/</a>>.</span></div>
<div><br>
</div>
<div><span style="color:rgb(23,78,134)">For help, type "help".</span></div>
<div><span style="color:rgb(23,78,134)">Type "apropos word" to search for commands related to "word"...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from ./stab1.exe...</span></div>
<div><span style="color:rgb(23,78,134)">Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug...</span></div>
<div><span style="color:rgb(23,78,134)">[Thread debugging using libthread_db enabled]</span></div>
<div><span style="color:rgb(23,78,134)">Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /lib64/ld-linux-x86-64.so.2...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so)</span></div>
<div><span style="color:rgb(23,78,134)">Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2...</span></div>
<div><span style="color:rgb(23,78,134)">(No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2)</span></div>
<div><span style="color:rgb(23,78,134)">0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=<optimized out>, clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdc641a9a0, rem=rem@entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78</span></div>
<div><span style="color:rgb(23,78,134)">78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.</span></div>
<div><span style="color:rgb(23,78,134)">(gdb) cont</span></div>
<div><span style="color:rgb(23,78,134)">Continuing.</span></div>
<div><span style="color:rgb(23,78,134)">[New Thread 0x7f9e49c02780 (LWP 676559)]</span></div>
<div><span style="color:rgb(23,78,134)">[New Thread 0x7f9e49400800 (LWP 676560)]</span></div>
<div><span style="color:rgb(23,78,134)">[New Thread 0x7f9e48bfe880 (LWP 676562)]</span></div>
<div><span style="color:rgb(23,78,134)">[Thread 0x7f9e48bfe880 (LWP 676562) exited]</span></div>
<div><span style="color:rgb(23,78,134)">[Thread 0x7f9e49400800 (LWP 676560) exited]</span></div>
<div><span style="color:rgb(23,78,134)">[Thread 0x7f9e49c02780 (LWP 676559) exited]</span></div>
<div><br>
</div>
<div><span style="color:rgb(23,78,134)">Program terminated with signal SIGKILL, Killed.</span></div>
<div><span style="color:rgb(23,78,134)">The program no longer exists.</span></div>
<div><span style="color:rgb(23,78,134)">(gdb) where</span></div>
<span style="color:rgb(23,78,134)">No stack.</span><br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
<br>
</div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_3253114932700099937divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>><br>
<b>Sent:</b> Friday, August 20, 2021 2:12 PM<br>
<b>To:</b> dazza simplythebest <<a href="mailto:sayosale@hotmail.com" target="_blank">sayosale@hotmail.com</a>><br>
<b>Cc:</b> Jose E. Roman <<a href="mailto:jroman@dsic.upv.es" target="_blank">jroman@dsic.upv.es</a>>; PETSc <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] Improving efficiency of slepc usage</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div dir="ltr">On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest <<a href="mailto:sayosale@hotmail.com" target="_blank">sayosale@hotmail.com</a>> wrote:<br>
</div>
<div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Dear Jose,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Many thanks for your response, I have been investigating this issue with a few more calculations
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
today, hence the slightly delayed response.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
larger matrix but to lower accuracy. <br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<u><b>Results</b></u></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b>Small matrix (N= 21168) - everything good!</b><br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10,<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
and also for a zero -eps_target.<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div>
<div id="gmail-m_3253114932700099937x_gmail-m_-6879576846774134416appendonsend"></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b>Large matrix (N=50400) - works for -eps_largest_real, fails for -st_type sinvert
</b><br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I have just double checked again that the code does run properly when we use the -eps_largest_real
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
option - indeed I ran it with a small nev and large tolerance (nev = 4, tol = 5.0e-4, ncv = 300)</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
higher resolution case (although with a looser slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case)</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
with expectations.<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<i>Unfortunately, though, the code does still crash when I try to do shift-invert for the large matrix case
</i>,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
whether or not I use a non-zero -eps_target. For reference this is the command line used :</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
-eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
To be precise the code crashes soon after calling EPSSolve (it successfully calls
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions).</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
as soon as EPSSolve is called.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Hi Dan,</div>
<div><br>
</div>
<div>It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with</div>
<div><br>
</div>
<div> -start_in_debugger</div>
<div><br>
</div>
<div>which should launch the debugger (usually), and then type</div>
<div><br>
</div>
<div> cont</div>
<div><br>
</div>
<div>to continue, and then</div>
<div><br>
</div>
<div> where</div>
<div><br>
</div>
<div>to get the stack trace when it crashes, or 'bt' on lldb.</div>
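<div>As a sketch, a complete launch line might be assembled as follows. The launcher name (mpiexec) is an assumption, and the process count and executable name are only illustrative values taken from elsewhere in this thread; -start_in_debugger is the only PETSc option being shown.</div>
<pre><code>
nprocs=4            # number of MPI ranks (illustrative)
exe=./stab1.exe     # executable name (illustrative)

# Each rank starts under its own debugger when -start_in_debugger is given.
cmd="mpiexec -n $nprocs $exe -start_in_debugger"

# Print the command for inspection; run it with: eval "$cmd"
echo "$cmd"
</code></pre>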
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
-eps_largest_real ? The fact that the program works and produces correct results <br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
more memory for example ?<br>
</div>
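<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
For a rough sense of scale, the back-of-the-envelope sketch below assumes double-precision complex entries (16 bytes each). The dense N x N figure is only a hypothetical upper bound for a completely filled-in factor, not a measurement of what the direct solver actually allocates.</div>
<pre><code>
n=50400          # matrix dimension of the large case
ncv=300          # number of basis vectors requested with -eps_ncv
bytes=16         # one double-precision complex number (assumed)

# Storage for the ncv basis vectors of length n.
basis_bytes=$(( n * ncv * bytes ))

# Storage for one dense n x n complex matrix (hypothetical worst-case fill-in).
dense_bytes=$(( n * n * bytes ))

echo "basis vectors: $(( basis_bytes / 1000000 )) MB"
echo "dense n x n:   $(( dense_bytes / 1000000000 )) GB"
</code></pre>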
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
get that working for the full-size problem.<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div><br>
</div>
<div> Many thanks and best wishes,</div>
<div> Dan.<br>
</div>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_3253114932700099937x_gmail-m_-6879576846774134416divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Jose E. Roman <<a href="mailto:jroman@dsic.upv.es" target="_blank">jroman@dsic.upv.es</a>><br>
<b>Sent:</b> Thursday, August 19, 2021 7:58 AM<br>
<b>To:</b> dazza simplythebest <<a href="mailto:sayosale@hotmail.com" target="_blank">sayosale@hotmail.com</a>><br>
<b>Cc:</b> PETSc <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] Improving efficiency of slepc usage</font>
<div> </div>
</div>
<div><font size="2"><span style="font-size:11pt">
<div>In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot; you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to
the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target.<br>
<br>
In B), what do you mean by "it crashes"? If you get an error about factorization, it means that your A-matrix is singular. In that case, try using a nonzero target: -eps_target 0.1<br>
<br>
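(A small illustration of why a nonzero target can help, assuming a toy diagonal pencil rather than real SLEPc calls; all function and variable names below are hypothetical. Shift-and-invert works with (A - sigma*B)^{-1} B, whose eigenvalues are theta = 1/(lambda - sigma), so with sigma = 0 the matrix A itself has to be factorized, and that fails if A is singular.)<br>

```python
# Toy illustration of the shift-and-invert spectral transformation on a
# diagonal pencil (A, B), where the eigenvalues are simply a_i / b_i.
# All names here are hypothetical; this is not SLEPc code.

def shift_invert_spectrum(a_diag, b_diag, sigma):
    """Return eigenvalues theta_i of (A - sigma*B)^{-1} B for diagonal A, B.

    Raises ZeroDivisionError when A - sigma*B is singular, which is the
    toy analogue of a failed factorization.
    """
    return [b / (a - sigma * b) for a, b in zip(a_diag, b_diag)]


def recover_eigenvalues(thetas, sigma):
    """Map transformed eigenvalues back: lambda = sigma + 1/theta."""
    return [sigma + 1.0 / theta for theta in thetas]


# A has a zero on its diagonal, so it is singular.
a_diag = [0.0 + 0.0j, 2.0 + 1.0j, 5.0 - 3.0j]
b_diag = [1.0 + 0.0j, 1.0 + 0.0j, 2.0 + 0.0j]

# Target 0: A - 0*B = A is singular, so the "factorization" fails.
try:
    shift_invert_spectrum(a_diag, b_diag, 0.0)
    singular_at_zero = False
except ZeroDivisionError:
    singular_at_zero = True

# Target 0.1: A - 0.1*B is nonsingular and the transformation is defined.
sigma = 0.1
thetas = shift_invert_spectrum(a_diag, b_diag, sigma)
lambdas = recover_eigenvalues(thetas, sigma)

# The eigenvalue closest to the target has the largest |theta|.
closest = max(range(len(thetas)), key=lambda i: abs(thetas[i]))
```

In this toy case the target 0 fails while 0.1 succeeds, and the eigenvalue nearest the target gets the largest |theta|, which is why convergence near the target improves. On the command line the corresponding change would be replacing -eps_target 0.0+0.0i with -eps_target 0.1.<br>
<br>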
Jose<br>
<br>
<br>
> On 19 Aug 2021, at 7:12, dazza simplythebest <<a href="mailto:sayosale@hotmail.com" target="_blank">sayosale@hotmail.com</a>> wrote:<br>
> <br>
> Dear All,<br>
> I am planning on using slepc to do a large number of eigenvalue calculations<br>
> of a generalized eigenvalue problem, called from a program written in fortran using MPI.<br>
> Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster,<br>
> and on smaller test problems everything is working well; the matrices are efficiently and
<br>
> correctly constructed and slepc returns the correct spectrum. I am just now starting to move<br>
> towards solving the full-size 'production run' problems, and would appreciate some
<br>
> general advice on how to improve the solver's performance.<br>
> <br>
> In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices
<br>
> are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are
<br>
> complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part,
<br>
> although in other cases I will also be interested in finding the eigenvalues whose real part
<br>
> is close to zero.<br>
> <br>
> A)<br>
> Calling SLEPc's EPS solver with the following options:<br>
> <br>
> -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt<br>
> <br>
> <br>
> led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations
<br>
> (examining the monitor output, it did appear to be very slowly approaching convergence).<br>
> <br>
> B)<br>
> On the same problem I have also tried a shift-invert transformation using the options<br>
> <br>
> -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert<br>
> <br>
> In this case the code crashed at the point where it tried to call slepc, so perhaps I have specified these options incorrectly?<br>
> <br>
> <br>
> Does anyone have any suggestions as to how to improve this performance (or find out more about the problem)?<br>
> In the case of A) I can see from watching the slepc videos that increasing ncv
<br>
> may help, but I am wondering, since 600 is a large number of iterations, whether there
<br>
> may be something else going on, e.g. perhaps some alternative preconditioner may help?<br>
> In the case of B), I guess there must be some mistake in these command line options?<br>
> Again, any advice will be greatly appreciated.<br>
> Best wishes, Dan.<br>
<br>
</div>
</span></font></div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote></div></div>