<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>
<p><span style="font-size:11pt"><span style="line-height:normal"><span style="font-family:Calibri,sans-serif"><span style="font-size:12.0pt"><span style="font-family:"Times New Roman",serif">Thanks for the swift reply. </span></span></span></span></span></p>
<p><span style="font-size:11pt"><span style="line-height:normal"><span style="font-family:Calibri,sans-serif"><span style="font-size:12.0pt"><span style="font-family:"Times New Roman",serif">I also realized that it works if I reduce the number of RHS. But I am running the code on a cluster with 256 GB RAM per node. One dense matrix is around 30 GB, so about 60 GB for both, which is large but does not exceed the memory of even one node, and I also get the seg fault if I run it on several nodes. Moreover, it works well with the MUMPS and MKL_CPARDISO solvers. The maximum memory used with MUMPS is around 150 GB during the solver phase, but SuperLU_DIST crashed even before reaching the solver phase. Could there be such a large difference in memory usage between SuperLU_DIST and MUMPS?</span></span></span></span></span></p>
<p> </p>
<p><span style="font-size:11pt"><span style="line-height:normal"><span style="font-family:Calibri,sans-serif"><span style="font-size:12.0pt"><span style="font-family:"Times New Roman",serif">best,</span></span></span></span></span></p>
<p><span style="font-size:11pt"><span style="line-height:normal"><span style="font-family:Calibri,sans-serif"><span style="font-size:12.0pt"><span style="font-family:"Times New Roman",serif">marius</span></span></span></span></span></p>
<div>
<div name="quote" style="margin:10px 5px 5px 10px; padding: 10px 0 10px 10px; border-left:2px solid #C3D9E5; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div style="margin:0 0 10px 0;"><b>Sent:</b> Thursday, October 29, 2020 10:10 AM<br/>
<b>From:</b> "Zhang, Hong" <hzhang@mcs.anl.gov><br/>
<b>To:</b> "Marius Buerkle" <mbuerkle@web.de><br/>
<b>Cc:</b> "petsc-users@mcs.anl.gov" <petsc-users@mcs.anl.gov>, "Sherry Li" <xiaoye@nersc.gov><br/>
<b>Subject:</b> Re: Re: [petsc-users] superlu_dist segfault</div>
<div name="quoted-content">
<div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="color: rgb(32,31,30);font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;">Marius,</span></div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="color: rgb(32,31,30);font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;">I tested your code with petsc-release on my Mac laptop using np=2 cores. I first tested a small matrix data file successfully. Then I switched to your data file and ran out of memory, likely due to the dense matrices B and X. I got the error "Your system has run out of application memory" from my laptop.</span></div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"> </div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="color: rgb(32,31,30);font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;">The sparse matrix A has size 42549 by 42549. Your code creates dense matrices B and X with the same size -- a huge memory requirement!</span></div>
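<div>For scale, the storage of a dense matrix can be estimated as rows times columns times bytes per entry. The sketch below assumes 16 bytes per entry (complex double precision, which the <code>pz</code> prefix of <code>pzgssvx</code> in the stack trace suggests); the function name is hypothetical and not part of PETSc.</div>

```c
#include <stdint.h>

/* Bytes needed to store a dense rows x cols matrix whose entries take
   elem_size bytes each. 64-bit arithmetic keeps the product from
   wrapping for the matrix sizes discussed in this thread.
   Hypothetical helper, not a PETSc routine. */
uint64_t dense_matrix_bytes(uint64_t rows, uint64_t cols, uint64_t elem_size)
{
    return rows * cols * elem_size;
}
```

<div>Under that assumption, <code>dense_matrix_bytes(42549, 42549, 16)</code> gives 28,966,678,416 bytes (about 29 GB) per matrix, while <code>dense_matrix_bytes(42549, 4000, 16)</code> gives 2,723,136,000 bytes (about 2.7 GB).</div>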
<div><font color="#201f1e" face="Verdana"><span style="font-size: 12.0px;">By replacing B and X with matrices of size <span style="background-color: rgb(255,255,255);display: inline;">42549 by<span> nrhs (nrhs &lt;= 4000), the code ran well with np=2. Note the error message you got: </span></span></span></font></div>
<div><font color="#201f1e" face="Verdana"><span style="font-size: 12.0px;"><span style="background-color: rgb(255,255,255);display: inline;"><span><span style="background-color: rgb(255,255,255);display: inline;">[23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range</span></span></span></span></font></div>
<div> </div>
<div>The modified code I used is attached.</div>
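<div>If all 42549 right-hand sides are eventually needed, one option (an assumption here, not something stated in this thread) is to solve in column batches, so that only a 42549 by nrhs block of B and X is held at a time. The helpers below are hypothetical and only plan the column ranges; the caller would still fill each block and pass it to <code>MatMatSolve</code>.</div>

```c
#include <stddef.h>

/* Number of batches needed to cover total_cols columns with at most
   batch_cols columns per batch (ceiling division).
   Returns 0 if batch_cols is 0. Hypothetical helper. */
size_t num_batches(size_t total_cols, size_t batch_cols)
{
    if (batch_cols == 0) return 0;
    return (total_cols + batch_cols - 1) / batch_cols;
}

/* Half-open column range [*first, *last) covered by batch k.
   Returns 0 on success, -1 if k is not a valid batch index. */
int batch_range(size_t total_cols, size_t batch_cols, size_t k,
                size_t *first, size_t *last)
{
    if (k >= num_batches(total_cols, batch_cols)) return -1;
    *first = k * batch_cols;
    *last = *first + batch_cols;
    if (*last > total_cols) *last = total_cols; /* final batch may be short */
    return 0;
}
```

<div>With 42549 columns and batches of 4000, this yields 11 batches, the last one covering columns 40000 up to (not including) 42549.</div>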
<div><font color="#201f1e" face="Verdana"><span style="font-size: 12.0px;"><span style="background-color: rgb(255,255,255);display: inline;"><span><span style="background-color: rgb(255,255,255);display: inline;">Hong</span></span></span></span></font></div>
<div id="appendonsend"> </div>
<hr style="display: inline-block;width: 98.0%;"/>
<div id="divRplyFwdMsg"><font color="#000000" face="Calibri, sans-serif" style="font-size: 11.0pt;"><b>From:</b> Marius Buerkle <mbuerkle@web.de><br/>
<b>Sent:</b> Tuesday, October 27, 2020 10:01 PM<br/>
<b>To:</b> Zhang, Hong <hzhang@mcs.anl.gov><br/>
<b>Cc:</b> petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Sherry Li <xiaoye@nersc.gov><br/>
<b>Subject:</b> Re: Re: [petsc-users] superlu_dist segfault</font>
<div> </div>
</div>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>Hi,</div>
<div> </div>
<div>I recompiled PETSc with the debug option; now I get a seg fault at a different location:</div>
<div> </div>
<div>[23]PETSC ERROR: ------------------------------------------------------------------------<br/>
[23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range<br/>
[23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger<br/>
[23]PETSC ERROR: or see <a href="https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind" target="_blank">https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br/>
[23]PETSC ERROR: or try <a href="http://valgrind.org" target="_blank">http://valgrind.org</a> on GNU/linux and Apple Mac OS X to find memory corruption errors<br/>
[23]PETSC ERROR: likely location of problem given in stack below<br/>
[23]PETSC ERROR: --------------------- Stack Frames ------------------------------------<br/>
[23]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,<br/>
[23]PETSC ERROR: INSTEAD the line number of the start of the function<br/>
[23]PETSC ERROR: is given.<br/>
[23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c<br/>
[23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c<br/>
[23]PETSC ERROR: [23] MatMatSolve line 3466 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c<br/>
[23]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br/>
[23]PETSC ERROR: Signal received</div>
<div> </div>
<div>I made a small reproducer. The matrix is a bit too big to attach directly to the email, so I put it in the cloud:</div>
<div><a href="https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw" target="_blank">https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw</a></div>
<div> </div>
<div>Best,</div>
<div>Marius</div>
<div>
<div>
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left: 2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Sent:</b> Tuesday, October 27, 2020 11:11 PM<br/>
<b>From:</b> "Zhang, Hong" <hzhang@mcs.anl.gov><br/>
<b>To:</b> "Marius Buerkle" <mbuerkle@web.de>, "petsc-users@mcs.anl.gov" <petsc-users@mcs.anl.gov>, "Sherry Li" <xiaoye@nersc.gov><br/>
<b>Subject:</b> Re: [petsc-users] superlu_dist segfault</div>
<div>
<div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;">Marius,</span></div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;">It fails at <span style="background-color: rgb(255,255,255);display: inline;">line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c:</span></span></div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;"><span style="background-color: rgb(255,255,255);display: inline;"> if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[].");</span></span></div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"> </div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;"><span style="background-color: rgb(255,255,255);display: inline;">We do not know what it means. You may use a debugger to check the values of the variables involved.</span></span></div>
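<div>When checking <code>sizelsum</code> and <code>num_thread</code> in a debugger, it can help to compute the requested byte count with explicit overflow checks. The sketch below is not SuperLU_DIST code; the function name is hypothetical and the parameters only mirror the quoted allocation. Whether an overflow or a plain out-of-memory condition is the cause here is not established in this thread.</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Compute count * threads * elem_size in size_t with overflow checks.
   On success stores the product in *bytes and returns 0.
   Returns -1 if the product would not fit in size_t.
   Hypothetical diagnostic helper, not part of SuperLU_DIST. */
int checked_alloc_bytes(size_t count, size_t threads, size_t elem_size,
                        size_t *bytes)
{
    if (threads != 0 && count > SIZE_MAX / threads) return -1;
    size_t n = count * threads;
    if (elem_size != 0 && n > SIZE_MAX / elem_size) return -1;
    *bytes = n * elem_size;
    return 0;
}
```

<div>A return value of -1 for the observed values would point to an overflowing size computation; a valid but very large byte count would point to a genuine memory shortage instead.</div>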
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;"><span style="background-color: rgb(255,255,255);display: inline;">I'm cc'ing Sherry (superlu_dist developer). Alternatively, you may send us a short stand-alone code that reproduces the error, and we can help investigate it.</span></span></div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"><span style="font-family: Verdana;font-size: 12.0px;background-color: rgb(255,255,255);display: inline;"><span style="background-color: rgb(255,255,255);display: inline;">Hong</span></span></div>
<div id="x_appendonsend"> </div>
<div style="font-family: Calibri , Arial , Helvetica , sans-serif;font-size: 12.0pt;color: rgb(0,0,0);"> </div>
<hr style="display: inline-block;width: 98.0%;"/>
<div id="x_divRplyFwdMsg"><font color="#000000" face="Calibri, sans-serif" style="font-size: 11.0pt;"><b>From:</b> petsc-users <petsc-users-bounces@mcs.anl.gov> on behalf of Marius Buerkle <mbuerkle@web.de><br/>
<b>Sent:</b> Tuesday, October 27, 2020 8:46 AM<br/>
<b>To:</b> petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov><br/>
<b>Subject:</b> [petsc-users] superlu_dist segfault</font>
<div> </div>
</div>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>Hi,</div>
<div> </div>
<div>When using MatMatSolve with superlu_dist I get a segmentation fault:</div>
<div> </div>
<div>Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c</div>
<div> </div>
<div>The matrix is not particularly big. I am using the PETSc release branch, and superlu_dist is v6.3.0, I think.</div>
<div> </div>
<div>Best,</div>
<div>Marius</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></div></body></html>