<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">For superlu_dist, you can try:<br><br>options.ReplaceTinyPivot  = NO;   (I think the default is YES)<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">

and/or<br><br>options.IterRefine = YES; <br><br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Sherry Li<br><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Mar 2, 2014 at 2:23 PM, Matt Landreman <span dir="ltr"><<a href="mailto:matt.landreman@gmail.com" target="_blank">matt.landreman@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div><div>I'm having some problems with my PETSc application similar to the ones discussed in this thread, so perhaps one of you can help. In my application I factorize a preconditioner matrix with mumps or superlu_dist, using this factorized preconditioner to accelerate gmres on a matrix that is denser than the preconditioner.  I've been running on edison at nersc.  My program works reliably for problem sizes below about 1 million x 1 million, but above this size, the factorization step fails in one of many possible ways, depending on the compiler, # of nodes, # of procs/node, etc:</div>


<div><br></div><div>When I use superlu_dist, I get 1 of 2 failure modes: </div><div>(1) the first step of KSP returns "0 KSP residual norm -nan" and ksp then returns KSPConvergedReason = -9, or </div><div>(2) the factorization completes, but GMRES then converges excruciatingly slowly or not at all, even if I choose the "real" matrix to be identical to the preconditioner matrix so KSP ought to converge in 1 step (which it does for smaller matrices).</div>


<div><br></div><div>For mumps, the factorization can fail in many different ways:</div><div>(3) With the intel compiler I usually get "Caught signal number 11 SEGV: Segmentation Violation"</div><div>(4) Sometimes with the intel compiler I get "Caught signal number 7 BUS: Bus Error"</div>


<div>(5) With the gnu compiler I often get a bunch of lines like "problem with NIV2_FLOPS message  -5.9604644775390625E-008           0  -227464733.99999997"</div><div>(6) Other times with gnu I get a mumps error with INFO(1)=-9 or INFO(1)=-17. The mumps documentation suggests I should increase icntl(14), but what is an appropriate value? 50? 10000?</div>


<div>(7) With the Cray compiler I consistently get this cryptic error:</div><div><font face="courier new, monospace">Fatal error in PMPI_Test: Invalid MPI_Request, error stack:</font></div><div><font face="courier new, monospace">PMPI_Test(166): MPI_Test(request=0xb228dbf3c, flag=0x7ffffffe097c, status=0x7ffffffe0a00) failed</font></div>


<div><font face="courier new, monospace">PMPI_Test(121): Invalid MPI_Request</font></div><div><font face="courier new, monospace">_pmiu_daemon(SIGCHLD): [NID 02784] [c6-1c1s8n0] [Sun Mar  2 10:35:20 2014] PE RANK 0 exit signal Aborted</font></div>


<div><font face="courier new, monospace">[NID 02784] 2014-03-02 10:35:20 Apid 3374579: initiated application termination</font></div><div><font face="courier new, monospace">Application 3374579 exit codes: 134</font></div>


<div><br></div><div>For linear systems smaller than around 1 million^2, my application is very robust, working consistently with both mumps & superlu_dist, working for a wide range of # of nodes and # of procs/node, and working with all 3 available compilers on edison (intel, gnu, cray).</div>


<div><br></div><div>By the way, mumps failed for much smaller problems until I tried -mat_mumps_icntl_7 2 (inspired by your conversation last week). I tried all the other options for icntl(7), icntl(28), and icntl(29), finding icntl(7)=2 works best by far.  I tried the flags that worked for Samar (-mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1) with superlu_dist, but they did not appear to change anything in my case.</div>


<div><br></div><div>Can you recommend any other parameters of petsc, superlu_dist, or mumps that I should try changing?  I don't care in the end whether I use superlu_dist or mumps.  </div><div><br></div><div>Thanks!</div>


<div><br></div><div>Matt Landreman</div></div><div class="gmail_extra"><br><br><div class="gmail_quote"><div class="">On Tue, Feb 25, 2014 at 3:50 PM, Xiaoye S. Li <span dir="ltr"><<a href="mailto:xsli@lbl.gov" target="_blank">xsli@lbl.gov</a>></span> wrote:<br>


</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class=""><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Very good!  Thanks for the update. <br>


</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I guess you are using all 16 cores per node?  Since superlu_dist is currently MPI-only, if you generate 16 MPI tasks per node, the serial symbolic factorization has less than 2 GB of memory to work with. <br>
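<div>As a rough sketch of that arithmetic, using the figures Samar quotes in this thread (about 27 GB usable per node, 16 cores per node, 8 nodes): the helper name below is only illustrative, and the even split of node memory across ranks is an assumption.</div>
<pre>
```python
def memory_per_rank_gb(usable_gb_per_node, ranks_per_node):
    """Memory each MPI rank gets if a node's usable memory is split evenly (assumption)."""
    return usable_gb_per_node / ranks_per_node


usable_gb_per_node = 27.0  # of 32 GB physical, as quoted in the thread
nodes = 8

# Serial symbolic factorization runs on a single rank, so the per-rank
# figure, not the aggregate, is the relevant limit for that phase.
for ranks_per_node in (16, 8, 4):
    per_rank = memory_per_rank_gb(usable_gb_per_node, ranks_per_node)
    print(f"{ranks_per_node:2d} ranks/node: {per_rank:.2f} GB per rank")

total_gb = usable_gb_per_node * nodes
print(f"aggregate over {nodes} nodes: {total_gb:.0f} GB")
```
</pre>
<div>With 16 ranks per node this gives about 1.7 GB per rank; placing fewer ranks on each node leaves more memory for the rank that does the serial step.</div>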


<br></div></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Sherry<br></div></div><div><div class="h5"><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Feb 25, 2014 at 12:22 PM, Samar Khatiwala <span dir="ltr"><<a href="mailto:spk@ldeo.columbia.edu" target="_blank">spk@ldeo.columbia.edu</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Sherry,<div><br></div><div>Thanks! I tried your suggestions and it worked!</div><div>


<br></div><div>For the record I added these flags: -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1 </div><div><br></div><div>Also, for completeness and since you asked:</div><div><br></div><div><div>size: 2346346 x 2346346</div>


<div>nnz:  60856894</div><div>unsymmetric</div><div><br></div><div>The hardware (<a href="http://www2.cisl.ucar.edu/resources/yellowstone/hardware" target="_blank">http://www2.cisl.ucar.edu/resources/yellowstone/hardware</a>) specs are: 2 GB/core, 32 GB/node (27 GB usable), (16 cores per node)</div>


<div>I've been running on 8 nodes (so 8 x 27 ~ 216 GB).</div><div><div><br></div><div>Thanks again for your help!</div><div><br></div><div>Samar</div></div></div><div><div><div><br></div><div><div>

On Feb 25, 2014, at 1:00 PM, "Xiaoye S. Li" <<a href="mailto:xsli@lbl.gov" target="_blank">xsli@lbl.gov</a>> wrote:</div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">


I didn't follow the discussion thread closely ... What is your matrix dimension and number of nonzeros?<br>How much memory is there per core (or per node)?  <br>

<br>The default setting in superlu_dist is to use serial symbolic factorization. You can turn on parallel symbolic factorization by:<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">


options.ParSymbFact = YES;<br>

options.ColPerm = PARMETIS;<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Is your matrix symmetric?  If so, you need to give both the upper and lower halves of matrix A to superlu, which doesn't exploit symmetry.<br>


<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Do you know whether you need numerical pivoting?  If not, you can turn off pivoting by:<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">


options.RowPerm = NATURAL;<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">This avoids another serial bottleneck.<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">


All these options can be turned on in the petsc interface. Please check out the syntax there.<br><br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Sherry<br><br></div></div><div class="gmail_extra">
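<div>A minimal sketch of how the three settings above might be passed through the PETSc command line. The -mat_superlu_dist_parsymbfact and -mat_superlu_dist_colperm flags are the ones Samar reports using in this thread; the -mat_superlu_dist_rowperm spelling is assumed by analogy and should be checked against the -help output of your PETSc build. The table and helper names are hypothetical.</div>
<pre>
```python
# Hypothetical lookup table: SuperLU_DIST option field to PETSc runtime flag.
# Only the first two flags are confirmed by this thread; the third is an assumption.
SUPERLU_TO_PETSC = {
    "ParSymbFact": "-mat_superlu_dist_parsymbfact",
    "ColPerm": "-mat_superlu_dist_colperm",
    "RowPerm": "-mat_superlu_dist_rowperm",
}


def petsc_flags(settings):
    """Turn {field: value} into a flat list of command-line arguments."""
    args = []
    for field, value in settings.items():
        args += [SUPERLU_TO_PETSC[field], str(value)]
    return args


flags = petsc_flags({"ParSymbFact": 1, "ColPerm": "PARMETIS", "RowPerm": "NATURAL"})
print(" ".join(flags))
```
</pre>
<div>The printed string would be appended to the application's usual launch command. RowPerm = NATURAL should only be included if the matrix does not need numerical pivoting.</div>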


<br><br><div class="gmail_quote">On Tue, Feb 25, 2014 at 8:07 AM, Samar Khatiwala <span dir="ltr"><<a href="mailto:spk@ldeo.columbia.edu" target="_blank">spk@ldeo.columbia.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Hi Barry,<br>

<br>

You're probably right. I note that the error occurs almost instantly and I've tried increasing the number of CPUs<br>

(as many as ~1000 on Yellowstone) to no avail. I know this is a big problem but I didn't think it was that big!<br>

<br>

Sherry: Is there any way to write out more diagnostic info? E.g., how much memory superlu thinks it needs/is attempting<br>

to allocate.<br>

<br>

Thanks,<br>

<br>

Samar<br>

<div><div><br>

On Feb 25, 2014, at 10:57 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>

><br>

>><br>

>> I tried superlu_dist again and it crashes even more quickly than MUMPS with just the following error:<br>

>><br>

>> ERROR: 0031-250  task 128: Killed<br>

><br>

>   This is usually a symptom of running out of memory.<br>

><br>

>><br>

>> Absolutely nothing else is written out to either stderr or stdout. This is with -mat_superlu_dist_statprint.<br>

>> The program works fine on a smaller matrix.<br>

>><br>

>> This is the sequence of calls:<br>

>><br>

>> KSPSetType(ksp,KSPPREONLY);<br>

>> PCSetType(pc,PCLU);<br>

>> PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);<br>

>> KSPSetFromOptions(ksp);<br>

>> PCSetFromOptions(pc);<br>

>> KSPSolve(ksp,b,x);<br>

>><br>

>> All of these successfully return *except* the very last one to KSPSolve.<br>

>><br>

>> Any help would be appreciated. Thanks!<br>

>><br>

>> Samar<br>

>><br>

>> On Feb 24, 2014, at 3:58 PM, Xiaoye S. Li <<a href="mailto:xsli@lbl.gov" target="_blank">xsli@lbl.gov</a>> wrote:<br>

>><br>

>>> Samar:<br>

>>> If you include the error message printed when superlu_dist crashes, I can probably tell you the reason.  (Better yet, include the printout before the crash.)<br>

>>><br>

>>> Sherry<br>

>>><br>

>>><br>

>>> On Mon, Feb 24, 2014 at 9:56 AM, Hong Zhang <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>

>>> Samar :<br>

>>> There are limitations for direct solvers.<br>

>>> Do not expect that any solver can be used on arbitrarily large problems.<br>

>>> Since superlu_dist also crashes, direct solvers may not be able to work on your application.<br>

>>> This is why I suggest increasing the size incrementally.<br>

>>> You may have to experiment with other types of solvers.<br>

>>><br>

>>> Hong<br>

>>><br>

>>> Hi Hong and Jed,<br>

>>><br>

>>> Many thanks for replying. It would indeed be nice if the error messages from MUMPS were less cryptic!<br>

>>><br>

>>> 1) I have tried smaller matrices although given how my problem is set up a jump is difficult to avoid. But a good idea<br>

>>> that I will try.<br>

>>><br>

>>> 2) I did try various orderings but not the one you suggested.<br>

>>><br>

>>> 3) Tracing the error through the MUMPS code suggests a rather abrupt termination of the program (there should be more<br>

>>> error messages if, for example, memory was a problem). I therefore thought it might be an interface problem rather than<br>

>>> one with mumps and turned to the petsc-users group first.<br>

>>><br>

>>> 4) I've tried superlu_dist but it also crashes (also unclear as to why) at which point I decided to try mumps. The fact that both<br>

>>> crash would again indicate a common (memory?) problem.<br>

>>><br>

>>> I'll try a few more things before asking the MUMPS developers.<br>

>>><br>

>>> Thanks again for your help!<br>

>>><br>

>>> Samar<br>

>>><br>

>>> On Feb 24, 2014, at 11:47 AM, Hong Zhang <<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>> wrote:<br>

>>><br>

>>>> Samar:<br>

>>>> The crash occurs in<br>

>>>> ...<br>

>>>> [161]PETSC ERROR: Error in external library!<br>

>>>> [161]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-1, INFO(2)=48<br>

>>>><br>

>>>> For a very large matrix, this is likely a memory problem, as you suspected.<br>

>>>> I would suggest<br>

>>>> 1. run problems of gradually increasing size (rather than jumping from a small one to a very large one) and observe memory usage using<br>

>>>> '-ksp_view'.<br>

>>>>   I see you use '-mat_mumps_icntl_14 1000', i.e., a 1000% increase over the estimated workspace. Is it too large?<br>

>>>>   Anyway, this input should not cause the crash, I guess.<br>

>>>> 2. experiment with different matrix orderings via -mat_mumps_icntl_7 <> (I usually use sequential ordering 2)<br>

>>>>    I see you use parallel ordering -mat_mumps_icntl_29 2.<br>

>>>> 3. send a bug report to the MUMPS developers for their suggestions.<br>

>>>><br>

>>>> 4. try other direct solvers, e.g., superlu_dist.<br>

>>>><br>

>>>> …<br>

>>>><br>

>>>> etc etc. The above error I can tell has something to do with processor 48 (INFO(2)) and so forth but not the previous one.<br>

>>>><br>

>>>> The full output enabled with -mat_mumps_icntl_4 3 looks as in the attached file. Any hints as to what could be giving this<br>

>>>> error would be very much appreciated.<br>

>>>><br>

>>>> I do not know how to interpret this output file. The MUMPS developers could give you better suggestions about it.<br>

>>>> I would appreciate learning about it as well :-)<br>

>>>><br>

>>>> Hong<br>

>>><br>

>>><br>

>>><br>

>><br>

><br>

<br>

</div></div></blockquote></div><br></div>

</blockquote></div><br></div></div></div></blockquote></div><br></div>

</div></div></div></div></blockquote></div><br></div>

</blockquote></div><br></div>