<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p style="margin-top:0;margin-bottom:0">Hong, </p>
<p style="margin-top:0;margin-bottom:0">The user's example code reads a matrix, calls KSPSolve, and then exits. In his -log_view output I saw a long MatLUFactorNum time and a short MatSolve time. Now I know that is because MatSolve was skipped. Thanks.</p>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Zhang, Hong<br>
<b>Sent:</b> Thursday, October 11, 2018 10:07:10 AM<br>
<b>To:</b> Zhang, Junchao<br>
<b>Cc:</b> Smith, Barry F.; For users of the development version of PETSc<br>
<b>Subject:</b> Re: [petsc-dev] MUMPS silent errors</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div class="x_gmail_quote">
<div dir="ltr">Junchao :<br>
</div>
<div>When matrix factorization fails, we deliver the error message back to the user and skip MatSolve. Can you reproduce this problem so that I can take a look at it?</div>
<div><br>
</div>
<blockquote class="x_gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
<div dir="ltr">
<div id="x_m_-3310136646173625179divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
<p style="margin-top:0; margin-bottom:0">What is embarrassing is that the user sent me beautiful -log_view output and began doing performance comparisons. The whole exercise was meaningless because he forgot to check the converged reason on a direct solver.</p>
</div>
</div>
</blockquote>
<div> </div>
<div>When the linear solver fails, SNES/TS also fails, which should display error output to the user. The user should check the accuracy of the final solution with '-snes_converged_reason' before looking at performance.</div>
<blockquote class="x_gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
<div dir="ltr">
<div id="x_m_-3310136646173625179divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">The MUMPS manual says "<span>A call to MUMPS with JOB=2 must be preceded by a call with JOB=1 on the same instance</span>", with similar language for the other phases. This implies that, at the very least, we should not call MatSolve_MUMPS after a failed factorization, since it might crash the code.</p>
</div>
</div>
</blockquote>
<div>Yes. I've never seen this happen before, so I want to check.</div>
<div>Hong </div>
<blockquote class="x_gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
<div dir="ltr">
<div id="x_m_-3310136646173625179divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
</div>
<hr style="display:inline-block; width:98%">
<div id="x_m_-3310136646173625179divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Smith, Barry F.<br>
<b>Sent:</b> Wednesday, October 10, 2018 6:41:20 PM<br>
<b>To:</b> Zhang, Junchao<br>
<b>Cc:</b> petsc-dev<br>
<b>Subject:</b> Re: [petsc-dev] MUMPS silent errors</font>
<div> </div>
</div>
<div class="x_m_-3310136646173625179BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="x_m_-3310136646173625179PlainText"><br>
I looked at the code and it is handled in the PETSc way. The user should not expect KSP to error just because it was unable to solve a linear system; they should be calling KSPGetConvergedReason() after KSPSolve() to check that the solution was computed successfully.<br>
<br>
Barry<br>
<br>
<br>
> On Oct 10, 2018, at 2:12 PM, Zhang, Junchao <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>> wrote:<br>
> <br>
> I ran into a case where MUMPS numeric factorization returned error code -9 in mumps->id.INFOG(1) but A->erroriffailure was false in the following code in mumps.c:<br>
> PetscErrorCode MatFactorNumeric_MUMPS(Mat F,Mat A,const MatFactorInfo *info)<br>
> {<br>
> &nbsp;&nbsp;...<br>
> &nbsp;&nbsp;PetscMUMPS_c(mumps);<br>
> &nbsp;&nbsp;if (mumps->id.INFOG(1) &lt; 0) {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;if (A->erroriffailure) {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SETERRQ2(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error reported by MUMPS in numerical factorization phase: INFOG(1)=%d, INFO(2)=%d\n",mumps->id.INFOG(1),mumps->id.INFO(2));<br>
> &nbsp;&nbsp;&nbsp;&nbsp;} else {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (mumps->id.INFOG(1) == -10) { /* numerically singular matrix */<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;PetscInfo2(F,"matrix is numerically singular, INFOG(1)=%d, INFO(2)=%d\n",mumps->id.INFOG(1),mumps->id.INFO(2));<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;F->factorerrortype = MAT_FACTOR_NUMERIC_ZEROPIVOT;<br>
> <br>
> <br>
> The code continued to KSPSolve and finished successfully (with a wrong answer). The user did not call KSPGetConvergedReason() after KSPSolve. I found I had to either add -ksp_error_if_not_converged or call KSPSetErrorIfNotConverged(ksp,PETSC_TRUE) to make the code fail.<br>
> Is this expected? In my view, it is dangerous. If MUMPS fails in one stage, PETSc should not proceed to the next stage because it may hang there.<br>
<br>
</div>
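<div>
<p style="margin-top:0;margin-bottom:0">As a rough sketch of the check Barry describes, the caller can query the converged reason right after the solve and turn a failed solve into an error. The helper name SolveAndCheck is hypothetical, and the code assumes the 2018-era PETSc error macros (ierr/CHKERRQ, SETERRQ1) that match the mumps.c excerpt quoted in this thread; newer PETSc releases spell these differently.</p>
<pre>
#include &lt;petscksp.h&gt;

/* Hypothetical helper (not part of PETSc): solve with an already
   configured KSP and report an unconverged solve as an error. */
static PetscErrorCode SolveAndCheck(KSP ksp, Vec b, Vec x)
{
  PetscErrorCode     ierr;
  KSPConvergedReason reason;

  PetscFunctionBeginUser;
  /* KSPSolve() may return 0 even when the factorization failed */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  /* Negative reasons mean the solve diverged or failed */
  ierr = KSPGetConvergedReason(ksp, &amp;reason);CHKERRQ(ierr);
  if (reason &lt; 0) {
    SETERRQ1(PetscObjectComm((PetscObject)ksp), PETSC_ERR_NOT_CONVERGED,
             "KSPSolve did not converge, reason %d", (int)reason);
  }
  PetscFunctionReturn(0);
}
</pre>
<p style="margin-top:0;margin-bottom:0">The alternative mentioned in the thread is to call KSPSetErrorIfNotConverged(ksp,PETSC_TRUE) before the solve, or to pass -ksp_error_if_not_converged, in which case KSPSolve() itself returns the error.</p>
</div>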
</span></font></div>
</div>
</blockquote>
</div>
</div>
</div>
</body>
</html>