<div dir="ltr">To be clear: these differences completely go away with MUMPS?<div><br></div><div>Can you valgrind this? We have seen some valgrind warning from MUMPS from BLAS routines. It could be that your BLAS is buggy (and SuperLU uses some BLAS routines that MUMPS does not). I think SuperLU does more/different pivoting than MUMPS.  What BLAS are you using? (download, MKL, ...)</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 15, 2017 at 4:52 PM, Smith, Barry F. <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

<br>

> On Nov 15, 2017, at 3:36 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov">fande.kong@inl.gov</a>> wrote:<br>

><br>

> Hi Barry,<br>

><br>

> Thanks for your reply. I was wondering why this happens only when we use superlu_dist. I am trying to understand the algorithm in superlu_dist. If we use ASM or MUMPS, we do not produce these differences.<br>

><br>

> The differences actually are NOT meaningless.  In fact, we have a real transient application that presents this issue.   When we run the simulation with superlu_dist in parallel for thousands of time steps, the final physics  solution looks totally different from different runs. The differences are not acceptable any more.  For a steady problem, the difference may be meaningless. But it is significant for the transient problem.<br>

<br>

</span>  I submit that the "physics solution" of all of these runs is equally right and equally wrong. If the solutions are very different due to a small perturbation than something is wrong with the model or the integrator, I don't think you can blame the linear solver (see below)<br>

<span class="">><br>

> This makes the solution not reproducible, and we can not even set a targeting solution in the test system because the solution is so different from one run to another.   I guess there might/may be a tiny bug in superlu_dist or the PETSc interface to superlu_dist.<br>

<br>

</span>  This is possible but it is also possible this is due to normal round off inside of SuperLU dist.<br>

<br>

   Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't really matter exactly how well SuperLU_Dist does. The nonlinear iteration does essential defect correction for you; are you making sure that the nonlinear iteration always works for every timestep? For example confirm that SNESGetConvergedReason() is always positive.<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

><br>

><br>

> Fande,<br>

><br>

><br>

><br>

><br>

> On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>

><br>

>   Meaningless differences<br>

><br>

><br>

> > On Nov 15, 2017, at 2:26 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov">fande.kong@inl.gov</a>> wrote:<br>

> ><br>

> > Hi,<br>

> ><br>

> > There is a heat conduction problem. When superlu_dist is used as a preconditioner, we have random results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we then don't have this issue.<br>

> ><br>

> > run 1:<br>

> ><br>

> >  0 Nonlinear |R| = 9.447423e+03<br>

> >       0 Linear |R| = 9.447423e+03<br>

> >       1 Linear |R| = 1.013384e-02<br>

> >       2 Linear |R| = 4.020995e-08<br>

> >  1 Nonlinear |R| = 1.404678e-02<br>

> >       0 Linear |R| = 1.404678e-02<br>

> >       1 Linear |R| = 5.104757e-08<br>

> >       2 Linear |R| = 7.699637e-14<br>

> >  2 Nonlinear |R| = 5.106418e-08<br>

> ><br>

> ><br>

> > run 2:<br>

> ><br>

> >  0 Nonlinear |R| = 9.447423e+03<br>

> >       0 Linear |R| = 9.447423e+03<br>

> >       1 Linear |R| = 1.013384e-02<br>

> >       2 Linear |R| = 4.020995e-08<br>

> >  1 Nonlinear |R| = 1.404678e-02<br>

> >       0 Linear |R| = 1.404678e-02<br>

> >       1 Linear |R| = 5.109913e-08<br>

> >       2 Linear |R| = 7.189091e-14<br>

> >  2 Nonlinear |R| = 5.111591e-08<br>

> ><br>

> > run 3:<br>

> ><br>

> >  0 Nonlinear |R| = 9.447423e+03<br>

> >       0 Linear |R| = 9.447423e+03<br>

> >       1 Linear |R| = 1.013384e-02<br>

> >       2 Linear |R| = 4.020995e-08<br>

> >  1 Nonlinear |R| = 1.404678e-02<br>

> >       0 Linear |R| = 1.404678e-02<br>

> >       1 Linear |R| = 5.104942e-08<br>

> >       2 Linear |R| = 7.465572e-14<br>

> >  2 Nonlinear |R| = 5.106642e-08<br>

> ><br>

> > run 4:<br>

> ><br>

> >  0 Nonlinear |R| = 9.447423e+03<br>

> >       0 Linear |R| = 9.447423e+03<br>

> >       1 Linear |R| = 1.013384e-02<br>

> >       2 Linear |R| = 4.020995e-08<br>

> >  1 Nonlinear |R| = 1.404678e-02<br>

> >       0 Linear |R| = 1.404678e-02<br>

> >       1 Linear |R| = 5.102730e-08<br>

> >       2 Linear |R| = 7.132220e-14<br>

> >  2 Nonlinear |R| = 5.104442e-08<br>

> ><br>

> > Solver details:<br>

> ><br>

> > SNES Object: 8 MPI processes<br>

> >   type: newtonls<br>

> >   maximum iterations=15, maximum function evaluations=10000<br>

> >   tolerances: relative=1e-08, absolute=1e-11, solution=1e-50<br>

> >   total number of linear solver iterations=4<br>

> >   total number of function evaluations=7<br>

> >   norm schedule ALWAYS<br>

> >   SNESLineSearch Object: 8 MPI processes<br>

> >     type: basic<br>

> >     maxstep=1.000000e+08, minlambda=1.000000e-12<br>

> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08<br>

> >     maximum iterations=40<br>

> >   KSP Object: 8 MPI processes<br>

> >     type: gmres<br>

> >       restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br>

> >       happy breakdown tolerance 1e-30<br>

> >     maximum iterations=100, initial guess is zero<br>

> >     tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.<br>

> >     right preconditioning<br>

> >     using UNPRECONDITIONED norm type for convergence test<br>

> >   PC Object: 8 MPI processes<br>

> >     type: lu<br>

> >       out-of-place factorization<br>

> >       tolerance for zero pivot 2.22045e-14<br>

> >       matrix ordering: natural<br>

> >       factor fill ratio given 0., needed 0.<br>

> >         Factored matrix follows:<br>

> >           Mat Object: 8 MPI processes<br>

> >             type: superlu_dist<br>

> >             rows=7925, cols=7925<br>

> >             package used to perform factorization: superlu_dist<br>

> >             total: nonzeros=0, allocated nonzeros=0<br>

> >             total number of mallocs used during MatSetValues calls =0<br>

> >               SuperLU_DIST run parameters:<br>

> >                 Process grid nprow 4 x npcol 2<br>

> >                 Equilibrate matrix TRUE<br>

> >                 Matrix input mode 1<br>

> >                 Replace tiny pivots FALSE<br>

> >                 Use iterative refinement TRUE<br>

> >                 Processors in row 4 col partition 2<br>

> >                 Row permutation LargeDiag<br>

> >                 Column permutation METIS_AT_PLUS_A<br>

> >                 Parallel symbolic factorization FALSE<br>

> >                 Repeated factorization SamePattern<br>

> >     linear system matrix followed by preconditioner matrix:<br>

> >     Mat Object: 8 MPI processes<br>

> >       type: mffd<br>

> >       rows=7925, cols=7925<br>

> >         Matrix-free approximation:<br>

> >           err=1.49012e-08 (relative error in function evaluation)<br>

> >           Using wp compute h routine<br>

> >               Does not compute normU<br>

> >     Mat Object: () 8 MPI processes<br>

> >       type: mpiaij<br>

> >       rows=7925, cols=7925<br>

> >       total: nonzeros=63587, allocated nonzeros=63865<br>

> >       total number of mallocs used during MatSetValues calls =0<br>

> >         not using I-node (on process 0) routines<br>

> ><br>

> ><br>

> > Fande,<br>

> ><br>

> ><br>

><br>

><br>

<br>

</div></div></blockquote></div><br></div>