<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Nov 15, 2017 at 4:36 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>Hi Barry,<br><br></div>Thanks for your reply. I was wondering why this happens only when we use superlu_dist. I am trying to understand the algorithm in superlu_dist. If we use ASM or MUMPS, we do not produce these differences. <br><br></div>The differences actually are NOT meaningless. In fact, we have a real transient application that presents this issue. When we run the simulation with superlu_dist in parallel for thousands of time steps, the final physics solution looks totally different from different runs. The differences are not acceptable any more. For a steady problem, the difference may be meaningless. But it is significant for the transient problem. <br></div></div></div></blockquote><div><br></div><div>Are you sure this formulation is stable? It does not seem like it.</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div></div>This makes the solution not reproducible, and we can not even set a targeting solution in the test system because the solution is so different from one run to another. I guess there might/may be a tiny bug in superlu_dist or the PETSc interface to superlu_dist.<span class="HOEnZb"><font color="#888888"><br><br><br></font></span></div><span class="HOEnZb"><font color="#888888">Fande,<br><div><br><br><div><div><br></div></div></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Meaningless differences

<div class="m_-6513361910201159489HOEnZb"><div class="m_-6513361910201159489h5"><br>
<br>
> On Nov 15, 2017, at 2:26 PM, Kong, Fande <fande.kong@inl.gov> wrote:
>
> Hi,
>
> There is a heat conduction problem. When superlu_dist is used as a preconditioner, we have random results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we then don't have this issue.
>
> run 1:
>
> 0 Nonlinear |R| = 9.447423e+03
> 0 Linear |R| = 9.447423e+03
> 1 Linear |R| = 1.013384e-02
> 2 Linear |R| = 4.020995e-08
> 1 Nonlinear |R| = 1.404678e-02
> 0 Linear |R| = 1.404678e-02
> 1 Linear |R| = 5.104757e-08
> 2 Linear |R| = 7.699637e-14
> 2 Nonlinear |R| = 5.106418e-08
>
> run 2:
>
> 0 Nonlinear |R| = 9.447423e+03
> 0 Linear |R| = 9.447423e+03
> 1 Linear |R| = 1.013384e-02
> 2 Linear |R| = 4.020995e-08
> 1 Nonlinear |R| = 1.404678e-02
> 0 Linear |R| = 1.404678e-02
> 1 Linear |R| = 5.109913e-08
> 2 Linear |R| = 7.189091e-14
> 2 Nonlinear |R| = 5.111591e-08
>
> run 3:
>
> 0 Nonlinear |R| = 9.447423e+03
> 0 Linear |R| = 9.447423e+03
> 1 Linear |R| = 1.013384e-02
> 2 Linear |R| = 4.020995e-08
> 1 Nonlinear |R| = 1.404678e-02
> 0 Linear |R| = 1.404678e-02
> 1 Linear |R| = 5.104942e-08
> 2 Linear |R| = 7.465572e-14
> 2 Nonlinear |R| = 5.106642e-08
>
> run 4:
>
> 0 Nonlinear |R| = 9.447423e+03
> 0 Linear |R| = 9.447423e+03
> 1 Linear |R| = 1.013384e-02
> 2 Linear |R| = 4.020995e-08
> 1 Nonlinear |R| = 1.404678e-02
> 0 Linear |R| = 1.404678e-02
> 1 Linear |R| = 5.102730e-08
> 2 Linear |R| = 7.132220e-14
> 2 Nonlinear |R| = 5.104442e-08
>
> Solver details:
>
> SNES Object: 8 MPI processes
>   type: newtonls
>   maximum iterations=15, maximum function evaluations=10000
>   tolerances: relative=1e-08, absolute=1e-11, solution=1e-50
>   total number of linear solver iterations=4
>   total number of function evaluations=7
>   norm schedule ALWAYS
>   SNESLineSearch Object: 8 MPI processes
>     type: basic
>     maxstep=1.000000e+08, minlambda=1.000000e-12
>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>     maximum iterations=40
>   KSP Object: 8 MPI processes
>     type: gmres
>       restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>       happy breakdown tolerance 1e-30
>     maximum iterations=100, initial guess is zero
>     tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
>     right preconditioning
>     using UNPRECONDITIONED norm type for convergence test
>   PC Object: 8 MPI processes
>     type: lu
>       out-of-place factorization
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: natural
>       factor fill ratio given 0., needed 0.
>         Factored matrix follows:
>           Mat Object: 8 MPI processes
>             type: superlu_dist
>             rows=7925, cols=7925
>             package used to perform factorization: superlu_dist
>             total: nonzeros=0, allocated nonzeros=0
>             total number of mallocs used during MatSetValues calls =0
>               SuperLU_DIST run parameters:
>                 Process grid nprow 4 x npcol 2
>                 Equilibrate matrix TRUE
>                 Matrix input mode 1
>                 Replace tiny pivots FALSE
>                 Use iterative refinement TRUE
>                 Processors in row 4 col partition 2
>                 Row permutation LargeDiag
>                 Column permutation METIS_AT_PLUS_A
>                 Parallel symbolic factorization FALSE
>                 Repeated factorization SamePattern
>     linear system matrix followed by preconditioner matrix:
>     Mat Object: 8 MPI processes
>       type: mffd
>       rows=7925, cols=7925
>       Matrix-free approximation:
>         err=1.49012e-08 (relative error in function evaluation)
>         Using wp compute h routine
>         Does not compute normU
>     Mat Object: () 8 MPI processes
>       type: mpiaij
>       rows=7925, cols=7925
>       total: nonzeros=63587, allocated nonzeros=63865
>       total number of mallocs used during MatSetValues calls =0
>       not using I-node (on process 0) routines
>
>
> Fande,
>
>
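In case it helps with testing, here is a minimal sketch (not the application's actual setup) of how the factorization package behind the LU preconditioner can be switched between SuperLU_DIST and MUMPS while leaving the rest of the solver stack above unchanged. It assumes the PETSc 3.8-era names PCFactorSetMatSolverPackage and -pc_factor_mat_solver_package; newer releases rename these to PCFactorSetMatSolverType and -pc_factor_mat_solver_type.

    /* Sketch under the assumptions above: select the parallel direct solver
       used to factor the preconditioning matrix. Problem setup (residual,
       Jacobian, mpiaij preconditioning matrix) is elided. */
    #include <petscsnes.h>

    int main(int argc, char **argv)
    {
      SNES           snes;
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
      ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);

      /* ... SNESSetFunction/SNESSetJacobian and the preconditioning matrix go here ... */

      ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      /* Swap MATSOLVERSUPERLU_DIST for MATSOLVERMUMPS to repeat the runs with MUMPS */
      ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);

      /* Command-line options such as -pc_factor_mat_solver_package mumps still override */
      ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);

      /* ... SNESSolve(snes, NULL, x); ... */

      ierr = SNESDestroy(&snes);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Keeping everything else (GMRES, right preconditioning, the matrix-free operator) identical and only changing the package should isolate whether the run-to-run variation comes from the SuperLU_DIST factorization.
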
-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/