[petsc-users] superlu_dist produces random results

Smith, Barry F. bsmith at mcs.anl.gov
Wed Nov 15 15:52:18 CST 2017



> On Nov 15, 2017, at 3:36 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> 
> Hi Barry,
> 
> Thanks for your reply. I was wondering why this happens only when we use superlu_dist. I am trying to understand the algorithm in superlu_dist. If we use ASM or MUMPS, we do not see these differences. 
> 
> The differences actually are NOT meaningless.  In fact, we have a real transient application that exhibits this issue.  When we run the simulation with superlu_dist in parallel for thousands of time steps, the final physics solution looks totally different from run to run. The differences are not acceptable anymore.  For a steady problem, the difference may be meaningless, but it is significant for a transient problem. 

  I submit that the "physics solution" of all of these runs is equally right and equally wrong. If the solutions are very different due to a small perturbation, then something is wrong with the model or the integrator; I don't think you can blame the linear solver (see below).
> 
> This makes the solution not reproducible, and we cannot even set a target solution in the test system because the solution is so different from one run to another.  I guess there might be a tiny bug in superlu_dist or in the PETSc interface to superlu_dist.

  This is possible, but it is also possible that this is due to normal round-off inside of SuperLU_Dist.
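
  (As a toy illustration of why run-to-run differences are normal for a parallel direct solver -- this example is mine, not taken from SuperLU_Dist itself: floating-point addition is not associative, so if the order in which contributions are combined across processes varies between runs, the last few digits of the result can vary too.)

    #include <stdio.h>

    int main(void)
    {
      /* The same three numbers summed in two different orders give
         different IEEE double results, because rounding happens at
         each intermediate step. */
      double a = 1.0e16, b = -1.0e16, c = 1.0;
      double left  = (a + b) + c;  /* 0 + 1              -> 1 */
      double right = a + (b + c);  /* b + c rounds to -1e16 -> 0 */
      printf("left = %g, right = %g\n", left, right);
      return 0;
    }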

   Since you have SuperLU_Dist inside a nonlinear iteration, it shouldn't really matter exactly how well SuperLU_Dist does. The nonlinear iteration essentially does defect correction for you; are you making sure that the nonlinear iteration always converges at every timestep? For example, confirm that SNESGetConvergedReason() is always positive; a sketch of such a check follows.
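
   A minimal sketch of such a check (in C, with placeholder names; it assumes your code calls SNESSolve() inside its own time loop):

    #include <petscsnes.h>

    /* Call after each SNESSolve(): a non-positive reason means the nonlinear
       iteration diverged or stagnated, so that time step cannot be trusted
       no matter what the linear solver did. */
    static PetscErrorCode CheckSNESConverged(SNES snes)
    {
      SNESConvergedReason reason;
      PetscErrorCode      ierr;

      ierr = SNESGetConvergedReason(snes,&reason);CHKERRQ(ierr);
      if (reason <= 0) SETERRQ1(PetscObjectComm((PetscObject)snes),PETSC_ERR_NOT_CONVERGED,
                                "Nonlinear solve did not converge, reason %d",(int)reason);
      return 0;
    }

   Equivalently, running with -snes_converged_reason prints the reason after every nonlinear solve without any code changes.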

  
> 
> 
> Fande,
> 
> 
> 
> 
> On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   Meaningless differences
> 
> 
> > On Nov 15, 2017, at 2:26 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >
> > Hi,
> >
> > We have a heat conduction problem. When superlu_dist is used as the preconditioner, we get different results from different runs. Is there a random algorithm in superlu_dist? If we use ASM or MUMPS as the preconditioner, we don't have this issue.
> >
> > run 1:
> >
> >  0 Nonlinear |R| = 9.447423e+03
> >       0 Linear |R| = 9.447423e+03
> >       1 Linear |R| = 1.013384e-02
> >       2 Linear |R| = 4.020995e-08
> >  1 Nonlinear |R| = 1.404678e-02
> >       0 Linear |R| = 1.404678e-02
> >       1 Linear |R| = 5.104757e-08
> >       2 Linear |R| = 7.699637e-14
> >  2 Nonlinear |R| = 5.106418e-08
> >
> >
> > run 2:
> >
> >  0 Nonlinear |R| = 9.447423e+03
> >       0 Linear |R| = 9.447423e+03
> >       1 Linear |R| = 1.013384e-02
> >       2 Linear |R| = 4.020995e-08
> >  1 Nonlinear |R| = 1.404678e-02
> >       0 Linear |R| = 1.404678e-02
> >       1 Linear |R| = 5.109913e-08
> >       2 Linear |R| = 7.189091e-14
> >  2 Nonlinear |R| = 5.111591e-08
> >
> > run 3:
> >
> >  0 Nonlinear |R| = 9.447423e+03
> >       0 Linear |R| = 9.447423e+03
> >       1 Linear |R| = 1.013384e-02
> >       2 Linear |R| = 4.020995e-08
> >  1 Nonlinear |R| = 1.404678e-02
> >       0 Linear |R| = 1.404678e-02
> >       1 Linear |R| = 5.104942e-08
> >       2 Linear |R| = 7.465572e-14
> >  2 Nonlinear |R| = 5.106642e-08
> >
> > run 4:
> >
> >  0 Nonlinear |R| = 9.447423e+03
> >       0 Linear |R| = 9.447423e+03
> >       1 Linear |R| = 1.013384e-02
> >       2 Linear |R| = 4.020995e-08
> >  1 Nonlinear |R| = 1.404678e-02
> >       0 Linear |R| = 1.404678e-02
> >       1 Linear |R| = 5.102730e-08
> >       2 Linear |R| = 7.132220e-14
> >  2 Nonlinear |R| = 5.104442e-08
> >
> > Solver details:
> >
> > SNES Object: 8 MPI processes
> >   type: newtonls
> >   maximum iterations=15, maximum function evaluations=10000
> >   tolerances: relative=1e-08, absolute=1e-11, solution=1e-50
> >   total number of linear solver iterations=4
> >   total number of function evaluations=7
> >   norm schedule ALWAYS
> >   SNESLineSearch Object: 8 MPI processes
> >     type: basic
> >     maxstep=1.000000e+08, minlambda=1.000000e-12
> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
> >     maximum iterations=40
> >   KSP Object: 8 MPI processes
> >     type: gmres
> >       restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >       happy breakdown tolerance 1e-30
> >     maximum iterations=100, initial guess is zero
> >     tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> >     right preconditioning
> >     using UNPRECONDITIONED norm type for convergence test
> >   PC Object: 8 MPI processes
> >     type: lu
> >       out-of-place factorization
> >       tolerance for zero pivot 2.22045e-14
> >       matrix ordering: natural
> >       factor fill ratio given 0., needed 0.
> >         Factored matrix follows:
> >           Mat Object: 8 MPI processes
> >             type: superlu_dist
> >             rows=7925, cols=7925
> >             package used to perform factorization: superlu_dist
> >             total: nonzeros=0, allocated nonzeros=0
> >             total number of mallocs used during MatSetValues calls =0
> >               SuperLU_DIST run parameters:
> >                 Process grid nprow 4 x npcol 2
> >                 Equilibrate matrix TRUE
> >                 Matrix input mode 1
> >                 Replace tiny pivots FALSE
> >                 Use iterative refinement TRUE
> >                 Processors in row 4 col partition 2
> >                 Row permutation LargeDiag
> >                 Column permutation METIS_AT_PLUS_A
> >                 Parallel symbolic factorization FALSE
> >                 Repeated factorization SamePattern
> >     linear system matrix followed by preconditioner matrix:
> >     Mat Object: 8 MPI processes
> >       type: mffd
> >       rows=7925, cols=7925
> >         Matrix-free approximation:
> >           err=1.49012e-08 (relative error in function evaluation)
> >           Using wp compute h routine
> >               Does not compute normU
> >     Mat Object: () 8 MPI processes
> >       type: mpiaij
> >       rows=7925, cols=7925
> >       total: nonzeros=63587, allocated nonzeros=63865
> >       total number of mallocs used during MatSetValues calls =0
> >         not using I-node (on process 0) routines
> >
> >
> > Fande,
> >
> >
> 
> 


