[petsc-users] superlu_dist produces random results

Wed Nov 15 16:18:20 CST 2017

To be clear: these differences completely go away with MUMPS?

Can you valgrind this? We have seen some valgrind warning from MUMPS from
BLAS routines. It could be that your BLAS is buggy (and SuperLU uses some
BLAS routines that MUMPS does not). I think SuperLU does more/different
pivoting than MUMPS.  What BLAS are you using? (download, MKL, ...)

On Wed, Nov 15, 2017 at 4:52 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>
> > On Nov 15, 2017, at 3:36 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >
> > Hi Barry,
> >
> > Thanks for your reply. I was wondering why this happens only when we use
> superlu_dist. I am trying to understand the algorithm in superlu_dist. If
> we use ASM or MUMPS, we do not produce these differences.
> >
> > The differences actually are NOT meaningless.  In fact, we have a real
> transient application that presents this issue.   When we run the
> simulation with superlu_dist in parallel for thousands of time steps, the
> final physics  solution looks totally different from different runs. The
> differences are not acceptable any more.  For a steady problem, the
> difference may be meaningless. But it is significant for the transient
> problem.
>
>   I submit that the "physics solution" of all of these runs is equally
> right and equally wrong. If the solutions are very different due to a small
> perturbation than something is wrong with the model or the integrator, I
> don't think you can blame the linear solver (see below)
> >
> > This makes the solution not reproducible, and we can not even set a
> targeting solution in the test system because the solution is so different
> from one run to another.   I guess there might/may be a tiny bug in
> superlu_dist or the PETSc interface to superlu_dist.
>
>   This is possible but it is also possible this is due to normal round off
> inside of SuperLU dist.
>
>    Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't
> really matter exactly how well SuperLU_Dist does. The nonlinear iteration
> does essential defect correction for you; are you making sure that the
> nonlinear iteration always works for every timestep? For example confirm
> that SNESGetConvergedReason() is always positive.
>
>
> >
> >
> > Fande,
> >
> >
> >
> >
> > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >   Meaningless differences
> >
> >
> > > On Nov 15, 2017, at 2:26 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> > >
> > > Hi,
> > >
> > > There is a heat conduction problem. When superlu_dist is used as a
> preconditioner, we have random results from different runs. Is there a
> random algorithm in superlu_dist? If we use ASM or MUMPS as the
> preconditioner, we then don't have this issue.
> > >
> > > run 1:
> > >
> > >  0 Nonlinear |R| = 9.447423e+03
> > >       0 Linear |R| = 9.447423e+03
> > >       1 Linear |R| = 1.013384e-02
> > >       2 Linear |R| = 4.020995e-08
> > >  1 Nonlinear |R| = 1.404678e-02
> > >       0 Linear |R| = 1.404678e-02
> > >       1 Linear |R| = 5.104757e-08
> > >       2 Linear |R| = 7.699637e-14
> > >  2 Nonlinear |R| = 5.106418e-08
> > >
> > >
> > > run 2:
> > >
> > >  0 Nonlinear |R| = 9.447423e+03
> > >       0 Linear |R| = 9.447423e+03
> > >       1 Linear |R| = 1.013384e-02
> > >       2 Linear |R| = 4.020995e-08
> > >  1 Nonlinear |R| = 1.404678e-02
> > >       0 Linear |R| = 1.404678e-02
> > >       1 Linear |R| = 5.109913e-08
> > >       2 Linear |R| = 7.189091e-14
> > >  2 Nonlinear |R| = 5.111591e-08
> > >
> > > run 3:
> > >
> > >  0 Nonlinear |R| = 9.447423e+03
> > >       0 Linear |R| = 9.447423e+03
> > >       1 Linear |R| = 1.013384e-02
> > >       2 Linear |R| = 4.020995e-08
> > >  1 Nonlinear |R| = 1.404678e-02
> > >       0 Linear |R| = 1.404678e-02
> > >       1 Linear |R| = 5.104942e-08
> > >       2 Linear |R| = 7.465572e-14
> > >  2 Nonlinear |R| = 5.106642e-08
> > >
> > > run 4:
> > >
> > >  0 Nonlinear |R| = 9.447423e+03
> > >       0 Linear |R| = 9.447423e+03
> > >       1 Linear |R| = 1.013384e-02
> > >       2 Linear |R| = 4.020995e-08
> > >  1 Nonlinear |R| = 1.404678e-02
> > >       0 Linear |R| = 1.404678e-02
> > >       1 Linear |R| = 5.102730e-08
> > >       2 Linear |R| = 7.132220e-14
> > >  2 Nonlinear |R| = 5.104442e-08
> > >
> > > Solver details:
> > >
> > > SNES Object: 8 MPI processes
> > >   type: newtonls
> > >   maximum iterations=15, maximum function evaluations=10000
> > >   tolerances: relative=1e-08, absolute=1e-11, solution=1e-50
> > >   total number of linear solver iterations=4
> > >   total number of function evaluations=7
> > >   norm schedule ALWAYS
> > >   SNESLineSearch Object: 8 MPI processes
> > >     type: basic
> > >     maxstep=1.000000e+08, minlambda=1.000000e-12
> > >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
> lambda=1.000000e-08
> > >     maximum iterations=40
> > >   KSP Object: 8 MPI processes
> > >     type: gmres
> > >       restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> > >       happy breakdown tolerance 1e-30
> > >     maximum iterations=100, initial guess is zero
> > >     tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> > >     right preconditioning
> > >     using UNPRECONDITIONED norm type for convergence test
> > >   PC Object: 8 MPI processes
> > >     type: lu
> > >       out-of-place factorization
> > >       tolerance for zero pivot 2.22045e-14
> > >       matrix ordering: natural
> > >       factor fill ratio given 0., needed 0.
> > >         Factored matrix follows:
> > >           Mat Object: 8 MPI processes
> > >             type: superlu_dist
> > >             rows=7925, cols=7925
> > >             package used to perform factorization: superlu_dist
> > >             total: nonzeros=0, allocated nonzeros=0
> > >             total number of mallocs used during MatSetValues calls =0
> > >               SuperLU_DIST run parameters:
> > >                 Process grid nprow 4 x npcol 2
> > >                 Equilibrate matrix TRUE
> > >                 Matrix input mode 1
> > >                 Replace tiny pivots FALSE
> > >                 Use iterative refinement TRUE
> > >                 Processors in row 4 col partition 2
> > >                 Row permutation LargeDiag
> > >                 Column permutation METIS_AT_PLUS_A
> > >                 Parallel symbolic factorization FALSE
> > >                 Repeated factorization SamePattern
> > >     linear system matrix followed by preconditioner matrix:
> > >     Mat Object: 8 MPI processes
> > >       type: mffd
> > >       rows=7925, cols=7925
> > >         Matrix-free approximation:
> > >           err=1.49012e-08 (relative error in function evaluation)
> > >           Using wp compute h routine
> > >               Does not compute normU
> > >     Mat Object: () 8 MPI processes
> > >       type: mpiaij
> > >       rows=7925, cols=7925
> > >       total: nonzeros=63587, allocated nonzeros=63865
> > >       total number of mallocs used during MatSetValues calls =0
> > >         not using I-node (on process 0) routines
> > >
> > >
> > > Fande,
> > >
> > >
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20171115/a6800461/attachment.html>