[petsc-users] SuperLU_dist computes rubbish

Hong hzhang at mcs.anl.gov
Mon Nov 16 09:22:45 CST 2015


Barry:

> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat
> -rhs ~/Downloads/rhs.dat  -ksp_converged_reason
> Linear solve did not converge due to DIVERGED_NANORINF iterations 0
> Number of iterations =   0
> Residual norm 0.0220971
>

Hmm, I'm working on it and forgot to check '-ksp_converged_reason'.
However, superlu_dist does not report the zero pivot; it might simply 'exit'.
I'll contact Sherry about it.
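
For completeness, here is a minimal petsc4py sketch of such a check (just a
sketch; it assumes a KSP 'ksp' and vectors 'b' and 'x' have already been set
up, e.g. with -pc_type lu):

  from petsc4py import PETSc

  # ksp is an already configured PETSc.KSP (e.g. preonly + lu),
  # b is the right-hand side and x the solution vector.
  ksp.solve(b, x)
  reason = ksp.getConvergedReason()
  if reason < 0:
      # negative reasons mean divergence, e.g. DIVERGED_NANORINF when
      # the factorization hit a zero pivot
      raise RuntimeError("linear solve failed, reason = %d" % reason)

This is the programmatic counterpart of running with '-ksp_converged_reason'
or Barry's '-ksp_error_if_not_converged' below.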

Hong

>
>   The matrix has a zero pivot with the nd ordering
>
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat
> -rhs ~/Downloads/rhs.dat  -ksp_converged_reason -ksp_error_if_not_converged
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Zero pivot in LU factorization:
> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot
> [0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14
>
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-g5ca2a2b  GIT
> Date: 2015-11-13 02:00:47 -0600
> [0]PETSC ERROR: ./ex10 on a arch-mpich-nemesis named
> Barrys-MacBook-Pro.local by barrysmith Sun Nov 15 21:37:15 2015
> [0]PETSC ERROR: Configure options --download-mpich
> --download-mpich-device=ch3:nemesis
> [0]PETSC ERROR: #1 MatPivotCheck_none() line 688 in
> /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
> [0]PETSC ERROR: #2 MatPivotCheck() line 707 in
> /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
> [0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332 in
> /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c
> [0]PETSC ERROR: #4 MatLUFactorNumeric() line 2946 in
> /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c
> [0]PETSC ERROR: #5 PCSetUp_LU() line 152 in
> /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c
> [0]PETSC ERROR: #6 PCSetUp() line 984 in
> /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: #7 KSPSetUp() line 332 in
> /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #8 main() line 312 in
> /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -f0 /Users/barrysmith/Downloads/mat.dat
> [0]PETSC ERROR: -ksp_converged_reason
> [0]PETSC ERROR: -ksp_error_if_not_converged
> [0]PETSC ERROR: -ksp_monitor_true_residual
> [0]PETSC ERROR: -malloc_test
> [0]PETSC ERROR: -pc_type lu
> [0]PETSC ERROR: -rhs /Users/barrysmith/Downloads/rhs.dat
> [0]PETSC ERROR: ----------------End of Error Message -------send entire
> error message to petsc-maint at mcs.anl.gov----------
> application called MPI_Abort(MPI_COMM_WORLD, 71) - process 0
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>)
> arch-mpich-nemesis
>
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat
> -rhs ~/Downloads/rhs.dat  -ksp_converged_reason -ksp_error_if_not_converged
> -pc_factor_nonzeros_along_diagonal
>   0 KSP preconditioned resid norm 1.905901677970e+00 true resid norm
> 2.209708691208e-02 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP preconditioned resid norm 1.703926496877e-14 true resid norm
> 5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13
> Linear solve converged due to CONVERGED_RTOL iterations 1
> Number of iterations =   1
>   Residual norm < 1.e-12
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>)
> arch-mpich-nemesis
>
> Jan,
>
>    Remember you ALWAYS have to call KSPGetConvergedReason() after a KSPSolve
> to see what happened in the solve.
>
> > On Nov 15, 2015, at 8:53 PM, Hong <hzhang at mcs.anl.gov> wrote:
> >
> > Jan:
> > I can reproduce reported behavior using
> > petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and rhs.dat.
> >
> > Using PETSc's sequential LU with the default ordering 'nd', I get
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
> > Number of iterations =   0
> > Residual norm 0.0220971
> >
> > Changing to
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_ordering_type natural
> > Number of iterations =   1
> >   Residual norm < 1.e-12
> >
> > Back to superlu_dist, I get
> > mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist
> > Number of iterations =   4
> > Residual norm 25650.8
> >
> > which uses the default ordering (as shown by -ksp_view):
> > Row permutation LargeDiag
> > Column permutation METIS_AT_PLUS_A
> >
> > Run it with
> > mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL
> > Number of iterations =   1
> >   Residual norm < 1.e-12
> >
> > i.e., your problem is sensitive to the matrix ordering, though I do not
> > know why.
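> >
> > If you want to apply the same workaround from petsc4py, one possibility
> > (a sketch only, assuming the KSP is configured via setFromOptions()) is to
> > put the options into the options database before the solver is set up:
> >
> >   from petsc4py import PETSc
> >
> >   opts = PETSc.Options()
> >   opts['pc_type'] = 'lu'
> >   opts['pc_factor_mat_solver_package'] = 'superlu_dist'
> >   # force the orderings that gave the correct answer above
> >   opts['mat_superlu_dist_rowperm'] = 'NATURAL'
> >   opts['mat_superlu_dist_colperm'] = 'NATURAL'
> >   # for PETSc's own LU the analogous option would be
> >   # opts['pc_factor_mat_ordering_type'] = 'natural'
> >   # ... then create the KSP and call ksp.setFromOptions()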
> >
> > I checked the condition number of your mat.dat using superlu:
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber
> >   Recip. condition number = 1.137938e-03
> > Number of iterations =   1
> >   Residual norm < 1.e-12
> >
> > As you can see, the matrix is well-conditioned. Why is it so sensitive to
> > the matrix ordering?
> >
> > Hong
> >
> > Using the attached petsc4py code, matrix, and right-hand side, SuperLU_dist
> > returns a totally wrong solution for a mixed Laplacian:
> >
> > $ tar -xzf report.tar.gz
> > $ python test-solve.py -pc_factor_mat_solver_package mumps -ksp_final_residual
> > KSP final norm of residual 3.81865e-15
> > $ python test-solve.py -pc_factor_mat_solver_package umfpack -ksp_final_residual
> > KSP final norm of residual 3.68546e-14
> > $ python test-solve.py -pc_factor_mat_solver_package superlu_dist -ksp_final_residual
> > KSP final norm of residual 1827.72
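> >
> > (test-solve.py itself is attached rather than shown; a hypothetical
> > minimal driver along the same lines, reading the PETSc binary files and
> > printing the true residual, could look like this:)
> >
> >   # hypothetical sketch only -- not the actual attachment
> >   import sys, petsc4py
> >   petsc4py.init(sys.argv)   # forward -pc_factor_mat_solver_package etc. to PETSc
> >   from petsc4py import PETSc
> >
> >   A = PETSc.Mat().load(PETSc.Viewer().createBinary('mat.dat', 'r'))
> >   b = PETSc.Vec().load(PETSc.Viewer().createBinary('rhs.dat', 'r'))
> >
> >   ksp = PETSc.KSP().create()
> >   ksp.setOperators(A)
> >   ksp.setType('preonly')
> >   ksp.getPC().setType('lu')
> >   ksp.setFromOptions()
> >
> >   x = A.createVecRight()
> >   ksp.solve(b, x)
> >
> >   r = b.duplicate()
> >   A.mult(x, r)              # r = A*x
> >   r.aypx(-1.0, b)           # r = b - A*x
> >   PETSc.Sys.Print('true residual norm %g' % r.norm())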
> >
> > Moreover, the final residual is random when run with mpirun -np 3. Maybe
> > a memory corruption issue? This is reproducible with PETSc 3.6.2 (and
> > SuperLU_dist configured by PETSc) and with much older versions, see
> > http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html
> > but it has never been reported upstream.
> >
> > The code for assembling the matrix and rhs using FEniCS is also
> > included for the sake of completeness.
> >
> > Jan
> >
>
>