[petsc-users] SuperLU_dist computes rubbish
Jan Blechta
blechta at karlin.mff.cuni.cz
Mon Nov 16 08:04:28 CST 2015
On Sun, 15 Nov 2015 21:41:49 -0600
Barry Smith <bsmith at mcs.anl.gov> wrote:
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0
> ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason
> Linear solve did not converge due to DIVERGED_NANORINF iterations 0
> Number of iterations = 0 Residual norm 0.0220971
>
>
> The matrix has a zero pivot with the nd ordering
>
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0
> ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason
> -ksp_error_if_not_converged [0]PETSC ERROR: ---------------------
> Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Zero pivot in LU factorization:
> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot
> [0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14
>
> [0]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
> shooting. [0]PETSC ERROR: Petsc Development GIT revision:
> v3.6.2-1539-g5ca2a2b GIT Date: 2015-11-13 02:00:47 -0600 [0]PETSC
> ERROR: ./ex10 on a arch-mpich-nemesis named Barrys-MacBook-Pro.local
> by barrysmith Sun Nov 15 21:37:15 2015 [0]PETSC ERROR: Configure
> options --download-mpich --download-mpich-device=ch3:nemesis [0]PETSC
> ERROR: #1 MatPivotCheck_none() line 688
> in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
> [0]PETSC ERROR: #2 MatPivotCheck() line 707
> in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
> [0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332
> in /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c [0]PETSC
> ERROR: #4 MatLUFactorNumeric() line 2946
> in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c [0]PETSC
> ERROR: #5 PCSetUp_LU() line 152
> in /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c
> [0]PETSC ERROR: #6 PCSetUp() line 984
> in /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c [0]PETSC
> ERROR: #7 KSPSetUp() line 332
> in /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #8 main() line 312
> in /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c
> [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR:
> -f0 /Users/barrysmith/Downloads/mat.dat [0]PETSC ERROR:
> -ksp_converged_reason [0]PETSC ERROR: -ksp_error_if_not_converged
> [0]PETSC ERROR: -ksp_monitor_true_residual [0]PETSC ERROR:
> -malloc_test [0]PETSC ERROR: -pc_type lu [0]PETSC ERROR:
> -rhs /Users/barrysmith/Downloads/rhs.dat [0]PETSC ERROR:
> ----------------End of Error Message -------send entire error message
> to petsc-maint at mcs.anl.gov---------- application called
> MPI_Abort(MPI_COMM_WORLD, 71) - process 0
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>)
> arch-mpich-nemesis
>
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0
> ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason
> -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal"
> >
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>)
> arch-mpich-nemesis $ ./ex10 -pc_type lu -ksp_monitor_true_residual
> -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat
> -ksp_converged_reason -ksp_error_if_not_converged
> -pc_factor_nonzeros_along_diagonal 0 KSP preconditioned resid norm
> 1.905901677970e+00 true resid norm 2.209708691208e-02 ||r(i)||/||b||
> 1.000000000000e+00 1 KSP preconditioned resid norm 1.703926496877e-14
> true resid norm 5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13
> Linear solve converged due to CONVERGED_RTOL iterations 1 Number of
> iterations = 1 Residual norm < 1.e-12
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>)
> arch-mpich-nemesis
>
> Jan,
>
> Remember you ALWAYS have to call KSPConvergedReason() after a
> KSPSolve to see what happened in the solve.
Yes, it was a stupidity on my side. Thanks, Barry and Hong.
Jan
>
> > On Nov 15, 2015, at 8:53 PM, Hong <hzhang at mcs.anl.gov> wrote:
> >
> > Jan:
> > I can reproduce reported behavior using
> > petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and
> > rhs.dat.
> >
> > Using petsc sequential lu with default ordering 'nd', I get
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
> > Number of iterations = 0
> > Residual norm 0.0220971
> >
> > Changing to
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
> > -pc_factor_mat_ordering_type natural Number of iterations = 1
> > Residual norm < 1.e-12
> >
> > Back to superlu_dist, I get
> > mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat
> > -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu
> > -pc_factor_mat_solver_package superlu_dist Number of iterations =
> > 4 Residual norm 25650.8
> >
> > which uses default ordering (-ksp_view)
> > Row permutation LargeDiag
> > Column permutation METIS_AT_PLUS_A
> >
> > Run it with
> > mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
> > -pc_factor_mat_solver_package superlu_dist
> > -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL
> > Number of iterations = 1 Residual norm < 1.e-12
> >
> > i.e., your problem is sensitive to matrix ordering, which I do not
> > know why.
> >
> > I checked condition number of your mat.dat using superlu:
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
> > -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber
> > Recip. condition number = 1.137938e-03 Number of iterations = 1
> > Residual norm < 1.e-12
> >
> > As you see, matrix is well-conditioned. Why is it so sensitive to
> > matrix ordering?
> >
> > Hong
> >
> > Using attached petsc4py code, matrix and right-hand side,
> > SuperLU_dist returns totally wrong solution for mixed Laplacian:
> >
> > $ tar -xzf report.tar.gz
> > $ python test-solve.py -pc_factor_mat_solver_package mumps
> > -ksp_final_residual KSP final norm of residual 3.81865e-15
> > $ python test-solve.py -pc_factor_mat_solver_package umfpack
> > -ksp_final_residual KSP final norm of residual 3.68546e-14
> > $ python test-solve.py -pc_factor_mat_solver_package superlu_dist
> > -ksp_final_residual KSP final norm of residual 1827.72
> >
> > Moreover final residual is random when run using mpirun -np 3. Maybe
> > a memory corruption issue? This is reproducible using PETSc 3.6.2
> > (and SuperLU_dist configured by PETSc) and much older, see
> > http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html
> > but has never been reported upstream.
> >
> > The code for assembling the matrix and rhs using FEniCS is also
> > included for the sake of completeness.
> >
> > Jan
> >
More information about the petsc-users
mailing list