[petsc-users] SuperLU_dist computes rubbish
Barry Smith
bsmith at mcs.anl.gov
Sun Nov 15 21:41:49 CST 2015
$ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason
Linear solve did not converge due to DIVERGED_NANORINF iterations 0
Number of iterations = 0
Residual norm 0.0220971
The matrix has a zero pivot with the nd ordering
$ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot
[0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-g5ca2a2b GIT Date: 2015-11-13 02:00:47 -0600
[0]PETSC ERROR: ./ex10 on a arch-mpich-nemesis named Barrys-MacBook-Pro.local by barrysmith Sun Nov 15 21:37:15 2015
[0]PETSC ERROR: Configure options --download-mpich --download-mpich-device=ch3:nemesis
[0]PETSC ERROR: #1 MatPivotCheck_none() line 688 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
[0]PETSC ERROR: #2 MatPivotCheck() line 707 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
[0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332 in /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c
[0]PETSC ERROR: #4 MatLUFactorNumeric() line 2946 in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c
[0]PETSC ERROR: #5 PCSetUp_LU() line 152 in /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: #6 PCSetUp() line 984 in /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #7 KSPSetUp() line 332 in /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #8 main() line 312 in /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -f0 /Users/barrysmith/Downloads/mat.dat
[0]PETSC ERROR: -ksp_converged_reason
[0]PETSC ERROR: -ksp_error_if_not_converged
[0]PETSC ERROR: -ksp_monitor_true_residual
[0]PETSC ERROR: -malloc_test
[0]PETSC ERROR: -pc_type lu
[0]PETSC ERROR: -rhs /Users/barrysmith/Downloads/rhs.dat
[0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
application called MPI_Abort(MPI_COMM_WORLD, 71) - process 0
~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis
$ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal"
>
~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis
$ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal
0 KSP preconditioned resid norm 1.905901677970e+00 true resid norm 2.209708691208e-02 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 1.703926496877e-14 true resid norm 5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13
Linear solve converged due to CONVERGED_RTOL iterations 1
Number of iterations = 1
Residual norm < 1.e-12
~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis
Jan,
Remember you ALWAYS have to call KSPConvergedReason() after a KSPSolve to see what happened in the solve.
> On Nov 15, 2015, at 8:53 PM, Hong <hzhang at mcs.anl.gov> wrote:
>
> Jan:
> I can reproduce reported behavior using
> petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and rhs.dat.
>
> Using petsc sequential lu with default ordering 'nd', I get
> ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
> Number of iterations = 0
> Residual norm 0.0220971
>
> Changing to
> ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_ordering_type natural
> Number of iterations = 1
> Residual norm < 1.e-12
>
> Back to superlu_dist, I get
> mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist
> Number of iterations = 4
> Residual norm 25650.8
>
> which uses default ordering (-ksp_view)
> Row permutation LargeDiag
> Column permutation METIS_AT_PLUS_A
>
> Run it with
> mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL
> Number of iterations = 1
> Residual norm < 1.e-12
>
> i.e., your problem is sensitive to matrix ordering, which I do not know why.
>
> I checked condition number of your mat.dat using superlu:
> ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber
> Recip. condition number = 1.137938e-03
> Number of iterations = 1
> Residual norm < 1.e-12
>
> As you see, matrix is well-conditioned. Why is it so sensitive to matrix ordering?
>
> Hong
>
> Using attached petsc4py code, matrix and right-hand side, SuperLU_dist
> returns totally wrong solution for mixed Laplacian:
>
> $ tar -xzf report.tar.gz
> $ python test-solve.py -pc_factor_mat_solver_package mumps -ksp_final_residual
> KSP final norm of residual 3.81865e-15
> $ python test-solve.py -pc_factor_mat_solver_package umfpack -ksp_final_residual
> KSP final norm of residual 3.68546e-14
> $ python test-solve.py -pc_factor_mat_solver_package superlu_dist -ksp_final_residual
> KSP final norm of residual 1827.72
>
> Moreover final residual is random when run using mpirun -np 3. Maybe
> a memory corruption issue? This is reproducible using PETSc 3.6.2 (and
> SuperLU_dist configured by PETSc) and much older, see
> http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html
> but has never been reported upstream.
>
> The code for assembling the matrix and rhs using FEniCS is also
> included for the sake of completeness.
>
> Jan
>
More information about the petsc-users
mailing list