<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Barry:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">$ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason<br>
Linear solve did not converge due to DIVERGED_NANORINF iterations 0<br>
<span class="">Number of iterations = 0<br>
Residual norm 0.0220971<br></span></blockquote><div> </div><div>Hmm, I'm working on it and forgot to check '-ksp_converged_reason'.</div><div>However, superlu_dist does not report the zero pivot; it may simply exit.</div><div>I'll contact Sherry about it.</div><div><br></div><div>Hong</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">
<br>
</span> The matrix has a zero pivot with the nd ordering<br>
<br>
$ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged<br>
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
[0]PETSC ERROR: Zero pivot in LU factorization: <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot</a><br>
[0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14<br>
<br>
[0]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>
[0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-g5ca2a2b GIT Date: 2015-11-13 02:00:47 -0600<br>
[0]PETSC ERROR: ./ex10 on a arch-mpich-nemesis named Barrys-MacBook-Pro.local by barrysmith Sun Nov 15 21:37:15 2015<br>
[0]PETSC ERROR: Configure options --download-mpich --download-mpich-device=ch3:nemesis<br>
[0]PETSC ERROR: #1 MatPivotCheck_none() line 688 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h<br>
[0]PETSC ERROR: #2 MatPivotCheck() line 707 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h<br>
[0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332 in /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c<br>
[0]PETSC ERROR: #4 MatLUFactorNumeric() line 2946 in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c<br>
[0]PETSC ERROR: #5 PCSetUp_LU() line 152 in /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c<br>
[0]PETSC ERROR: #6 PCSetUp() line 984 in /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c<br>
[0]PETSC ERROR: #7 KSPSetUp() line 332 in /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c<br>
[0]PETSC ERROR: #8 main() line 312 in /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c<br>
[0]PETSC ERROR: PETSc Option Table entries:<br>
[0]PETSC ERROR: -f0 /Users/barrysmith/Downloads/mat.dat<br>
[0]PETSC ERROR: -ksp_converged_reason<br>
[0]PETSC ERROR: -ksp_error_if_not_converged<br>
[0]PETSC ERROR: -ksp_monitor_true_residual<br>
[0]PETSC ERROR: -malloc_test<br>
[0]PETSC ERROR: -pc_type lu<br>
[0]PETSC ERROR: -rhs /Users/barrysmith/Downloads/rhs.dat<br>
[0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint@mcs.anl.gov----------<br>
application called MPI_Abort(MPI_COMM_WORLD, 71) - process 0<br>
~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis<br>
<br>
$ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal<br>
0 KSP preconditioned resid norm 1.905901677970e+00 true resid norm 2.209708691208e-02 ||r(i)||/||b|| 1.000000000000e+00<br>
1 KSP preconditioned resid norm 1.703926496877e-14 true resid norm 5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13<br>
Linear solve converged due to CONVERGED_RTOL iterations 1<br>
<span class="">Number of iterations = 1<br>
Residual norm < 1.e-12<br>
</span>~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis<br>
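The same reordering can also be requested programmatically rather than via the command line; this is a minimal sketch against the PETSc C API (solver setup abbreviated, not code taken from ex10.c):<br>
<br>

```c
#include <petscksp.h>

/* Request an LU ordering that moves nonzeros onto the diagonal,
   equivalent to the -pc_factor_nonzeros_along_diagonal option. */
static PetscErrorCode ConfigureLU(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  /* PETSC_DECIDE uses the default tolerance for deciding which
     diagonal entries count as zero. */
  ierr = PCFactorReorderForNonzeroDiagonal(pc, PETSC_DECIDE);CHKERRQ(ierr);
  return 0;
}
```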
<br>
Jan,<br>
<br>
Remember, you ALWAYS have to call KSPGetConvergedReason() after a KSPSolve() to see what happened in the solve.<br>
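In C that check looks roughly like this; a minimal sketch against the PETSc API (error handling abbreviated, not code lifted from ex10.c):<br>
<br>

```c
#include <petscksp.h>

/* Solve, then inspect the converged reason: a negative reason
   (e.g. the KSP_DIVERGED_NANORINF above) means the returned
   "solution" must not be trusted. */
static PetscErrorCode SolveAndCheck(KSP ksp, Vec b, Vec x)
{
  KSPConvergedReason reason;
  PetscErrorCode     ierr;

  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
  if (reason < 0) {
    ierr = PetscPrintf(PETSC_COMM_WORLD,
                       "KSPSolve failed: %s\n",
                       KSPConvergedReasons[reason]);CHKERRQ(ierr);
  }
  return 0;
}
```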
<div class=""><div class="h5"><br>
> On Nov 15, 2015, at 8:53 PM, Hong <<a href="mailto:hzhang@mcs.anl.gov">hzhang@mcs.anl.gov</a>> wrote:<br>
><br>
> Jan:<br>
> I can reproduce reported behavior using<br>
> petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and rhs.dat.<br>
><br>
> Using PETSc's sequential LU with the default ordering 'nd', I get<br>
> ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu<br>
> Number of iterations = 0<br>
> Residual norm 0.0220971<br>
><br>
> Changing to<br>
> ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_ordering_type natural<br>
> Number of iterations = 1<br>
> Residual norm < 1.e-12<br>
><br>
> Back to superlu_dist, I get<br>
> mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist<br>
> Number of iterations = 4<br>
> Residual norm 25650.8<br>
><br>
> which uses the default ordering (shown by -ksp_view):<br>
> Row permutation LargeDiag<br>
> Column permutation METIS_AT_PLUS_A<br>
><br>
> Run it with<br>
> mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL<br>
> Number of iterations = 1<br>
> Residual norm < 1.e-12<br>
><br>
> i.e., your problem is sensitive to the matrix ordering, though I do not know why.<br>
><br>
> I checked condition number of your mat.dat using superlu:<br>
> ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber<br>
> Recip. condition number = 1.137938e-03<br>
> Number of iterations = 1<br>
> Residual norm < 1.e-12<br>
><br>
> As you see, the matrix is well-conditioned. Why is it so sensitive to the matrix ordering?<br>
><br>
> Hong<br>
><br>
> Using the attached petsc4py code, matrix, and right-hand side, SuperLU_dist<br>
> returns a totally wrong solution for a mixed Laplacian:<br>
><br>
> $ tar -xzf report.tar.gz<br>
> $ python test-solve.py -pc_factor_mat_solver_package mumps -ksp_final_residual<br>
> KSP final norm of residual 3.81865e-15<br>
> $ python test-solve.py -pc_factor_mat_solver_package umfpack -ksp_final_residual<br>
> KSP final norm of residual 3.68546e-14<br>
> $ python test-solve.py -pc_factor_mat_solver_package superlu_dist -ksp_final_residual<br>
> KSP final norm of residual 1827.72<br>
><br>
> Moreover, the final residual is random when run with mpirun -np 3. Maybe<br>
> a memory corruption issue? This is reproducible with PETSc 3.6.2 (and the<br>
> SuperLU_dist configured by PETSc) and much older versions, see<br>
> <a href="http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html" rel="noreferrer" target="_blank">http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html</a><br>
> but it has never been reported upstream.<br>
><br>
> The code for assembling the matrix and rhs using FEniCS is also<br>
> included for the sake of completeness.<br>
><br>
> Jan<br>
><br>
<br>
</div></div></blockquote></div><br></div></div>