[petsc-users] SuperLU convergence problem (More test)

Mon Dec 7 14:14:26 CST 2015

Thanks. The inserted options work now. I had not put
PetscOptionsInsertString in the right place before.

Danyang
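The placement issue can be pictured with a toy model in plain Python (not PETSc code). An options database is only consulted when an object reads its options, so a string inserted after that point has no effect on that object. The class and method names below are hypothetical stand-ins for PetscOptionsInsertString and the *SetFromOptions routines. The thread does not say exactly where PETSc reads -mat_superlu_dist_fact, so treat this only as an illustration of the ordering rule.

```python
class OptionsDB:
    """Toy options database (hypothetical): maps option names to values."""

    def __init__(self):
        self._opts = {}

    def insert_string(self, s):
        # Parse "-name value -name2 value2" pairs; values are assumed present.
        tokens = s.split()
        for name, value in zip(tokens[0::2], tokens[1::2]):
            self._opts[name] = value

    def get(self, name, default):
        return self._opts.get(name, default)


class ToyFactor:
    """Hypothetical solver object that reads its options once, at setup."""

    def __init__(self):
        self.fact = "SamePattern_SameRowPerm"  # default reuse mode

    def set_from_options(self, db):
        # The database is read now; later insertions are never seen.
        self.fact = db.get("-mat_superlu_dist_fact", self.fact)


def configure(insert_before_setup):
    db = OptionsDB()
    f = ToyFactor()
    if insert_before_setup:
        db.insert_string("-mat_superlu_dist_fact SamePattern")
        f.set_from_options(db)
    else:
        f.set_from_options(db)
        db.insert_string("-mat_superlu_dist_fact SamePattern")
    return f.fact


print(configure(insert_before_setup=True))   # SamePattern
print(configure(insert_before_setup=False))  # SamePattern_SameRowPerm
```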

On 15-12-07 12:01 PM, Hong wrote:
> Danyang:
> Add 'call MatSetFromOptions(A,ierr)' to your code.
> Attached below is ex52f.F modified from your ex52f.F to be compatible 
> with petsc-dev.
>
> Hong
>
>     Hello Hong,
>
>     Thanks for the quick reply. The option "-mat_superlu_dist_fact
>     SamePattern" works like a charm when I use it from the
>     command line.
>
>     How can I make this option the default? I tried using
>     PetscOptionsInsertString("-mat_superlu_dist_fact
>     SamePattern",ierr) in my code, but this does not work.
>
>     Thanks,
>
>     Danyang
>
>
>     On 15-12-07 10:42 AM, Hong wrote:
>>     Danyang :
>>
>>     Adding '-mat_superlu_dist_fact SamePattern' fixed the problem.
>>     Below is how I figured it out.
>>
>>     1. Reading ex52f.F, I see that '-superlu_default' is equivalent to
>>     '-pc_factor_mat_solver_package superlu_dist'; the latter also enables
>>     runtime options for other packages. I use superlu_dist-4.2 and
>>     superlu-4.1 for the tests below.
>>
>>     2. Using matrix 168 to set up the KSP solver and factorization, all
>>     packages (petsc, superlu_dist and mumps) give the same correct results:
>>
>>     ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>>     matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>>     -loop_folder matrix_and_rhs_bin -pc_type lu
>>     -pc_factor_mat_solver_package petsc
>>      -->loac matrix a
>>      -->load rhs b
>>      size l,m,n,mm   90000       90000       90000 90000
>>     Norm of error  7.7308E-11 iterations     1
>>      -->Test for matrix          168
>>     ..
>>      -->Test for matrix          172
>>     Norm of error  3.8461E-11 iterations     1
>>
>>     ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>>     matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>>     -loop_folder matrix_and_rhs_bin -pc_type lu
>>     -pc_factor_mat_solver_package superlu_dist
>>     Norm of error  9.4073E-11 iterations     1
>>      -->Test for matrix          168
>>     ...
>>      -->Test for matrix          172
>>     Norm of error  3.8187E-11 iterations     1
>>
>>     3. Using superlu, I get
>>     ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>>     matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>>     -loop_folder matrix_and_rhs_bin -pc_type lu
>>     -pc_factor_mat_solver_package superlu
>>     Norm of error  1.0191E-06 iterations     1
>>      -->Test for matrix          168
>>     ...
>>      -->Test for matrix          172
>>     Norm of error  9.7858E-07 iterations     1
>>
>>     Changing the default DiagPivotThresh from 1.0 to 0.0, I get the same
>>     solutions as the other packages:
>>
>>     ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>>     matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>>     -loop_folder matrix_and_rhs_bin -pc_type lu
>>     -pc_factor_mat_solver_package superlu
>>     -mat_superlu_diagpivotthresh 0.0
>>
>>     Norm of error  8.3614E-11 iterations     1
>>      -->Test for matrix          168
>>     ...
>>      -->Test for matrix          172
>>     Norm of error  3.7098E-11 iterations     1
>>
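As we understand SuperLU's threshold partial pivoting, DiagPivotThresh (u) decides, column by column, whether the diagonal entry is kept as the pivot. It is kept if it is nonzero and at least u times the largest magnitude among the candidate entries; otherwise the largest entry is used. So u = 1.0 is classic partial pivoting and u = 0.0 keeps every nonzero diagonal. The sketch below is a simplified one-column illustration of that rule (no sparsity, no column ordering), not SuperLU's code.

```python
def choose_pivot(column, diag_index, u):
    """Return the position of the pivot for one column.

    column     -- candidate entries (rows at and below the diagonal)
    diag_index -- position of the diagonal entry within `column`
    u          -- threshold in [0, 1], playing the role of DiagPivotThresh
    """
    # Largest-magnitude candidate (first one wins on ties).
    amax_index = max(range(len(column)), key=lambda i: abs(column[i]))
    amax = abs(column[amax_index])
    d = abs(column[diag_index])
    # Keep the diagonal if it is nonzero and not too small relative to amax.
    if d != 0.0 and d >= u * amax:
        return diag_index
    return amax_index


col = [0.5, 2.0, -1.0]  # diagonal entry is col[0]
print(choose_pivot(col, 0, 1.0))  # partial pivoting: picks the 2.0 entry
print(choose_pivot(col, 0, 0.0))  # diagonal pivoting: keeps 0.5
```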
>>     4. Using '-mat_view ascii::ascii_info', I found that
>>     a_flow_check_1.bin and a_flow_check_168.bin seem to have the same
>>     structure:
>>
>>      -->loac matrix a
>>     Mat Object: 1 MPI processes
>>       type: seqaij
>>       rows=90000, cols=90000
>>       total: nonzeros=895600, allocated nonzeros=895600
>>       total number of mallocs used during MatSetValues calls =0
>>         using I-node routines: found 45000 nodes, limit used is 5
>>
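Note that '-mat_view ascii::ascii_info' reports only sizes and nonzero counts, so equal counts are consistent with, but do not prove, an identical nonzero pattern (the thing the SamePattern option assumes is unchanged). A minimal sketch of an exact check on two matrices held as CSR arrays (row pointers and column indices); the function name is hypothetical and the arrays would have to be extracted from the matrices first.

```python
def same_pattern(rowptr_a, colidx_a, rowptr_b, colidx_b):
    """True if two CSR matrices have exactly the same nonzero pattern.

    Column indices are compared as sorted lists per row, so the order of
    entries within a row does not matter.
    """
    if len(rowptr_a) != len(rowptr_b):
        return False
    for i in range(len(rowptr_a) - 1):
        row_a = sorted(colidx_a[rowptr_a[i]:rowptr_a[i + 1]])
        row_b = sorted(colidx_b[rowptr_b[i]:rowptr_b[i + 1]])
        if row_a != row_b:
            return False
    return True


# Two 3x3 matrices with the same row counts and 5 nonzeros each,
# but a different column in row 0.
rowptr = [0, 2, 3, 5]
print(same_pattern(rowptr, [0, 1, 1, 0, 2], rowptr, [0, 2, 1, 0, 2]))  # False
print(same_pattern(rowptr, [0, 1, 1, 0, 2], rowptr, [1, 0, 1, 2, 0]))  # True
```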
>>     5. Using a_flow_check_1.bin, I am able to reproduce the error you
>>     reported: all packages give correct results except superlu_dist:
>>     ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>>     matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>>     -loop_folder matrix_and_rhs_bin -pc_type lu
>>     -pc_factor_mat_solver_package superlu_dist
>>     Norm of error  2.5970E-12 iterations     1
>>      -->Test for matrix          168
>>     Norm of error  1.3936E-01 iterations    34
>>      -->Test for matrix          169
>>
>>     I guess the error might come from reusing the matrix factorization.
>>     Replacing the default
>>     -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
>>     -mat_superlu_dist_fact SamePattern, I get
>>
>>     ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>>     matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>>     -loop_folder matrix_and_rhs_bin -pc_type lu
>>     -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_fact
>>     SamePattern
>>
>>     Norm of error  2.5970E-12 iterations     1
>>      -->Test for matrix          168
>>     Norm of error  9.4073E-11 iterations     1
>>      -->Test for matrix          169
>>     Norm of error  6.4303E-11 iterations     1
>>      -->Test for matrix          170
>>     Norm of error  7.4327E-11 iterations     1
>>      -->Test for matrix          171
>>     Norm of error  5.4162E-11 iterations     1
>>      -->Test for matrix          172
>>     Norm of error  3.4440E-11 iterations     1
>>      --> End of test, bye
>>
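Why reusing a row permutation can go wrong is easiest to see on a 2x2 toy. The sketch below (plain Python, dense, no sparsity) chooses a row order by partial pivoting on a first matrix, then reuses it with no further pivoting on a second matrix that has the same nonzero pattern but different values. It only illustrates the general risk of a stale, value-dependent row permutation. As far as we know, SuperLU_DIST computes its row permutation differently (static pivoting), and the thread does not establish the exact mechanism behind the failure above.

```python
def lu_solve_with_perm(a, b, perm):
    """Solve a x = b for a 2x2 a, using row order `perm` and no extra pivoting."""
    a = [list(a[p]) for p in perm]  # reorder rows
    b = [b[p] for p in perm]
    l21 = a[1][0] / a[0][0]          # single elimination multiplier
    u22 = a[1][1] - l21 * a[0][1]
    y2 = b[1] - l21 * b[0]           # forward substitution
    x2 = y2 / u22                    # back substitution
    x1 = (b[0] - a[0][1] * x2) / a[0][0]
    return [x1, x2]


def partial_pivot_perm(a):
    """Row order for a 2x2 matrix chosen by partial pivoting on column 0."""
    return [0, 1] if abs(a[0][0]) >= abs(a[1][0]) else [1, 0]


A1 = [[2.0, 1.0], [1.0, 3.0]]    # first matrix: no row swap needed
A2 = [[1e-20, 1.0], [1.0, 1.0]]  # same pattern, new values; exact x ~ [1, 1]
b2 = [1.0, 2.0]

stale = lu_solve_with_perm(A2, b2, partial_pivot_perm(A1))
fresh = lu_solve_with_perm(A2, b2, partial_pivot_perm(A2))
print(stale)  # [0.0, 1.0]: x1 is completely wrong
print(fresh)  # [1.0, 1.0]
```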
>>     Sherry may be able to tell you why SamePattern_SameRowPerm causes the
>>     difference here.
>>     Based on the above experiments, I would set the following defaults:
>>     '-mat_superlu_diagpivotthresh 0.0' in the petsc/superlu interface;
>>     '-mat_superlu_dist_fact SamePattern' in the petsc/superlu_dist interface.
>>
>>     Hong
>>
>>         Hi Hong,
>>
>>         I did more tests today and finally found that the solution
>>         accuracy depends on the quality of the initial (first) matrix. I
>>         modified ex52f.F to do the test. There are 6 matrices and
>>         right-hand-side vectors, all taken from my reactive transport
>>         simulation. The results differ considerably depending on which
>>         matrix is used for the factorization, and they also differ
>>         with the run-time options. My code is similar to the First or
>>         the Second test below. When the matrix is well conditioned, it
>>         works fine. But if the initial matrix is well conditioned, the
>>         solve is likely to fail when the matrix later becomes
>>         ill-conditioned. Since most of my cases are well conditioned, I
>>         did not detect the problem before. This case is a special one.
>>
>>
>>         How can I avoid this problem? Should I redo the factorization? Can
>>         PETSc automatically detect this problem, or is there an
>>         option available to do this?
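One possible application-level safeguard (a hypothetical policy, not a PETSc feature) is to compute the true relative residual ||b - Ax|| / ||b|| after each solve with a reused factorization and to refactor from scratch when it exceeds a tolerance. A minimal sketch on a CSR matrix; the function names and the 1e-8 tolerance are illustrative assumptions.

```python
import math


def csr_matvec(rowptr, colidx, vals, x):
    """y = A x for a matrix stored in CSR form."""
    y = []
    for i in range(len(rowptr) - 1):
        s = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            s += vals[k] * x[colidx[k]]
        y.append(s)
    return y


def relative_residual(rowptr, colidx, vals, x, b):
    """||b - A x||_2 / ||b||_2."""
    ax = csr_matvec(rowptr, colidx, vals, x)
    r = math.sqrt(sum((bi - axi) ** 2 for bi, axi in zip(b, ax)))
    return r / math.sqrt(sum(bi * bi for bi in b))


def needs_refactor(relres, tol=1e-8):
    """Hypothetical policy: refactor from scratch if the reused factor is poor."""
    return relres > tol


# A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is [1/11, 7/11].
rowptr, colidx, vals = [0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0]
b = [1.0, 2.0]
good = relative_residual(rowptr, colidx, vals, [1 / 11, 7 / 11], b)
bad = relative_residual(rowptr, colidx, vals, [0.0, 0.0], b)
print(needs_refactor(good), needs_refactor(bad))  # False True
```

In the runs above, the jump from 1 KSP iteration to 31-34 iterations with the reused factorization is the same kind of warning sign.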
>>
>>         All the data and test code (modified ex52f) can be found via
>>         the Dropbox link below.
>>         https://www.dropbox.com/s/4al1a60creogd8m/petsc-superlu-test.tar.gz?dl=0
>>
>>
>>         A summary of my tests is shown below.
>>
>>         First, use matrix 1 to set up the KSP solver and
>>         factorization, then solve matrices 168 to 172.
>>
>>         mpiexec.hydra -n 1 ./ex52f -f0
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>>         -rhs
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>>         -loop_matrices flow_check -loop_folder
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type
>>         lu -pc_factor_mat_solver_package superlu_dist
>>
>>         Norm of error  3.8815E-11 iterations 1
>>          -->Test for matrix          168
>>         Norm of error  4.2307E-01 iterations 32
>>          -->Test for matrix          169
>>         Norm of error  3.0528E-01 iterations 32
>>          -->Test for matrix          170
>>         Norm of error  3.1177E-01 iterations 32
>>          -->Test for matrix          171
>>         Norm of error  3.2793E-01 iterations 32
>>          -->Test for matrix          172
>>         Norm of error  3.1251E-01 iterations 31
>>
>>         Second, use matrix 1 to set up the KSP solver and
>>         factorization using the SuperLU-related code implemented in
>>         ex52f (-superlu_default). I expected this to give the same
>>         results as the First test, but it does not.
>>
>>         mpiexec.hydra -n 1 ./ex52f -f0
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>>         -rhs
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>>         -loop_matrices flow_check -loop_folder
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin
>>         -superlu_default
>>
>>         Norm of error  2.2632E-12 iterations 1
>>          -->Test for matrix          168
>>         Norm of error  1.0817E+04 iterations 1
>>          -->Test for matrix          169
>>         Norm of error  1.0786E+04 iterations 1
>>          -->Test for matrix          170
>>         Norm of error  1.0792E+04 iterations 1
>>          -->Test for matrix          171
>>         Norm of error  1.0792E+04 iterations 1
>>          -->Test for matrix          172
>>         Norm of error  1.0792E+04 iterations 1
>>
>>
>>         Third, use matrix 168 to set up the KSP solver and
>>         factorization, then solve matrices 168 to 172.
>>
>>         mpiexec.hydra -n 1 ./ex52f -f0
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>>         -rhs
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>>         -loop_matrices flow_check -loop_folder
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type
>>         lu -pc_factor_mat_solver_package superlu_dist
>>
>>         Norm of error  9.5528E-10 iterations 1
>>          -->Test for matrix          168
>>         Norm of error  9.4945E-10 iterations 1
>>          -->Test for matrix          169
>>         Norm of error  6.4279E-10 iterations 1
>>          -->Test for matrix          170
>>         Norm of error  7.4633E-10 iterations 1
>>          -->Test for matrix          171
>>         Norm of error  7.4863E-10 iterations 1
>>          -->Test for matrix          172
>>         Norm of error  8.9701E-10 iterations 1
>>
>>         Fourth, use matrix 168 to set up the KSP solver and
>>         factorization using the SuperLU-related code implemented in
>>         ex52f (-superlu_default). I expected this to give the same
>>         results as the Third test, but it does not.
>>
>>         mpiexec.hydra -n 1 ./ex52f -f0
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>>         -rhs
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>>         -loop_matrices flow_check -loop_folder
>>         /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin
>>         -superlu_default
>>
>>         Norm of error  3.7017E-11 iterations 1
>>          -->Test for matrix          168
>>         Norm of error  3.6420E-11 iterations 1
>>          -->Test for matrix          169
>>         Norm of error  3.7184E-11 iterations 1
>>          -->Test for matrix          170
>>         Norm of error  3.6847E-11 iterations 1
>>          -->Test for matrix          171
>>         Norm of error  3.7883E-11 iterations 1
>>          -->Test for matrix          172
>>         Norm of error  3.8805E-11 iterations 1
>>
>>         Thanks very much,
>>
>>         Danyang
>>
>>         On 15-12-03 01:59 PM, Hong wrote:
>>>         Danyang :
>>>         Further testing a_flow_check_168.bin,
>>>         ./ex10 -f0
>>>         /Users/Hong/Downloads/matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>>>         /Users/Hong/Downloads/matrix_and_rhs_bin/x_flow_check_168.bin -pc_type
>>>         lu -pc_factor_mat_solver_package superlu
>>>         -ksp_monitor_true_residual -mat_superlu_conditionnumber
>>>         Recip. condition number = 1.610480e-12
>>>           0 KSP preconditioned resid norm 6.873340313547e+09 true
>>>         resid norm 7.295020990196e+03 ||r(i)||/||b|| 1.000000000000e+00
>>>           1 KSP preconditioned resid norm 2.051833296449e-02 true
>>>         resid norm 2.976859070118e-02 ||r(i)||/||b|| 4.080672384793e-06
>>>         Number of iterations =   1
>>>         Residual norm 0.0297686
>>>
>>>         The condition number of this matrix is about 1/1.610480e-12 ~ 6.2e+11,
>>>         i.e., of order 1e+12, so this matrix is ill-conditioned.
>>>
>>>         Hong
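The numbers in this run can be checked with a few lines of arithmetic. The "digits lost" line uses the common rule of thumb that roughly log10(cond) significant digits can be lost in double precision; it is an estimate, not a bound. The zero-iteration true residual equals ||b|| here, since the log reports ||r(0)||/||b|| = 1.

```python
import math

rcond = 1.610480e-12            # "Recip. condition number" reported by SuperLU
cond = 1.0 / rcond              # estimated condition number
digits_lost = math.log10(cond)  # rule of thumb: about log10(cond) digits lost

r0 = 7.295020990196e+03         # true residual norm at iteration 0 (= ||b||)
r1 = 2.976859070118e-02         # true residual norm after 1 iteration
print(f"cond ~ {cond:.4e}, digits lost ~ {digits_lost:.1f}")
print(f"||r1||/||b|| = {r1 / r0:.6e}")
```

With about 16 significant digits in double precision, this leaves only around 4 reliable digits in the solution.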
>>>
>>>
>>>             Hi Hong,
>>>
>>>             The binary format of matrix, rhs and solution can be
>>>             downloaded via the link below.
>>>
>>>             https://www.dropbox.com/s/cl3gfi0s0kjlktf/matrix_and_rhs_bin.tar.gz?dl=0
>>>
>>>             Thanks,
>>>
>>>             Danyang
>>>
>>>
>>>             On 15-12-03 10:50 AM, Hong wrote:
>>>>             Danyang:
>>>>
>>>>
>>>>
>>>>                 To my surprise, the solutions from SuperLU at
>>>>                 timestep 29 do not seem correct for the first 4
>>>>                 Newton iterations, but the solutions from the iterative
>>>>                 solver and MUMPS are correct.
>>>>
>>>>                 Please find all the matrices, rhs and solutions at
>>>>                 timestep 29 via the link below. The data is a bit
>>>>                 large, so I am sharing it through Dropbox. A piece of
>>>>                 MATLAB code to read the data and compute the norms
>>>>                 has also been attached.
>>>>                 https://www.dropbox.com/s/rr8ueysgflmxs7h/results-check.tar.gz?dl=0
>>>>
>>>>
>>>>             Can you send us the matrix in PETSc binary format?
>>>>
>>>>             e.g., call MatView(M,PETSC_VIEWER_BINARY_WORLD,ierr)
>>>>             or '-ksp_view_mat binary'
>>>>
>>>>             Hong
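For checking such files outside PETSc (for example instead of a MATLAB script), here is a minimal sketch of a reader for a sequential AIJ matrix in PETSc binary format. It assumes the default layout as we understand it: big-endian 32-bit integers and 64-bit real scalars; a header of class id (1211216 for Mat), rows, columns and total nonzeros; then the nonzero count of each row, the column indices, and the values. Builds with 64-bit indices or complex scalars use a different layout. PETSc ships its own readers (PetscBinaryIO.py, PetscBinaryRead.m), which should be preferred in practice.

```python
import struct

MAT_FILE_CLASSID = 1211216  # class id written at the start of a PETSc Mat file


def read_petsc_aij(data):
    """Parse a PETSc binary AIJ matrix from bytes into CSR arrays.

    Assumes big-endian int32 indices and float64 real values.
    Returns (nrows, ncols, rowptr, colidx, vals).
    """
    classid, m, n, nnz = struct.unpack_from(">4i", data, 0)
    if classid != MAT_FILE_CLASSID:
        raise ValueError("not a PETSc Mat binary file")
    off = 16
    row_nnz = struct.unpack_from(f">{m}i", data, off)   # nonzeros per row
    off += 4 * m
    colidx = list(struct.unpack_from(f">{nnz}i", data, off))
    off += 4 * nnz
    vals = list(struct.unpack_from(f">{nnz}d", data, off))
    rowptr = [0]
    for c in row_nnz:                                    # prefix sums -> CSR
        rowptr.append(rowptr[-1] + c)
    return m, n, rowptr, colidx, vals


# Usage (file name taken from the thread):
# with open("matrix_and_rhs_bin/a_flow_check_168.bin", "rb") as f:
#     m, n, rowptr, colidx, vals = read_petsc_aij(f.read())
```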
>>>>
>>>>
>>>>
>>>>                 Below is a summary of the residual norms from the three
>>>>                 solvers at timestep 29, Newton iterations 1 to 5.
>>>>
>>>>                 Timestep 29, norm of residual
>>>>                 Newton iter  seq           superlu       mumps
>>>>                 1            1.661321e-09  1.657103e+04  3.731225e-11
>>>>                 2            1.753079e-09  6.675467e+02  1.509919e-13
>>>>                 3            4.914971e-10  1.236362e-01  2.139303e-17
>>>>                 4            3.532769e-10  1.304670e-04  5.387000e-20
>>>>                 5            3.885629e-10  2.754876e-07  4.108675e-21
>>>>
>>>>                 Could anybody please check whether SuperLU can solve
>>>>                 these matrices? Another possibility is that
>>>>                 something is wrong in my own code, but so far I
>>>>                 cannot find any problem in it, since the same
>>>>                 code works fine with the iterative solver or the
>>>>                 direct solver MUMPS. For the other cases I have
>>>>                 tested, all these solvers work fine.
>>>>
>>>>                 Please let me know if I did not write down the
>>>>                 problem clearly.
>>>>
>>>>                 Thanks,
>>>>
>>>>                 Danyang
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
