[petsc-users] SuperLU convergence problem (More test)

Danyang Su dsu at eos.ubc.ca
Mon Dec 7 13:28:26 CST 2015


Hello Hong,

Thanks for the quick reply and the option "-mat_superlu_dist_fact 
SamePattern" works like a charm, if I use this option from the command 
line.

How can I add this option as the default. I tried using 
PetscOptionsInsertString("-mat_superlu_dist_fact SamePattern",ierr) in 
my code but this does not work.

Thanks,

Danyang

On 15-12-07 10:42 AM, Hong wrote:
> Danyang :
>
> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below 
> is how I figured it out.
>
> 1. Reading ex52f.F, I see '-superlu_default' = 
> '-pc_factor_mat_solver_package superlu_dist', the later enables 
> runtime options for other packages. I use superlu_dist-4.2 and 
> superlu-4.1 for the tests below.
>
> 2. Use the Matrix 168 to setup KSP solver and factorization, all 
> packages, petsc, superlu_dist and mumps give same correct results:
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs 
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check 
> -loop_folder matrix_and_rhs_bin -pc_type lu 
> -pc_factor_mat_solver_package petsc
>  -->loac matrix a
>  -->load rhs b
>  size l,m,n,mm       90000 90000       90000       90000
> Norm of error  7.7308E-11 iterations     1
>  -->Test for matrix          168
> ..
>  -->Test for matrix          172
> Norm of error  3.8461E-11 iterations     1
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs 
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check 
> -loop_folder matrix_and_rhs_bin -pc_type lu 
> -pc_factor_mat_solver_package superlu_dist
> Norm of error  9.4073E-11 iterations     1
>  -->Test for matrix          168
> ...
>  -->Test for matrix          172
> Norm of error  3.8187E-11 iterations     1
>
> 3. Use superlu, I get
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs 
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check 
> -loop_folder matrix_and_rhs_bin -pc_type lu 
> -pc_factor_mat_solver_package superlu
> Norm of error  1.0191E-06 iterations     1
>  -->Test for matrix  168
> ...
>  -->Test for matrix  172
> Norm of error  9.7858E-07 iterations     1
>
> Replacing default DiagPivotThresh: 1. to 0.0, I get same solutions as 
> other packages:
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs 
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check 
> -loop_folder matrix_and_rhs_bin -pc_type lu 
> -pc_factor_mat_solver_package superlu -mat_superlu_diagpivotthresh 0.0
>
> Norm of error  8.3614E-11 iterations     1
>  -->Test for matrix  168
> ...
>  -->Test for matrix  172
> Norm of error  3.7098E-11 iterations     1
>
> 4.
> using '-mat_view ascii::ascii_info', I found that a_flow_check_1.bin 
> and a_flow_check_168.bin seem have same structure:
>
>  -->loac matrix a
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=90000, cols=90000
>   total: nonzeros=895600, allocated nonzeros=895600
>   total number of mallocs used during MatSetValues calls =0
>     using I-node routines: found 45000 nodes, limit used is 5
>
> 5.
> Using a_flow_check_1.bin, I am able to reproduce the error you 
> reported: all packages give correct results except superlu_dist:
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs 
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check 
> -loop_folder matrix_and_rhs_bin -pc_type lu 
> -pc_factor_mat_solver_package superlu_dist
> Norm of error  2.5970E-12 iterations     1
>  -->Test for matrix  168
> Norm of error  1.3936E-01 iterations    34
>  -->Test for matrix  169
>
> I guess the error might come from reuse of matrix factor. Replacing 
> default
> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
> -mat_superlu_dist_fact SamePattern, I get
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs 
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check 
> -loop_folder matrix_and_rhs_bin -pc_type lu 
> -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_fact 
> SamePattern
>
> Norm of error  2.5970E-12 iterations     1
>  -->Test for matrix  168
> Norm of error  9.4073E-11 iterations     1
>  -->Test for matrix  169
> Norm of error  6.4303E-11 iterations     1
>  -->Test for matrix  170
> Norm of error  7.4327E-11 iterations     1
>  -->Test for matrix  171
> Norm of error  5.4162E-11 iterations     1
>  -->Test for matrix  172
> Norm of error  3.4440E-11 iterations     1
>  --> End of test, bye
>
> Sherry may tell you why SamePattern_SameRowPerm cause the difference here.
> Best on the above experiments, I would set following as default
> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>
> Hong
>
>     Hi Hong,
>
>     I did more test today and finally found that the solution accuracy
>     depends on the initial (first) matrix quality. I modified the
>     ex52f.F to do the test. There are 6 matrices and right-hand-side
>     vectors. All these matrices and rhs are from my reactive transport
>     simulation. Results will be quite different depending on which one
>     you use to do factorization. Results will also be different if you
>     run with different options. My code is similar to the First or the
>     Second test below. When the matrix is well conditioned, it works
>     fine. But if the initial matrix is well conditioned, it likely to
>     crash when the matrix become ill-conditioned. Since most of my
>     case are well conditioned so I didn't detect the problem before.
>     This case is a special one.
>
>
>     How can I avoid this problem? Shall I redo factorization? Can
>     PETSc automatically detect this prolbem or is there any option
>     available to do this?
>
>     All the data and test code (modified ex52f) can be found via the
>     dropbox link below.
>     _
>     __https://www.dropbox.com/s/4al1a60creogd8m/petsc-superlu-test.tar.gz?dl=0_
>
>
>     Summary of my test is shown below.
>
>     First, use the Matrix 1 to setup KSP solver and factorization,
>     then solve 168 to 172
>
>     mpiexec.hydra -n 1 ./ex52f -f0
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>     -rhs
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>     -loop_matrices flow_check -loop_folder
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
>     -pc_factor_mat_solver_package superlu_dist
>
>     Norm of error  3.8815E-11 iterations     1
>      -->Test for matrix          168
>     Norm of error  4.2307E-01 iterations    32
>      -->Test for matrix          169
>     Norm of error  3.0528E-01 iterations    32
>      -->Test for matrix          170
>     Norm of error  3.1177E-01 iterations    32
>      -->Test for matrix          171
>     Norm of error  3.2793E-01 iterations    32
>      -->Test for matrix          172
>     Norm of error  3.1251E-01 iterations    31
>
>     Second, use the Matrix 1 to setup KSP solver and factorization
>     using the implemented SuperLU relative codes. I thought this will
>     generate the same results as the First test, but it actually not.
>
>     mpiexec.hydra -n 1 ./ex52f -f0
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>     -rhs
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>     -loop_matrices flow_check -loop_folder
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>
>     Norm of error  2.2632E-12 iterations     1
>      -->Test for matrix          168
>     Norm of error  1.0817E+04 iterations     1
>      -->Test for matrix          169
>     Norm of error  1.0786E+04 iterations     1
>      -->Test for matrix          170
>     Norm of error  1.0792E+04 iterations     1
>      -->Test for matrix          171
>     Norm of error  1.0792E+04 iterations     1
>      -->Test for matrix          172
>     Norm of error  1.0792E+04 iterations     1
>
>
>     Third, use the Matrix 168 to setup KSP solver and factorization,
>     then solve 168 to 172
>
>     mpiexec.hydra -n 1 ./ex52f -f0
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>     -rhs
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>     -loop_matrices flow_check -loop_folder
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
>     -pc_factor_mat_solver_package superlu_dist
>
>     Norm of error  9.5528E-10 iterations     1
>      -->Test for matrix          168
>     Norm of error  9.4945E-10 iterations     1
>      -->Test for matrix          169
>     Norm of error  6.4279E-10 iterations     1
>      -->Test for matrix          170
>     Norm of error  7.4633E-10 iterations     1
>      -->Test for matrix          171
>     Norm of error  7.4863E-10 iterations     1
>      -->Test for matrix          172
>     Norm of error  8.9701E-10 iterations     1
>
>     Fourth, use the Matrix 168 to setup KSP solver and factorization
>     using the implemented SuperLU relative codes. I thought this will
>     generate the same results as the Third test, but it actually not.
>
>     mpiexec.hydra -n 1 ./ex52f -f0
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>     -rhs
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>     -loop_matrices flow_check -loop_folder
>     /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>
>     Norm of error  3.7017E-11 iterations     1
>      -->Test for matrix          168
>     Norm of error  3.6420E-11 iterations     1
>      -->Test for matrix          169
>     Norm of error  3.7184E-11 iterations     1
>      -->Test for matrix          170
>     Norm of error  3.6847E-11 iterations     1
>      -->Test for matrix          171
>     Norm of error  3.7883E-11 iterations     1
>      -->Test for matrix          172
>     Norm of error  3.8805E-11 iterations     1
>
>     Thanks very much,
>
>     Danyang
>
>     On 15-12-03 01:59 PM, Hong wrote:
>>     Danyang :
>>     Further testing a_flow_check_168.bin,
>>     ./ex10 -f0
>>     /Users/Hong/Downloads/matrix_and_rhs_bin/a_flow_check_168.bin
>>     -rhs
>>     /Users/Hong/Downloads/matrix_and_rhs_bin/x_flow_check_168.bin
>>     -pc_type lu -pc_factor_mat_solver_package superlu
>>     -ksp_monitor_true_residual -mat_superlu_conditionnumber
>>       Recip. condition number = 1.610480e-12
>>       0 KSP preconditioned resid norm 6.873340313547e+09 true resid
>>     norm 7.295020990196e+03 ||r(i)||/||b|| 1.000000000000e+00
>>       1 KSP preconditioned resid norm 2.051833296449e-02 true resid
>>     norm 2.976859070118e-02 ||r(i)||/||b|| 4.080672384793e-06
>>     Number of iterations =   1
>>     Residual norm 0.0297686
>>
>>     condition number of this matrix = 1/1.610480e-12 = 1.e+12,
>>     i.e., this matrix is ill-conditioned.
>>
>>     Hong
>>
>>
>>         Hi Hong,
>>
>>         The binary format of matrix, rhs and solution can be
>>         downloaded via the link below.
>>
>>         https://www.dropbox.com/s/cl3gfi0s0kjlktf/matrix_and_rhs_bin.tar.gz?dl=0
>>
>>         Thanks,
>>
>>         Danyang
>>
>>
>>         On 15-12-03 10:50 AM, Hong wrote:
>>>         Danyang:
>>>
>>>
>>>
>>>             To my surprising, solutions from SuperLU at timestep 29
>>>             seems not correct for the first 4 Newton iterations, but
>>>             the solutions from iteration solver and MUMPS are correct.
>>>
>>>             Please find all the matrices, rhs and solutions at
>>>             timestep 29 via the link below. The data is a bit large
>>>             so that I just share it through Dropbox. A piece of
>>>             matlab code to read these data and then computer the
>>>             norm has also been attached.
>>>             _https://www.dropbox.com/s/rr8ueysgflmxs7h/results-check.tar.gz?dl=0_
>>>
>>>
>>>         Can you send us matrix in petsc binary format?
>>>
>>>         e.g., call MatView(M, PETSC_VIEWER_BINARY_(PETSC_COMM_WORLD))
>>>         or '-ksp_view_mat binary'
>>>
>>>         Hong
>>>
>>>
>>>
>>>             Below is a summary of the norm from the three solvers at
>>>             timestep 29, newton iteration 1 to 5.
>>>
>>>             Timestep 29
>>>             Norm of residual seq 1.661321e-09, superlu 1.657103e+04,
>>>             mumps 3.731225e-11
>>>             Norm of residual seq 1.753079e-09, superlu 6.675467e+02,
>>>             mumps 1.509919e-13
>>>             Norm of residual seq 4.914971e-10, superlu 1.236362e-01,
>>>             mumps 2.139303e-17
>>>             Norm of residual seq 3.532769e-10, superlu 1.304670e-04,
>>>             mumps 5.387000e-20
>>>             Norm of residual seq 3.885629e-10, superlu 2.754876e-07,
>>>             mumps 4.108675e-21
>>>
>>>             Would anybody please check if SuperLU can solve these
>>>             matrices? Another possibility is that something is wrong
>>>             in my own code. But so far, I cannot find any problem in
>>>             my code since the same code works fine if I using
>>>             iterative solver or direct solver MUMPS. But for other
>>>             cases I have tested,  all these solvers work fine.
>>>
>>>             Please let me know if I did not write down the problem
>>>             clearly.
>>>
>>>             Thanks,
>>>
>>>             Danyang
>>>
>>>
>>>
>>>
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20151207/81c75b6b/attachment-0001.html>


More information about the petsc-users mailing list