[petsc-users] SuperLU convergence problem (More test)

Mon Dec 7 14:01:20 CST 2015

Danyang:
Add 'call MatSetFromOptions(A,ierr)' to your code.
Attached below is ex52f.F modified from your ex52f.F to be compatible with
petsc-dev.

Hong

Hello Hong,
>
> Thanks for the quick reply and the option "-mat_superlu_dist_fact
> SamePattern" works like a charm, if I use this option from the command
> line.
>
> How can I add this option as the default. I tried using
> PetscOptionsInsertString("-mat_superlu_dist_fact SamePattern",ierr) in my
> code but this does not work.
>
> Thanks,
>
> Danyang
>
>
> On 15-12-07 10:42 AM, Hong wrote:
>
> Danyang :
>
> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is
> how I figured it out.
>
> 1. Reading ex52f.F, I see '-superlu_default' =
> '-pc_factor_mat_solver_package superlu_dist', the later enables runtime
> options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the
> tests below.
>
> 2. Use the Matrix 168 to setup KSP solver and factorization, all packages,
> petsc, superlu_dist and mumps give same correct results:
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> petsc
>  -->loac matrix a
>  -->load rhs b
>  size l,m,n,mm       90000       90000       90000       90000
> Norm of error  7.7308E-11 iterations     1
>  -->Test for matrix          168
> ..
>  -->Test for matrix          172
> Norm of error  3.8461E-11 iterations     1
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu_dist
> Norm of error  9.4073E-11 iterations     1
>  -->Test for matrix          168
> ...
>  -->Test for matrix          172
> Norm of error  3.8187E-11 iterations     1
>
> 3. Use superlu, I get
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu
> Norm of error  1.0191E-06 iterations     1
>  -->Test for matrix          168
> ...
>  -->Test for matrix          172
> Norm of error  9.7858E-07 iterations     1
>
> Replacing default DiagPivotThresh: 1. to 0.0, I get same solutions as
> other packages:
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu -mat_superlu_diagpivotthresh 0.0
>
> Norm of error  8.3614E-11 iterations     1
>  -->Test for matrix          168
> ...
>  -->Test for matrix          172
> Norm of error  3.7098E-11 iterations     1
>
> 4.
> using '-mat_view ascii::ascii_info', I found that a_flow_check_1.bin and
> a_flow_check_168.bin seem have same structure:
>
>  -->loac matrix a
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=90000, cols=90000
>   total: nonzeros=895600, allocated nonzeros=895600
>   total number of mallocs used during MatSetValues calls =0
>     using I-node routines: found 45000 nodes, limit used is 5
>
> 5.
> Using a_flow_check_1.bin, I am able to reproduce the error you reported:
> all packages give correct results except superlu_dist:
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu_dist
> Norm of error  2.5970E-12 iterations     1
>  -->Test for matrix          168
> Norm of error  1.3936E-01 iterations    34
>  -->Test for matrix          169
>
> I guess the error might come from reuse of matrix factor. Replacing default
> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
> -mat_superlu_dist_fact SamePattern, I get
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu_dist -mat_superlu_dist_fact SamePattern
>
> Norm of error  2.5970E-12 iterations     1
>  -->Test for matrix          168
> Norm of error  9.4073E-11 iterations     1
>  -->Test for matrix          169
> Norm of error  6.4303E-11 iterations     1
>  -->Test for matrix          170
> Norm of error  7.4327E-11 iterations     1
>  -->Test for matrix          171
> Norm of error  5.4162E-11 iterations     1
>  -->Test for matrix          172
> Norm of error  3.4440E-11 iterations     1
>  --> End of test, bye
>
> Sherry may tell you why SamePattern_SameRowPerm cause the difference here.
> Best on the above experiments, I would set following as default
> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>
> Hong
>
> Hi Hong,
>>
>> I did more test today and finally found that the solution accuracy
>> depends on the initial (first) matrix quality. I modified the ex52f.F to do
>> the test. There are 6 matrices and right-hand-side vectors. All these
>> matrices and rhs are from my reactive transport simulation. Results will be
>> quite different depending on which one you use to do factorization. Results
>> will also be different if you run with different options. My code is
>> similar to the First or the Second test below. When the matrix is well
>> conditioned, it works fine. But if the initial matrix is well conditioned,
>> it likely to crash when the matrix become ill-conditioned. Since most of my
>> case are well conditioned so I didn't detect the problem before. This case
>> is a special one.
>>
>>
>> How can I avoid this problem? Shall I redo factorization? Can PETSc
>> automatically detect this prolbem or is there any option available to do
>> this?
>>
>> All the data and test code (modified ex52f) can be found via the dropbox
>> link below.
>>
>> *https://www.dropbox.com/s/4al1a60creogd8m/petsc-superlu-test.tar.gz?dl=0
>> <https://www.dropbox.com/s/4al1a60creogd8m/petsc-superlu-test.tar.gz?dl=0>*
>>
>>
>> Summary of my test is shown below.
>>
>> First, use the Matrix 1 to setup KSP solver and factorization, then solve
>> 168 to 172
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package superlu_dist
>>
>> Norm of error  3.8815E-11 iterations     1
>>  -->Test for matrix          168
>> Norm of error  4.2307E-01 iterations    32
>>  -->Test for matrix          169
>> Norm of error  3.0528E-01 iterations    32
>>  -->Test for matrix          170
>> Norm of error  3.1177E-01 iterations    32
>>  -->Test for matrix          171
>> Norm of error  3.2793E-01 iterations    32
>>  -->Test for matrix          172
>> Norm of error  3.1251E-01 iterations    31
>>
>> Second, use the Matrix 1 to setup KSP solver and factorization using the
>> implemented SuperLU relative codes. I thought this will generate the same
>> results as the First test, but it actually not.
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>>
>> Norm of error  2.2632E-12 iterations     1
>>  -->Test for matrix          168
>> Norm of error  1.0817E+04 iterations     1
>>  -->Test for matrix          169
>> Norm of error  1.0786E+04 iterations     1
>>  -->Test for matrix          170
>> Norm of error  1.0792E+04 iterations     1
>>  -->Test for matrix          171
>> Norm of error  1.0792E+04 iterations     1
>>  -->Test for matrix          172
>> Norm of error  1.0792E+04 iterations     1
>>
>>
>> Third, use the Matrix 168 to setup KSP solver and factorization, then
>> solve 168 to 172
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package superlu_dist
>>
>> Norm of error  9.5528E-10 iterations     1
>>  -->Test for matrix          168
>> Norm of error  9.4945E-10 iterations     1
>>  -->Test for matrix          169
>> Norm of error  6.4279E-10 iterations     1
>>  -->Test for matrix          170
>> Norm of error  7.4633E-10 iterations     1
>>  -->Test for matrix          171
>> Norm of error  7.4863E-10 iterations     1
>>  -->Test for matrix          172
>> Norm of error  8.9701E-10 iterations     1
>>
>> Fourth, use the Matrix 168 to setup KSP solver and factorization using
>> the implemented SuperLU relative codes. I thought this will generate the
>> same results as the Third test, but it actually not.
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>>
>> Norm of error  3.7017E-11 iterations     1
>>  -->Test for matrix          168
>> Norm of error  3.6420E-11 iterations     1
>>  -->Test for matrix          169
>> Norm of error  3.7184E-11 iterations     1
>>  -->Test for matrix          170
>> Norm of error  3.6847E-11 iterations     1
>>  -->Test for matrix          171
>> Norm of error  3.7883E-11 iterations     1
>>  -->Test for matrix          172
>> Norm of error  3.8805E-11 iterations     1
>>
>> Thanks very much,
>>
>> Danyang
>>
>> On 15-12-03 01:59 PM, Hong wrote:
>>
>> Danyang :
>> Further testing a_flow_check_168.bin,
>> ./ex10 -f0 /Users/Hong/Downloads/matrix_and_rhs_bin/a_flow_check_168.bin
>> -rhs /Users/Hong/Downloads/matrix_and_rhs_bin/x_flow_check_168.bin -pc_type
>> lu -pc_factor_mat_solver_package superlu -ksp_monitor_true_residual
>> -mat_superlu_conditionnumber
>>   Recip. condition number = 1.610480e-12
>>   0 KSP preconditioned resid norm 6.873340313547e+09 true resid norm
>> 7.295020990196e+03 ||r(i)||/||b|| 1.000000000000e+00
>>   1 KSP preconditioned resid norm 2.051833296449e-02 true resid norm
>> 2.976859070118e-02 ||r(i)||/||b|| 4.080672384793e-06
>> Number of iterations =   1
>> Residual norm 0.0297686
>>
>> condition number of this matrix = 1/1.610480e-12 = 1.e+12,
>> i.e., this matrix is ill-conditioned.
>>
>> Hong
>>
>>
>> Hi Hong,
>>>
>>> The binary format of matrix, rhs and solution can be downloaded via the
>>> link below.
>>>
>>> https://www.dropbox.com/s/cl3gfi0s0kjlktf/matrix_and_rhs_bin.tar.gz?dl=0
>>>
>>> Thanks,
>>>
>>> Danyang
>>>
>>>
>>> On 15-12-03 10:50 AM, Hong wrote:
>>>
>>> Danyang:
>>>
>>>>
>>>>
>>>> To my surprising, solutions from SuperLU at timestep 29 seems not
>>>> correct for the first 4 Newton iterations, but the solutions from iteration
>>>> solver and MUMPS are correct.
>>>>
>>>> Please find all the matrices, rhs and solutions at timestep 29 via the
>>>> link below. The data is a bit large so that I just share it through
>>>> Dropbox. A piece of matlab code to read these data and then computer the
>>>> norm has also been attached.
>>>> *https://www.dropbox.com/s/rr8ueysgflmxs7h/results-check.tar.gz?dl=0
>>>> <https://www.dropbox.com/s/rr8ueysgflmxs7h/results-check.tar.gz?dl=0>*
>>>>
>>>
>>> Can you send us matrix in petsc binary format?
>>>
>>> e.g., call MatView(M, PETSC_VIEWER_BINARY_(PETSC_COMM_WORLD))
>>> or '-ksp_view_mat binary'
>>>
>>> Hong
>>>
>>>>
>>>>
>>>> Below is a summary of the norm from the three solvers at timestep 29,
>>>> newton iteration 1 to 5.
>>>>
>>>> Timestep 29
>>>> Norm of residual seq 1.661321e-09, superlu 1.657103e+04, mumps
>>>> 3.731225e-11
>>>> Norm of residual seq 1.753079e-09, superlu 6.675467e+02, mumps
>>>> 1.509919e-13
>>>> Norm of residual seq 4.914971e-10, superlu 1.236362e-01, mumps
>>>> 2.139303e-17
>>>> Norm of residual seq 3.532769e-10, superlu 1.304670e-04, mumps
>>>> 5.387000e-20
>>>> Norm of residual seq 3.885629e-10, superlu 2.754876e-07, mumps
>>>> 4.108675e-21
>>>>
>>>> Would anybody please check if SuperLU can solve these matrices? Another
>>>> possibility is that something is wrong in my own code. But so far, I cannot
>>>> find any problem in my code since the same code works fine if I using
>>>> iterative solver or direct solver MUMPS. But for other cases I have
>>>> tested,  all these solvers work fine.
>>>>
>>>> Please let me know if I did not write down the problem clearly.
>>>>
>>>> Thanks,
>>>>
>>>> Danyang
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20151207/8c531e0b/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex52f.F
Type: application/octet-stream
Size: 10333 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20151207/8c531e0b/attachment-0001.obj>