[petsc-users] SuperLU convergence problem (More test)
Danyang Su
dsu at eos.ubc.ca
Mon Dec 7 14:14:26 CST 2015
Thank. The inserted options works now. I didn't put
PetscOptionsInsertString in the right place before.
Danyang
On 15-12-07 12:01 PM, Hong wrote:
> Danyang:
> Add 'call MatSetFromOptions(A,ierr)' to your code.
> Attached below is ex52f.F modified from your ex52f.F to be compatible
> with petsc-dev.
>
> Hong
>
> Hello Hong,
>
> Thanks for the quick reply and the option "-mat_superlu_dist_fact
> SamePattern" works like a charm, if I use this option from the
> command line.
>
> How can I add this option as the default. I tried using
> PetscOptionsInsertString("-mat_superlu_dist_fact
> SamePattern",ierr) in my code but this does not work.
>
> Thanks,
>
> Danyang
>
>
> On 15-12-07 10:42 AM, Hong wrote:
>> Danyang :
>>
>> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem.
>> Below is how I figured it out.
>>
>> 1. Reading ex52f.F, I see '-superlu_default' =
>> '-pc_factor_mat_solver_package superlu_dist', the later enables
>> runtime options for other packages. I use superlu_dist-4.2 and
>> superlu-4.1 for the tests below.
>>
>> 2. Use the Matrix 168 to setup KSP solver and factorization, all
>> packages, petsc, superlu_dist and mumps give same correct results:
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package petsc
>> -->loac matrix a
>> -->load rhs b
>> size l,m,n,mm 90000 90000 90000 90000
>> Norm of error 7.7308E-11 iterations 1
>> -->Test for matrix 168
>> ..
>> -->Test for matrix 172
>> Norm of error 3.8461E-11 iterations 1
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package superlu_dist
>> Norm of error 9.4073E-11 iterations 1
>> -->Test for matrix 168
>> ...
>> -->Test for matrix 172
>> Norm of error 3.8187E-11 iterations 1
>>
>> 3. Use superlu, I get
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package superlu
>> Norm of error 1.0191E-06 iterations 1
>> -->Test for matrix 168
>> ...
>> -->Test for matrix 172
>> Norm of error 9.7858E-07 iterations 1
>>
>> Replacing default DiagPivotThresh: 1. to 0.0, I get same
>> solutions as other packages:
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package superlu
>> -mat_superlu_diagpivotthresh 0.0
>>
>> Norm of error 8.3614E-11 iterations 1
>> -->Test for matrix 168
>> ...
>> -->Test for matrix 172
>> Norm of error 3.7098E-11 iterations 1
>>
>> 4.
>> using '-mat_view ascii::ascii_info', I found that
>> a_flow_check_1.bin and a_flow_check_168.bin seem have same structure:
>>
>> -->loac matrix a
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=90000, cols=90000
>> total: nonzeros=895600, allocated nonzeros=895600
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 45000 nodes, limit used is 5
>>
>> 5.
>> Using a_flow_check_1.bin, I am able to reproduce the error you
>> reported: all packages give correct results except superlu_dist:
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package superlu_dist
>> Norm of error 2.5970E-12 iterations 1
>> -->Test for matrix 168
>> Norm of error 1.3936E-01 iterations 34
>> -->Test for matrix 169
>>
>> I guess the error might come from reuse of matrix factor.
>> Replacing default
>> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
>> -mat_superlu_dist_fact SamePattern, I get
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu
>> -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_fact
>> SamePattern
>>
>> Norm of error 2.5970E-12 iterations 1
>> -->Test for matrix 168
>> Norm of error 9.4073E-11 iterations 1
>> -->Test for matrix 169
>> Norm of error 6.4303E-11 iterations 1
>> -->Test for matrix 170
>> Norm of error 7.4327E-11 iterations 1
>> -->Test for matrix 171
>> Norm of error 5.4162E-11 iterations 1
>> -->Test for matrix 172
>> Norm of error 3.4440E-11 iterations 1
>> --> End of test, bye
>>
>> Sherry may tell you why SamePattern_SameRowPerm cause the
>> difference here.
>> Best on the above experiments, I would set following as default
>> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
>> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>>
>> Hong
>>
>> Hi Hong,
>>
>> I did more test today and finally found that the solution
>> accuracy depends on the initial (first) matrix quality. I
>> modified the ex52f.F to do the test. There are 6 matrices and
>> right-hand-side vectors. All these matrices and rhs are from
>> my reactive transport simulation. Results will be quite
>> different depending on which one you use to do factorization.
>> Results will also be different if you run with different
>> options. My code is similar to the First or the Second test
>> below. When the matrix is well conditioned, it works fine.
>> But if the initial matrix is well conditioned, it likely to
>> crash when the matrix become ill-conditioned. Since most of
>> my case are well conditioned so I didn't detect the problem
>> before. This case is a special one.
>>
>>
>> How can I avoid this problem? Shall I redo factorization? Can
>> PETSc automatically detect this prolbem or is there any
>> option available to do this?
>>
>> All the data and test code (modified ex52f) can be found via
>> the dropbox link below.
>> _
>> __https://www.dropbox.com/s/4al1a60creogd8m/petsc-superlu-test.tar.gz?dl=0_
>>
>>
>> Summary of my test is shown below.
>>
>> First, use the Matrix 1 to setup KSP solver and
>> factorization, then solve 168 to 172
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type
>> lu -pc_factor_mat_solver_package superlu_dist
>>
>> Norm of error 3.8815E-11 iterations 1
>> -->Test for matrix 168
>> Norm of error 4.2307E-01 iterations 32
>> -->Test for matrix 169
>> Norm of error 3.0528E-01 iterations 32
>> -->Test for matrix 170
>> Norm of error 3.1177E-01 iterations 32
>> -->Test for matrix 171
>> Norm of error 3.2793E-01 iterations 32
>> -->Test for matrix 172
>> Norm of error 3.1251E-01 iterations 31
>>
>> Second, use the Matrix 1 to setup KSP solver and
>> factorization using the implemented SuperLU relative codes. I
>> thought this will generate the same results as the First
>> test, but it actually not.
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin
>> -superlu_default
>>
>> Norm of error 2.2632E-12 iterations 1
>> -->Test for matrix 168
>> Norm of error 1.0817E+04 iterations 1
>> -->Test for matrix 169
>> Norm of error 1.0786E+04 iterations 1
>> -->Test for matrix 170
>> Norm of error 1.0792E+04 iterations 1
>> -->Test for matrix 171
>> Norm of error 1.0792E+04 iterations 1
>> -->Test for matrix 172
>> Norm of error 1.0792E+04 iterations 1
>>
>>
>> Third, use the Matrix 168 to setup KSP solver and
>> factorization, then solve 168 to 172
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type
>> lu -pc_factor_mat_solver_package superlu_dist
>>
>> Norm of error 9.5528E-10 iterations 1
>> -->Test for matrix 168
>> Norm of error 9.4945E-10 iterations 1
>> -->Test for matrix 169
>> Norm of error 6.4279E-10 iterations 1
>> -->Test for matrix 170
>> Norm of error 7.4633E-10 iterations 1
>> -->Test for matrix 171
>> Norm of error 7.4863E-10 iterations 1
>> -->Test for matrix 172
>> Norm of error 8.9701E-10 iterations 1
>>
>> Fourth, use the Matrix 168 to setup KSP solver and
>> factorization using the implemented SuperLU relative codes. I
>> thought this will generate the same results as the Third
>> test, but it actually not.
>>
>> mpiexec.hydra -n 1 ./ex52f -f0
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>> -rhs
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>> -loop_matrices flow_check -loop_folder
>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin
>> -superlu_default
>>
>> Norm of error 3.7017E-11 iterations 1
>> -->Test for matrix 168
>> Norm of error 3.6420E-11 iterations 1
>> -->Test for matrix 169
>> Norm of error 3.7184E-11 iterations 1
>> -->Test for matrix 170
>> Norm of error 3.6847E-11 iterations 1
>> -->Test for matrix 171
>> Norm of error 3.7883E-11 iterations 1
>> -->Test for matrix 172
>> Norm of error 3.8805E-11 iterations 1
>>
>> Thanks very much,
>>
>> Danyang
>>
>> On 15-12-03 01:59 PM, Hong wrote:
>>> Danyang :
>>> Further testing a_flow_check_168.bin,
>>> ./ex10 -f0
>>> /Users/Hong/Downloads/matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>>> /Users/Hong/Downloads/matrix_and_rhs_bin/x_flow_check_168.bin -pc_type
>>> lu -pc_factor_mat_solver_package superlu
>>> -ksp_monitor_true_residual -mat_superlu_conditionnumber
>>> Recip. condition number = 1.610480e-12
>>> 0 KSP preconditioned resid norm 6.873340313547e+09 true
>>> resid norm 7.295020990196e+03 ||r(i)||/||b|| 1.000000000000e+00
>>> 1 KSP preconditioned resid norm 2.051833296449e-02 true
>>> resid norm 2.976859070118e-02 ||r(i)||/||b|| 4.080672384793e-06
>>> Number of iterations = 1
>>> Residual norm 0.0297686
>>>
>>> condition number of this matrix = 1/1.610480e-12 = 1.e+12,
>>> i.e., this matrix is ill-conditioned.
>>>
>>> Hong
>>>
>>>
>>> Hi Hong,
>>>
>>> The binary format of matrix, rhs and solution can be
>>> downloaded via the link below.
>>>
>>> https://www.dropbox.com/s/cl3gfi0s0kjlktf/matrix_and_rhs_bin.tar.gz?dl=0
>>>
>>> Thanks,
>>>
>>> Danyang
>>>
>>>
>>> On 15-12-03 10:50 AM, Hong wrote:
>>>> Danyang:
>>>>
>>>>
>>>>
>>>> To my surprising, solutions from SuperLU at
>>>> timestep 29 seems not correct for the first 4
>>>> Newton iterations, but the solutions from iteration
>>>> solver and MUMPS are correct.
>>>>
>>>> Please find all the matrices, rhs and solutions at
>>>> timestep 29 via the link below. The data is a bit
>>>> large so that I just share it through Dropbox. A
>>>> piece of matlab code to read these data and then
>>>> computer the norm has also been attached.
>>>> _https://www.dropbox.com/s/rr8ueysgflmxs7h/results-check.tar.gz?dl=0_
>>>>
>>>>
>>>> Can you send us matrix in petsc binary format?
>>>>
>>>> e.g., call MatView(M,
>>>> PETSC_VIEWER_BINARY_(PETSC_COMM_WORLD))
>>>> or '-ksp_view_mat binary'
>>>>
>>>> Hong
>>>>
>>>>
>>>>
>>>> Below is a summary of the norm from the three
>>>> solvers at timestep 29, newton iteration 1 to 5.
>>>>
>>>> Timestep 29
>>>> Norm of residual seq 1.661321e-09, superlu
>>>> 1.657103e+04, mumps 3.731225e-11
>>>> Norm of residual seq 1.753079e-09, superlu
>>>> 6.675467e+02, mumps 1.509919e-13
>>>> Norm of residual seq 4.914971e-10, superlu
>>>> 1.236362e-01, mumps 2.139303e-17
>>>> Norm of residual seq 3.532769e-10, superlu
>>>> 1.304670e-04, mumps 5.387000e-20
>>>> Norm of residual seq 3.885629e-10, superlu
>>>> 2.754876e-07, mumps 4.108675e-21
>>>>
>>>> Would anybody please check if SuperLU can solve
>>>> these matrices? Another possibility is that
>>>> something is wrong in my own code. But so far, I
>>>> cannot find any problem in my code since the same
>>>> code works fine if I using iterative solver or
>>>> direct solver MUMPS. But for other cases I have
>>>> tested, all these solvers work fine.
>>>>
>>>> Please let me know if I did not write down the
>>>> problem clearly.
>>>>
>>>> Thanks,
>>>>
>>>> Danyang
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20151207/759c481e/attachment-0001.html>
More information about the petsc-users
mailing list