[petsc-users] SuperLU convergence problem (More test)
Danyang Su
dsu at eos.ubc.ca
Mon Dec 7 13:28:26 CST 2015
Hello Hong,
Thanks for the quick reply. The option "-mat_superlu_dist_fact
SamePattern" works like a charm when I use it from the command
line.
How can I make this option the default? I tried calling
PetscOptionsInsertString("-mat_superlu_dist_fact SamePattern",ierr) in
my code, but it does not work.
Thanks,
Danyang
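
One way to get a default without touching the solver code is the PETSC_OPTIONS environment variable, which PETSc reads at initialization in addition to the command line. The Python sketch below only illustrates that idea: the helper name with_default_petsc_options is hypothetical, and it assumes the launcher passes the environment on to the solver executable. It does not explain why the in-code PetscOptionsInsertString call had no effect.

```python
import os


def with_default_petsc_options(defaults, base_env=None):
    """Return a copy of the environment with `defaults` placed first in PETSC_OPTIONS.

    `defaults` is a PETSc option string such as "-mat_superlu_dist_fact SamePattern".
    Options already present in PETSC_OPTIONS are kept and placed after the defaults.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PETSC_OPTIONS", "")
    # Join only the non-empty parts so no stray spaces are produced.
    env["PETSC_OPTIONS"] = " ".join(part for part in (defaults, existing) if part)
    return env


env = with_default_petsc_options("-mat_superlu_dist_fact SamePattern", base_env={})
print(env["PETSC_OPTIONS"])
# A launcher script could then start the solver with this environment, e.g.
# subprocess.run(["./ex52f", ...], env=env)   # arguments elided on purpose
```

The defaults are placed first on the assumption that a later occurrence of the same option takes precedence, so that anything the user already set in PETSC_OPTIONS, or passes on the command line, can still override the default.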
On 15-12-07 10:42 AM, Hong wrote:
> Danyang :
>
> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below
> is how I figured it out.
>
> 1. Reading ex52f.F, I see '-superlu_default' =
> '-pc_factor_mat_solver_package superlu_dist'; the latter enables
> runtime options for other packages. I use superlu_dist-4.2 and
> superlu-4.1 for the tests below.
>
> 2. Using Matrix 168 to set up the KSP solver and factorization, all
> packages (petsc, superlu_dist and mumps) give the same correct results:
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package petsc
> -->loac matrix a
> -->load rhs b
> size l,m,n,mm 90000 90000 90000 90000
> Norm of error 7.7308E-11 iterations 1
> -->Test for matrix 168
> ..
> -->Test for matrix 172
> Norm of error 3.8461E-11 iterations 1
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu_dist
> Norm of error 9.4073E-11 iterations 1
> -->Test for matrix 168
> ...
> -->Test for matrix 172
> Norm of error 3.8187E-11 iterations 1
>
> 3. Using superlu, I get
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu
> Norm of error 1.0191E-06 iterations 1
> -->Test for matrix 168
> ...
> -->Test for matrix 172
> Norm of error 9.7858E-07 iterations 1
>
> Changing the default DiagPivotThresh from 1. to 0.0, I get the same
> solutions as the other packages:
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu -mat_superlu_diagpivotthresh 0.0
>
> Norm of error 8.3614E-11 iterations 1
> -->Test for matrix 168
> ...
> -->Test for matrix 172
> Norm of error 3.7098E-11 iterations 1
>
> 4.
> Using '-mat_view ascii::ascii_info', I found that a_flow_check_1.bin
> and a_flow_check_168.bin seem to have the same structure:
>
> -->loac matrix a
> Mat Object: 1 MPI processes
> type: seqaij
> rows=90000, cols=90000
> total: nonzeros=895600, allocated nonzeros=895600
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 45000 nodes, limit used is 5
>
> 5.
> Using a_flow_check_1.bin, I am able to reproduce the error you
> reported: all packages give correct results except superlu_dist:
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu_dist
> Norm of error 2.5970E-12 iterations 1
> -->Test for matrix 168
> Norm of error 1.3936E-01 iterations 34
> -->Test for matrix 169
>
> I guess the error might come from reuse of the matrix factor.
> Replacing the default
> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
> -mat_superlu_dist_fact SamePattern, I get
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_fact
> SamePattern
>
> Norm of error 2.5970E-12 iterations 1
> -->Test for matrix 168
> Norm of error 9.4073E-11 iterations 1
> -->Test for matrix 169
> Norm of error 6.4303E-11 iterations 1
> -->Test for matrix 170
> Norm of error 7.4327E-11 iterations 1
> -->Test for matrix 171
> Norm of error 5.4162E-11 iterations 1
> -->Test for matrix 172
> Norm of error 3.4440E-11 iterations 1
> --> End of test, bye
>
> Sherry may tell you why SamePattern_SameRowPerm causes the difference here.
> Based on the above experiments, I would set the following as defaults:
> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>
> Hong
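
The comparison runs above differ only in the solver package and one package-specific option, so they can be generated from a single template. The following is a minimal Python sketch that uses only the flags quoted in this message; the function name ex52f_command and the relative data path are illustrative assumptions, and the script only prints the command lines rather than running them.

```python
import shlex

DATA = "matrix_and_rhs_bin"  # assumed relative path, as in the commands quoted above


def ex52f_command(package, first_matrix=168, extra=()):
    """Build the ex52f argument list for one solver package.

    `first_matrix` selects the matrix/rhs pair used to set up the factorization.
    `extra` holds additional runtime options appended at the end.
    """
    return [
        "./ex52f",
        "-f0", f"{DATA}/a_flow_check_{first_matrix}.bin",
        "-rhs", f"{DATA}/b_flow_check_{first_matrix}.bin",
        "-loop_matrices", "flow_check",
        "-loop_folder", DATA,
        "-pc_type", "lu",
        "-pc_factor_mat_solver_package", package,
        *extra,
    ]


# The three packages compared in this message, with the options that gave
# consistent results in these experiments.
RUNS = [
    ("petsc", ()),
    ("superlu_dist", ("-mat_superlu_dist_fact", "SamePattern")),
    ("superlu", ("-mat_superlu_diagpivotthresh", "0.0")),
]

for package, extra in RUNS:
    print(shlex.join(ex52f_command(package, extra=extra)))
```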
>
> Hi Hong,
>
> I did more tests today and finally found that the solution accuracy
> depends on the quality of the initial (first) matrix. I modified
> ex52f.F to do the test. There are 6 matrices and right-hand-side
> vectors, all from my reactive transport simulation. Results differ
> considerably depending on which matrix is used for the factorization,
> and also depending on the runtime options. My code is similar to the
> First or the Second test below. When the matrix is well conditioned,
> it works fine. But if the initial matrix is well conditioned, it is
> likely to crash when the matrix becomes ill-conditioned. Since most of
> my cases are well conditioned, I didn't detect the problem before.
> This case is a special one.
>
>
> How can I avoid this problem? Should I redo the factorization? Can
> PETSc automatically detect this problem, or is there an option
> available to do this?
>
> All the data and test code (modified ex52f) can be found via the
> dropbox link below.
> https://www.dropbox.com/s/4al1a60creogd8m/petsc-superlu-test.tar.gz?dl=0
>
>
> Summary of my test is shown below.
>
> First, use Matrix 1 to set up the KSP solver and factorization,
> then solve 168 to 172
>
> mpiexec.hydra -n 1 ./ex52f -f0
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
> -rhs
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
> -loop_matrices flow_check -loop_folder
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu_dist
>
> Norm of error 3.8815E-11 iterations 1
> -->Test for matrix 168
> Norm of error 4.2307E-01 iterations 32
> -->Test for matrix 169
> Norm of error 3.0528E-01 iterations 32
> -->Test for matrix 170
> Norm of error 3.1177E-01 iterations 32
> -->Test for matrix 171
> Norm of error 3.2793E-01 iterations 32
> -->Test for matrix 172
> Norm of error 3.1251E-01 iterations 31
>
> Second, use Matrix 1 to set up the KSP solver and factorization
> using the SuperLU-related code implemented in the example. I thought
> this would give the same results as the First test, but it does not.
>
> mpiexec.hydra -n 1 ./ex52f -f0
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
> -rhs
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
> -loop_matrices flow_check -loop_folder
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>
> Norm of error 2.2632E-12 iterations 1
> -->Test for matrix 168
> Norm of error 1.0817E+04 iterations 1
> -->Test for matrix 169
> Norm of error 1.0786E+04 iterations 1
> -->Test for matrix 170
> Norm of error 1.0792E+04 iterations 1
> -->Test for matrix 171
> Norm of error 1.0792E+04 iterations 1
> -->Test for matrix 172
> Norm of error 1.0792E+04 iterations 1
>
>
> Third, use Matrix 168 to set up the KSP solver and factorization,
> then solve 168 to 172
>
> mpiexec.hydra -n 1 ./ex52f -f0
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
> -rhs
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
> -loop_matrices flow_check -loop_folder
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu_dist
>
> Norm of error 9.5528E-10 iterations 1
> -->Test for matrix 168
> Norm of error 9.4945E-10 iterations 1
> -->Test for matrix 169
> Norm of error 6.4279E-10 iterations 1
> -->Test for matrix 170
> Norm of error 7.4633E-10 iterations 1
> -->Test for matrix 171
> Norm of error 7.4863E-10 iterations 1
> -->Test for matrix 172
> Norm of error 8.9701E-10 iterations 1
>
> Fourth, use Matrix 168 to set up the KSP solver and factorization
> using the SuperLU-related code implemented in the example. I thought
> this would give the same results as the Third test, but it does not.
>
> mpiexec.hydra -n 1 ./ex52f -f0
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
> -rhs
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
> -loop_matrices flow_check -loop_folder
> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>
> Norm of error 3.7017E-11 iterations 1
> -->Test for matrix 168
> Norm of error 3.6420E-11 iterations 1
> -->Test for matrix 169
> Norm of error 3.7184E-11 iterations 1
> -->Test for matrix 170
> Norm of error 3.6847E-11 iterations 1
> -->Test for matrix 171
> Norm of error 3.7883E-11 iterations 1
> -->Test for matrix 172
> Norm of error 3.8805E-11 iterations 1
>
> Thanks very much,
>
> Danyang
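
Independently of what PETSc itself can detect, the "Norm of error ... iterations ..." lines printed by the modified ex52f can be checked after the fact. The sketch below is a small Python log filter written against the output format shown in this message; the function names and the thresholds (error above 1e-8, or more than one iteration for a direct solve) are assumptions chosen for illustration, not values taken from PETSc or SuperLU.

```python
import re

NORM_RE = re.compile(r"Norm of error\s+([0-9.]+E[+-]\d+)\s+iterations\s+(\d+)")
TEST_RE = re.compile(r"-->Test for matrix\s+(\d+)")


def parse_solves(log_text):
    """Return (label, error_norm, iterations) for every solve in the log.

    The first solve comes before any "-->Test for matrix" line and is
    labelled "initial"; later solves are labelled with the matrix number.
    """
    solves = []
    label = "initial"
    for line in log_text.splitlines():
        test = TEST_RE.search(line)
        if test:
            label = test.group(1)
            continue
        norm = NORM_RE.search(line)
        if norm:
            solves.append((label, float(norm.group(1)), int(norm.group(2))))
    return solves


def suspicious(solves, max_error=1e-8, max_iterations=1):
    """Keep the solves whose error or iteration count exceeds the assumed limits."""
    return [s for s in solves if s[1] > max_error or s[2] > max_iterations]


# First lines of the output of the First test above.
SAMPLE = """\
Norm of error 3.8815E-11 iterations 1
-->Test for matrix 168
Norm of error 4.2307E-01 iterations 32
-->Test for matrix 169
Norm of error 3.0528E-01 iterations 32
"""

for label, error, iterations in suspicious(parse_solves(SAMPLE)):
    print(f"matrix {label}: error {error:.4e} after {iterations} iterations")
```

A driver could use such a check to decide when to rerun a step with a fresh factorization, although whether that is the right remedy is exactly the open question here.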
>
> On 15-12-03 01:59 PM, Hong wrote:
>> Danyang :
>> Further testing a_flow_check_168.bin,
>> ./ex10 -f0
>> /Users/Hong/Downloads/matrix_and_rhs_bin/a_flow_check_168.bin
>> -rhs
>> /Users/Hong/Downloads/matrix_and_rhs_bin/x_flow_check_168.bin
>> -pc_type lu -pc_factor_mat_solver_package superlu
>> -ksp_monitor_true_residual -mat_superlu_conditionnumber
>> Recip. condition number = 1.610480e-12
>> 0 KSP preconditioned resid norm 6.873340313547e+09 true resid
>> norm 7.295020990196e+03 ||r(i)||/||b|| 1.000000000000e+00
>> 1 KSP preconditioned resid norm 2.051833296449e-02 true resid
>> norm 2.976859070118e-02 ||r(i)||/||b|| 4.080672384793e-06
>> Number of iterations = 1
>> Residual norm 0.0297686
>>
>> condition number of this matrix = 1/1.610480e-12 = 6.2e+11 (about 1.e+12),
>> i.e., this matrix is ill-conditioned.
>>
>> Hong
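
To put the reported reciprocal condition number in perspective, a common rule of thumb says that a backward-stable solve in double precision can lose roughly log10(cond) digits, with the relative error on the order of cond times machine epsilon. The short Python sketch below just does that arithmetic for the value printed above. It is a rough estimate under that rule of thumb, not a bound computed by SuperLU or PETSc.

```python
import math
import sys

rcond = 1.610480e-12           # "Recip. condition number" reported above
cond = 1.0 / rcond             # estimated condition number
eps = sys.float_info.epsilon   # double-precision machine epsilon

digits_at_risk = math.log10(cond)   # decimal digits that may be lost
error_scale = cond * eps            # rule-of-thumb relative error scale

print(f"condition number ~ {cond:.3e}")
print(f"digits at risk   ~ {digits_at_risk:.1f}")
print(f"cond * eps       ~ {error_scale:.1e}")
```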
>>
>>
>> Hi Hong,
>>
>> The binary format of matrix, rhs and solution can be
>> downloaded via the link below.
>>
>> https://www.dropbox.com/s/cl3gfi0s0kjlktf/matrix_and_rhs_bin.tar.gz?dl=0
>>
>> Thanks,
>>
>> Danyang
>>
>>
>> On 15-12-03 10:50 AM, Hong wrote:
>>> Danyang:
>>>
>>>
>>>
>>> To my surprise, the solutions from SuperLU at timestep 29
>>> seem incorrect for the first 4 Newton iterations, but
>>> the solutions from the iterative solver and MUMPS are correct.
>>>
>>> Please find all the matrices, rhs and solutions at
>>> timestep 29 via the link below. The data is a bit large,
>>> so I am sharing it through Dropbox. A piece of
>>> MATLAB code to read these data and then compute the
>>> norm is also attached.
>>> https://www.dropbox.com/s/rr8ueysgflmxs7h/results-check.tar.gz?dl=0
>>>
>>>
>>> Can you send us the matrix in PETSc binary format?
>>>
>>> e.g., call MatView(M, PETSC_VIEWER_BINARY_(PETSC_COMM_WORLD))
>>> or '-ksp_view_mat binary'
>>>
>>> Hong
>>>
>>>
>>>
>>> Below is a summary of the norm from the three solvers at
>>> timestep 29, newton iteration 1 to 5.
>>>
>>> Timestep 29
>>> Norm of residual seq 1.661321e-09, superlu 1.657103e+04,
>>> mumps 3.731225e-11
>>> Norm of residual seq 1.753079e-09, superlu 6.675467e+02,
>>> mumps 1.509919e-13
>>> Norm of residual seq 4.914971e-10, superlu 1.236362e-01,
>>> mumps 2.139303e-17
>>> Norm of residual seq 3.532769e-10, superlu 1.304670e-04,
>>> mumps 5.387000e-20
>>> Norm of residual seq 3.885629e-10, superlu 2.754876e-07,
>>> mumps 4.108675e-21
>>>
>>> Would anybody please check whether SuperLU can solve these
>>> matrices? Another possibility is that something is wrong
>>> in my own code. But so far I cannot find any problem in
>>> my code, since the same code works fine if I use an
>>> iterative solver or the direct solver MUMPS. For the other
>>> cases I have tested, all these solvers work fine.
>>>
>>> Please let me know if I did not write down the problem
>>> clearly.
>>>
>>> Thanks,
>>>
>>> Danyang
>>>
>>>
>>>
>>>
>>
>>
>
>