[petsc-users] SuperLU_dist issue in 3.7.4
Anton Popov
popov at uni-mainz.de
Mon Nov 7 10:52:22 CST 2016
On 10/27/2016 04:51 PM, Hong wrote:
> Sherry,
> Thanks for detailed explanation.
> We use options.Fact = DOFACT as default for the first factorization.
> When the user reuses the matrix factor, we must provide a default,
> either 'options.Fact = SamePattern' or 'SamePattern_SameRowPerm'.
> We previously set 'SamePattern_SameRowPerm'. After a user reported an
> error, we switched to 'SamePattern', which causes a problem for a second user.
Hong,
Setting options.Fact = DOFACT for all factorizations is currently
impossible via the PETSc interface.
The user is expected to choose some kind of reuse model.
If you could add it, I (and probably other users too) would really
appreciate that.
Thanks a lot,
Anton
>
> I'll check our interface to see if we can add flag-checking for Pr and
> Pc, and then set the default accordingly.
>
> Hong
>
> On Wed, Oct 26, 2016 at 3:23 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:
>
> Some graph preprocessing steps can be skipped ONLY IF a previous
> factorization was done, and the information can be reused (AS
> INPUT) in the new factorization.
>
> In general, the driver routine pdgssvx() in SRC/pdgssvx.c performs the LU
> factorization of the following (preprocessed) matrix:
> Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U
>
> The default is to do LU from scratch, including all the steps to
> compute equilibration (R, C), pivot ordering (Pr), and sparsity
> ordering (Pc).
>
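For concreteness, the preprocessed matrix above can be formed entry by entry. The sketch below is a hypothetical dense helper written only to illustrate the formula; it is not a SuperLU_DIST routine, and it assumes its own permutation convention (a permutation matrix P stored as an index vector p with (P*A)(i,:) = A(p[i],:)), which may differ from how SuperLU_DIST stores perm_r and perm_c. Under that convention, B(i,j) = R(r)*A(r,c)*C(c) with r = pr[pc[i]] and c = pc[j].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of SuperLU_DIST): forms
 *   B = Pc * Pr * diag(R) * A * diag(C) * Pc^T
 * for a dense n-by-n row-major matrix A.
 * Assumed convention: a permutation matrix P is stored as an index
 * vector p with (P*A)(i,:) = A(p[i],:). */
void preprocess_dense(size_t n, const double *A, const double *R,
                      const double *C, const size_t *pr, const size_t *pc,
                      double *B)
{
    for (size_t i = 0; i < n; ++i) {
        /* Source row of diag(R)*A*diag(C): Pr is applied first, then Pc. */
        size_t r = pr[pc[i]];
        for (size_t j = 0; j < n; ++j) {
            /* Source column: the trailing Pc^T permutes columns by pc. */
            size_t c = pc[j];
            assert(r < n && c < n);
            B[i * n + j] = R[r] * A[r * n + c] * C[c];
        }
    }
}
```

For example, with R = C = 1, pc the identity and pr = {1, 0}, the helper simply swaps the two rows of a 2-by-2 A, moving a zero (0,0) entry off the diagonal. Pc appears on both sides, so it reorders rows and columns symmetrically, while Pr acts on rows only.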
> -- The default should be set as options.Fact = DOFACT.
>
> -- When you set options.Fact = SamePattern, the sparsity ordering
> step is skipped, but you need to input Pc which was obtained from
> a previous factorization.
>
> -- When you set options.Fact = SamePattern_SameRowPerm, both
> sparsity reordering and pivot ordering steps are skipped, but
> you need to input both Pr and Pc.
>
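The three cases above differ only in which orderings are recomputed and which must be supplied from an earlier factorization. The sketch below encodes that rule, plus one possible default-selection policy in the spirit of Hong's plan to check for Pr and Pc. The enum is a local mirror of the options.Fact values with hypothetical names, and choose_default() is an illustration under those assumptions, not PETSc's or SuperLU_DIST's actual logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local mirror of SuperLU_DIST's options.Fact values
 * (DOFACT, SamePattern, SamePattern_SameRowPerm); names are illustrative. */
typedef enum {
    FACT_DOFACT,                   /* compute R, C, Pr and Pc from scratch */
    FACT_SAME_PATTERN,             /* skip sparsity ordering: Pc is an input */
    FACT_SAME_PATTERN_SAME_ROWPERM /* skip both orderings: Pr and Pc are inputs */
} fact_mode;

/* Which orderings survive from a previous factorization. */
typedef struct {
    bool have_pc; /* column (sparsity) ordering Pc */
    bool have_pr; /* row (pivot) ordering Pr */
} saved_perms;

bool needs_input_pc(fact_mode m)
{
    return m == FACT_SAME_PATTERN || m == FACT_SAME_PATTERN_SAME_ROWPERM;
}

bool needs_input_pr(fact_mode m)
{
    return m == FACT_SAME_PATTERN_SAME_ROWPERM;
}

/* A mode is usable only if every ordering it skips was saved earlier. */
bool mode_is_valid(fact_mode m, saved_perms s)
{
    return (!needs_input_pc(m) || s.have_pc) && (!needs_input_pr(m) || s.have_pr);
}

/* Hypothetical policy: reuse Pc when it exists, reuse Pr only when the
 * caller opts in, and fall back to a full DOFACT otherwise. */
fact_mode choose_default(saved_perms s, bool allow_pr_reuse)
{
    fact_mode m = FACT_DOFACT;
    if (s.have_pc)
        m = (allow_pr_reuse && s.have_pr) ? FACT_SAME_PATTERN_SAME_ROWPERM
                                          : FACT_SAME_PATTERN;
    assert(mode_is_valid(m, s));
    return m;
}
```

Under this policy, a caller that never reports saved orderings gets DOFACT on every factorization, and opting out of Pr reuse yields SamePattern whenever Pc is available.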
> Please see the comments in lines 258 - 307 of SRC/pdgssvx.c for details
> on which data structures should be inputs and which are
> outputs. The Users Guide also explains this.
>
> In the EXAMPLE/ directory, I have various examples of these usage
> situations; see EXAMPLE/README.
>
> I am a little puzzled why the default in PETSc is set to
> SamePattern.
>
> Sherry
>
>
> On Tue, Oct 25, 2016 at 9:18 AM, Hong <hzhang at mcs.anl.gov> wrote:
>
> Sherry,
>
> We set '-mat_superlu_dist_fact SamePattern' as the default in
> petsc/superlu_dist on 12/6/15 (see attached email below).
>
> However, Anton must set 'SamePattern_SameRowPerm' to avoid a
> crash in his code. Checking
> http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
> I see a detailed description of using SamePattern_SameRowPerm,
> which requires more from the user than SamePattern. I guess these
> flags are used for efficiency: the library sets a default, and
> users switch to other settings for their own applications. The
> default setting should not cause a crash. If a crash occurs, a
> meaningful error message would help.
>
> Do you have suggestion how should we set default in petsc for
> this flag?
>
> Hong
>
> -------------------
>
>
> Hong <hzhang at mcs.anl.gov>
>
>
> 12/7/15
>
>
> to Danyang, petsc-maint, PETSc, Xiaoye
>
> Danyang:
>
> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem.
> Below is how I figured it out.
>
> 1. Reading ex52f.F, I see '-superlu_default' =
> '-pc_factor_mat_solver_package superlu_dist'; the latter
> enables runtime options for other packages. I use
> superlu_dist-4.2 and superlu-4.1 for the tests below.
> ...
> 5. Using a_flow_check_1.bin, I am able to reproduce the error you
> reported: all packages give correct results except superlu_dist:
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
> flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu_dist
> Norm of error 2.5970E-12 iterations 1
> -->Test for matrix 168
> Norm of error 1.3936E-01 iterations 34
> -->Test for matrix 169
>
> I guess the error might come from reuse of the matrix factor.
> Replacing the default
> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
> -mat_superlu_dist_fact SamePattern, I get
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
> flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
> -pc_factor_mat_solver_package superlu_dist
> -mat_superlu_dist_fact SamePattern
>
> Norm of error 2.5970E-12 iterations 1
> -->Test for matrix 168
> ...
> Sherry may tell you why SamePattern_SameRowPerm causes the
> difference here.
> Based on the above experiments, I would set the following as defaults:
> '-mat_superlu_diagpivotthresh 0.0' in the petsc/superlu interface.
> '-mat_superlu_dist_fact SamePattern' in the petsc/superlu_dist
> interface.
>
> Hong
>
> On Tue, Oct 25, 2016 at 10:38 AM, Hong <hzhang at mcs.anl.gov> wrote:
>
> Anton,
> I guess that when you reuse the matrix and its symbolic factor
> with updated numerical values, superlu_dist requires this
> option. I'm cc'ing Sherry to confirm it.
>
> I'll check the petsc/superlu_dist interface to set this flag
> for this case.
>
> Hong
>
>
> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov <popov at uni-mainz.de> wrote:
>
> Hong,
>
> All the problems are gone and the valgrind output is clean
> if I specify this:
>
> -mat_superlu_dist_fact SamePattern_SameRowPerm
>
> What does SamePattern_SameRowPerm actually mean?
> Row permutations are for large diagonal, column
> permutations are for sparsity, right?
> Will it skip the row permutation for a large diagonal in subsequent
> factorizations even if the matrix values change significantly?
>
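Per Sherry's description, SamePattern_SameRowPerm skips the pivot ordering step and takes Pr as input, so a row permutation chosen for the old values is reused even after the values change. The toy below illustrates why that can matter. It uses an assumed 2-by-2 stand-in for the LargeDiag row permutation (pick the row order with the larger product of diagonal magnitudes), not SuperLU_DIST's actual algorithm, and it shows only a possible mechanism, not a diagnosis of the runs discussed in this thread.

```c
#include <assert.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Toy stand-in for a LargeDiag-style row permutation on a 2x2 matrix
 * stored row-major in a[0..3]: returns 0 to keep the row order or 1 to
 * swap the rows, whichever gives the larger product of |diagonal| entries. */
int large_diag_perm_2x2(const double a[4])
{
    double keep = abs_d(a[0] * a[3]);
    double swap = abs_d(a[1] * a[2]);
    return swap > keep ? 1 : 0;
}

/* Smallest |diagonal| entry after applying the row order `swap`.
 * Swapping the rows puts a[2] and a[1] on the diagonal. */
double min_abs_diag_2x2(const double a[4], int swap)
{
    assert(swap == 0 || swap == 1);
    double d0 = abs_d(swap ? a[2] : a[0]);
    double d1 = abs_d(swap ? a[1] : a[3]);
    return d0 < d1 ? d0 : d1;
}
```

With old values {1e-8, 1, 1, 1e-8} the toy swaps the rows. After the values change to {1, 1e-8, 1e-8, 1} (same nonzero pattern), reusing that swap leaves 1e-8 on the diagonal, whereas recomputing the permutation keeps the rows in place and gives diagonal entries of 1.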
> Surprisingly, everything works even with:
>
> -mat_superlu_dist_colperm PARMETIS
> -mat_superlu_dist_parsymbfact TRUE
>
> Thanks,
> Anton
>
> On 10/24/2016 09:06 PM, Hong wrote:
>> Anton:
>>
>>> If replacing superlu_dist with mumps, does your
>>> code work?
>> yes
>>
>> You may use mumps in your code, or test different
>> options for superlu_dist:
>>
>> -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>> -mat_superlu_dist_rowperm <LargeDiag> Row permutation
>> (choose one of) LargeDiag NATURAL (None)
>> -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column
>> permutation (choose one of) NATURAL MMD_AT_PLUS_A
>> MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
>> -mat_superlu_dist_replacetinypivot: <FALSE> Replace
>> tiny pivots (None)
>> -mat_superlu_dist_parsymbfact: <FALSE> Parallel
>> symbolic factorization (None)
>> -mat_superlu_dist_fact <SamePattern> Sparsity pattern
>> for repeated matrix factorization (choose one of)
>> SamePattern SamePattern_SameRowPerm (None)
>>
>> The options inside <> are defaults. You may try
>> others. This might help narrow down the bug.
>>
>> Hong
>>
>>
>>> Hong
>>>
>>> On 10/24/2016 05:47 PM, Hong wrote:
>>>> Barry,
>>>> Your change indeed fixed the error in his
>>>> testing code.
>>>> As Satish tested, on your branch, ex16 runs
>>>> smoothly.
>>>>
>>>> I do not understand why, on the maint or master
>>>> branch, ex16 crashes inside superlu_dist
>>>> but not with mumps.
>>>>
>>>
>>> I also confirm that ex16 runs fine with
>>> the latest fix, but unfortunately my code does not.
>>>
>>> This is to be expected, since my
>>> code preallocates once at the beginning, so
>>> there is no way it can be affected by
>>> multiple preallocations. Subsequently I only
>>> do matrix assembly, which makes sure the
>>> structure doesn't change (it is set to raise an error
>>> otherwise).
>>>
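SamePattern-style reuse presumes exactly the situation Anton describes: the nonzero structure is identical from one factorization to the next. In PETSc, MatSetOption() with MAT_NEW_NONZERO_ALLOCATION_ERR is one way to turn a new nonzero into an error; the email does not say which option Anton used. As a library-independent sketch, a hypothetical helper comparing two zero-based CSR structures could look like this; it assumes the column indices within each row are stored in the same order in both matrices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: do two zero-based CSR matrices with n rows have
 * exactly the same nonzero structure (identical row pointers and column
 * indices)? Numerical values are deliberately ignored. */
bool same_csr_pattern(size_t n,
                      const size_t *rowptr_a, const size_t *colind_a,
                      const size_t *rowptr_b, const size_t *colind_b)
{
    assert(rowptr_a[0] == 0 && rowptr_b[0] == 0);
    for (size_t i = 0; i <= n; ++i)
        if (rowptr_a[i] != rowptr_b[i])
            return false; /* a row gained or lost nonzeros */
    for (size_t k = 0; k < rowptr_a[n]; ++k)
        if (colind_a[k] != colind_b[k])
            return false; /* same counts, but a nonzero moved */
    return true;
}
```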
>>> Summary: we no longer have a simple test code to
>>> debug the superlu issue.
>>>
>>> Anton
>>>
>>>> Hong
>>>>
>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish
>>>> Balay <balay at mcs.anl.gov> wrote:
>>>>
>>>> On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>
>>>> >
>>>> > > [Or perhaps Hong is using a
>>>> different test code and is observing bugs
>>>> > > with superlu_dist interface..]
>>>> >
>>>> > She states that her test does a
>>>> NEW MatCreate() for each matrix load (I
>>>> cut and pasted it in the email I just
>>>> sent). The bug I fixed was only related
>>>> to using the SAME matrix from one
>>>> MatLoad() in another MatLoad().
>>>>
>>>> Ah - ok.. Sorry - wasn't thinking
>>>> clearly :(
>>>>
>>>> Satish
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
>
>