[petsc-users] SuperLU_dist issue in 3.7.4

Anton Popov popov at uni-mainz.de
Mon Nov 7 10:52:22 CST 2016



On 10/27/2016 04:51 PM, Hong wrote:
> Sherry,
> Thanks for the detailed explanation.
> We use options.Fact = DOFACT as default for the first factorization. 
> When user reuses matrix factor, then we must provide a default,
> either 'options.Fact = SamePattern' or 'SamePattern_SameRowPerm'.
> We previously set 'SamePattern_SameRowPerm'. After a user reported an
> error, we switched to 'SamePattern', which causes a problem for a second user.
Hong,

Setting options.Fact = DOFACT for all factorizations is currently
impossible via the PETSc interface.
The user is expected to choose some kind of reuse model.
If you could add it, I (and probably other users too) would really
appreciate that.
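
As a rough illustration of the three modes described in Sherry's quoted message below, here is a minimal sketch in C. It is a standalone model, not the SuperLU_DIST or PETSc API: the type names, `plan_for()` and `safe_mode()` are hypothetical, and the fallback to a full factorization when Pc or Pr is missing is only an assumed policy along the lines of the flag-checking Hong mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three values of options.Fact discussed in
   this thread. These are NOT the definitions from the SuperLU_DIST headers. */
typedef enum {
    MODE_DOFACT,
    MODE_SAME_PATTERN,
    MODE_SAME_PATTERN_SAME_ROWPERM
} reuse_mode;

/* Which orderings a factorization call computes, and which it expects as input. */
typedef struct {
    bool compute_col_perm;    /* sparsity ordering Pc is computed in this call */
    bool compute_row_perm;    /* pivot ordering Pr is computed in this call */
    bool needs_prev_col_perm; /* Pc must come from a previous factorization */
    bool needs_prev_row_perm; /* Pr must come from a previous factorization */
} reuse_plan;

/* Encode the description from the thread: DOFACT computes everything,
   SamePattern reuses Pc, SamePattern_SameRowPerm reuses both Pc and Pr. */
reuse_plan plan_for(reuse_mode mode)
{
    reuse_plan p = { true, true, false, false }; /* from scratch */

    assert(mode == MODE_DOFACT || mode == MODE_SAME_PATTERN ||
           mode == MODE_SAME_PATTERN_SAME_ROWPERM);

    if (mode != MODE_DOFACT) {
        p.compute_col_perm = false;
        p.needs_prev_col_perm = true;
    }
    if (mode == MODE_SAME_PATTERN_SAME_ROWPERM) {
        p.compute_row_perm = false;
        p.needs_prev_row_perm = true;
    }
    return p;
}

/* Assumed policy (not taken from either library): if the requested mode needs
   an ordering that is not available, fall back to a full factorization. */
reuse_mode safe_mode(reuse_mode requested, bool have_prev_pc, bool have_prev_pr)
{
    reuse_plan p = plan_for(requested);

    if (p.needs_prev_col_perm && !have_prev_pc) return MODE_DOFACT;
    if (p.needs_prev_row_perm && !have_prev_pr) return MODE_DOFACT;
    return requested;
}
```

In this model, `safe_mode(MODE_SAME_PATTERN_SAME_ROWPERM, true, false)` returns `MODE_DOFACT`, because Pr is required but was never produced, while `safe_mode(MODE_SAME_PATTERN, true, false)` keeps `MODE_SAME_PATTERN`.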

Thanks a lot,
Anton

>
> I'll check our interface to see if we can add flag-checking for Pr and 
> Pc, then set default accordingly.
>
> Hong
>
> On Wed, Oct 26, 2016 at 3:23 PM, Xiaoye S. Li <xsli at lbl.gov 
> <mailto:xsli at lbl.gov>> wrote:
>
>     Some graph preprocessing steps can be skipped ONLY IF a previous
>     factorization was done, and the information can be reused (AS
>     INPUT) to the new factorization.
>
>     In general, the driver routine pdgssvx() in SRC/pdgssvx.c performs
>     the LU factorization of the following (preprocessed) matrix:
>      Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U
>
>     The default is to do LU from scratch, including all the steps to
>     compute equilibration (R, C), pivot ordering (Pr), and sparsity
>     ordering (Pc).
>
>     -- The default should be set as options.Fact = DOFACT.
>
>     -- When you set options.Fact = SamePattern, the sparsity ordering
>     step is skipped, but you need to input Pc which was obtained from
>     a previous factorization.
>
>     -- When you set options.Fact = SamePattern_SameRowPerm, both
>     sparsity reordering and pivoting ordering steps are skipped, but
>     you need to input both Pr and Pc.
>
>     Please see the comments on lines 258-307 of SRC/pdgssvx.c for
>     details regarding which data structures should be inputs and which
>     are outputs.  The Users Guide also explains this.
>
>     In the EXAMPLE/ directory, I have various examples of these usage
>     situations; see EXAMPLE/README.
>
>     I am a little puzzled why, in PETSc, the default is set to
>     SamePattern?
>
>     Sherry
>
>
>     On Tue, Oct 25, 2016 at 9:18 AM, Hong <hzhang at mcs.anl.gov
>     <mailto:hzhang at mcs.anl.gov>> wrote:
>
>         Sherry,
>
>         We set '-mat_superlu_dist_fact SamePattern'  as default in
>         petsc/superlu_dist on 12/6/15 (see attached email below).
>
>         However, Anton must set 'SamePattern_SameRowPerm' to avoid
>         a crash in his code. Checking
>         http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
>         <http://crd-legacy.lbl.gov/%7Exiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html>
>         I see a detailed description of using SamePattern_SameRowPerm,
>         which requires more from the user than SamePattern. I guess these
>         flags are used for efficiency. The library sets a default and
>         lets users switch it for their own applications. The
>         default setting should not cause a crash. If a crash occurs, giving
>         a meaningful error message would be helpful.
>
>         Do you have a suggestion for how we should set the default in
>         petsc for this flag?
>
>         Hong
>
>         -------------------
>
>
>         Hong <hzhang at mcs.anl.gov <mailto:hzhang at mcs.anl.gov>>
>         12/7/15
>         to Danyang, petsc-maint, PETSc, Xiaoye
>
>         Danyang :
>
>         Adding '-mat_superlu_dist_fact SamePattern' fixed the problem.
>         Below is how I figured it out.
>
>         1. Reading ex52f.F, I see '-superlu_default' =
>         '-pc_factor_mat_solver_package superlu_dist'; the latter
>         enables runtime options for other packages. I use
>         superlu_dist-4.2 and superlu-4.1 for the tests below.
>         ...
>         5.
>         Using a_flow_check_1.bin, I am able to reproduce the error you
>         reported: all packages give correct results except superlu_dist:
>         ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>         matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
>         flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
>         -pc_factor_mat_solver_package superlu_dist
>         Norm of error  2.5970E-12 iterations     1
>          -->Test for matrix          168
>         Norm of error  1.3936E-01 iterations    34
>          -->Test for matrix          169
>
>         I guess the error might come from reuse of matrix factor.
>         Replacing default
>         -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
>         -mat_superlu_dist_fact SamePattern, I get
>
>         ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>         matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
>         flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
>         -pc_factor_mat_solver_package superlu_dist
>         -mat_superlu_dist_fact SamePattern
>
>         Norm of error  2.5970E-12 iterations     1
>          -->Test for matrix          168
>         ...
>         Sherry may tell you why SamePattern_SameRowPerm causes the
>         difference here.
>         Based on the above experiments, I would set the following as defaults:
>         '-mat_superlu_diagpivotthresh 0.0' in the petsc/superlu interface.
>         '-mat_superlu_dist_fact SamePattern' in the petsc/superlu_dist
>         interface.
>
>         Hong
>
>         On Tue, Oct 25, 2016 at 10:38 AM, Hong <hzhang at mcs.anl.gov
>         <mailto:hzhang at mcs.anl.gov>> wrote:
>
>             Anton,
>             I guess that when you reuse a matrix and its symbolic factor
>             with updated numerical values, superlu_dist requires this
>             option. I'm cc'ing Sherry to confirm it.
>
>             I'll check petsc/superlu-dist interface to set this flag
>             for this case.
>
>             Hong
>
>
>             On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov
>             <popov at uni-mainz.de <mailto:popov at uni-mainz.de>> wrote:
>
>                 Hong,
>
>                 All the problems go away and the output is
>                 valgrind-clean if I specify this:
>
>                 -mat_superlu_dist_fact SamePattern_SameRowPerm
>
>                 What does SamePattern_SameRowPerm actually mean?
>                 Row permutations are for large diagonal, column
>                 permutations are for sparsity, right?
>                 Will it skip subsequent matrix permutations for large
>                 diagonal even if matrix values change significantly?
>
>                 Surprisingly everything works even with:
>
>                 -mat_superlu_dist_colperm PARMETIS
>                 -mat_superlu_dist_parsymbfact TRUE
>
>                 Thanks,
>                 Anton
>
>                 On 10/24/2016 09:06 PM, Hong wrote:
>>                 Anton:
>>
>>>                     If replacing superlu_dist with mumps, does your
>>>                     code work?
>>                     yes
>>
>>                 You may use mumps in your code, or test different
>>                 options for superlu_dist:
>>
>>                 -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>>                 -mat_superlu_dist_rowperm <LargeDiag> Row permutation
>>                 (choose one of) LargeDiag NATURAL (None)
>>                 -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column
>>                 permutation (choose one of) NATURAL MMD_AT_PLUS_A
>>                 MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
>>                 -mat_superlu_dist_replacetinypivot: <FALSE> Replace
>>                 tiny pivots (None)
>>                 -mat_superlu_dist_parsymbfact: <FALSE> Parallel
>>                 symbolic factorization (None)
>>                 -mat_superlu_dist_fact <SamePattern> Sparsity pattern
>>                 for repeated matrix factorization (choose one of)
>>                 SamePattern SamePattern_SameRowPerm (None)
>>
>>                 The options inside <> are defaults. You may try
>>                 others. This might help narrow down the bug.
>>
>>                 Hong
>>
>>
>>>                     Hong
>>>
>>>                         On 10/24/2016 05:47 PM, Hong wrote:
>>>>                         Barry,
>>>>                         Your change indeed fixed the error in his
>>>>                         testing code.
>>>>                         As Satish tested, on your branch, ex16 runs
>>>>                         smoothly.
>>>>
>>>>                         I do not understand why, on the maint or master
>>>>                         branch, ex16 crashes inside superlu_dist,
>>>>                         but not with mumps.
>>>>
>>>
>>>                         I also confirm that ex16 runs fine with the
>>>                         latest fix, but unfortunately my code does not.
>>>
>>>                         This is to be expected, since my
>>>                         code preallocates once in the beginning. So
>>>                         there is no way it can be affected by
>>>                         multiple preallocations. Subsequently I only
>>>                         do matrix assembly, which makes sure the
>>>                         structure doesn't change (set to raise an
>>>                         error otherwise).
>>>
>>>                         Summary: we don't have a simple test code to
>>>                         debug superlu issue anymore.
>>>
>>>                         Anton
>>>
>>>>                         Hong
>>>>
>>>>                         On Mon, Oct 24, 2016 at 9:34 AM, Satish
>>>>                         Balay <balay at mcs.anl.gov
>>>>                         <mailto:balay at mcs.anl.gov>> wrote:
>>>>
>>>>                             On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>
>>>>                             >
>>>>                             > > [Or perhaps Hong is using a
>>>>                             different test code and is observing bugs
>>>>                             > > with superlu_dist interface..]
>>>>                             >
>>>>                             >    She states that her test does a
>>>>                             NEW MatCreate() for each matrix load (I
>>>>                             cut and pasted it in the email I just
>>>>                             sent). The bug I fixed was only related
>>>>                             to using the SAME matrix from one
>>>>                             MatLoad() in another MatLoad().
>>>>
>>>>                             Ah - ok.. Sorry - wasn't thinking
>>>>                             clearly :(
>>>>
>>>>                             Satish
>>>>
>>>>
