[petsc-users] SuperLU_dist issue in 3.7.4

Hong hzhang at mcs.anl.gov
Thu Oct 27 09:51:21 CDT 2016


Sherry,
Thanks for the detailed explanation.
We use options.Fact = DOFACT as the default for the first factorization. When
a user reuses the matrix factor, we must provide a default,
either 'options.Fact = SamePattern' or 'SamePattern_SameRowPerm'.
We previously set 'SamePattern_SameRowPerm'. After a user reported an error,
we switched to 'SamePattern', which causes a problem for a second user.

I'll check our interface to see if we can add flag checking for Pr and Pc,
and then set the default accordingly.
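A minimal sketch of that flag-checking idea, assuming the interface can tell whether a previous factorization left Pc and Pr behind. The enum only mirrors three SuperLU_DIST fact_t names; the struct, its fields, the values_similar hint, and choose_default_fact() are hypothetical and are not PETSc or SuperLU_DIST API. It reuses Pr only when the caller says the new values are close to the old ones, because that mode skips the pivot ordering step.

```c
#include <stdbool.h>

/* Local mirror of three SuperLU_DIST fact_t names (illustration only). */
typedef enum { DOFACT, SamePattern, SamePattern_SameRowPerm } fact_mode;

/* Hypothetical record of what survives from the previous factorization. */
typedef struct {
    bool factored_before; /* an LU with this sparsity pattern was computed */
    bool have_pc;         /* column (sparsity) permutation Pc was kept */
    bool have_pr;         /* row (pivot) permutation Pr was kept */
} reuse_state;

/* Pick a Fact mode whose required inputs are actually available.
   values_similar: caller's hint that the new numerical values are close
   to the previous ones; Pr was chosen from the old values. */
fact_mode choose_default_fact(reuse_state s, bool values_similar)
{
    if (!s.factored_before || !s.have_pc)
        return DOFACT;                  /* nothing reusable: start from scratch */
    if (s.have_pr && values_similar)
        return SamePattern_SameRowPerm; /* reuse both Pr and Pc */
    return SamePattern;                 /* reuse Pc, recompute Pr */
}
```

For example, choose_default_fact((reuse_state){true, true, true}, false) returns SamePattern: both permutations exist, but without the similarity hint only Pc is reused.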

Hong

On Wed, Oct 26, 2016 at 3:23 PM, Xiaoye S. Li <xsli at lbl.gov> wrote:

> Some graph preprocessing steps can be skipped ONLY IF a previous
> factorization was done and its information can be reused (AS INPUT) in the
> new factorization.
>
> In general, the driver routine pdgssvx() in SRC/pdgssvx.c performs the LU
> factorization of the following (preprocessed) matrix:
>  Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U
>
> The default is to do LU from scratch, including all the steps to compute
> equilibration (R, C), pivot ordering (Pr), and sparsity ordering (Pc).
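A short derivation, assuming the factorization exactly as written above, shows where each stored piece enters a solve of A x = b. Here D_r = diag(R) and D_c = diag(C), and P_r, P_c are permutation matrices, so their inverses are their transposes.

```latex
P_c P_r D_r A D_c P_c^{T} = L U
\;\Longrightarrow\;
A = D_r^{-1} P_r^{T} P_c^{T} \, L U \, P_c D_c^{-1}
\;\Longrightarrow\;
x = A^{-1} b = D_c P_c^{T} \, U^{-1} L^{-1} \, P_c P_r D_r \, b .
```

Read right to left: scale b by D_r, apply P_r and then P_c, do the two triangular solves, undo P_c, and scale by D_c. Reusing P_c, or P_r and P_c, changes how these factors are obtained, not this formula.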
>
> -- The default should be set as options.Fact = DOFACT.
>
> -- When you set options.Fact = SamePattern, the sparsity ordering step is
> skipped, but you need to input Pc which was obtained from a previous
> factorization.
>
> -- When you set options.Fact = SamePattern_SameRowPerm, both the sparsity
> ordering and pivot ordering steps are skipped, but you need to input
> both Pr and Pc.
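The three modes above can be summarized as a small lookup table. This is an illustrative C sketch: fact_mode, mode_info, and lookup_mode() are hypothetical names, and the enum only mirrors three of SuperLU_DIST's fact_t values. Equilibration is not modeled, since the description above only spells it out for DOFACT.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of three SuperLU_DIST fact_t names (illustration only). */
typedef enum { DOFACT, SamePattern, SamePattern_SameRowPerm } fact_mode;

/* Which ordering steps a mode runs itself. A step it skips must be
   replaced by the permutation saved from a previous factorization. */
typedef struct {
    fact_mode mode;
    const char *name;
    bool computes_pc; /* sparsity (column) ordering */
    bool computes_pr; /* pivot (row) ordering */
} mode_info;

static const mode_info MODE_TABLE[] = {
    { DOFACT,                  "DOFACT",                  true,  true  },
    { SamePattern,             "SamePattern",             false, true  },
    { SamePattern_SameRowPerm, "SamePattern_SameRowPerm", false, false },
};

/* Look up a mode; returns NULL for a value not in the table. */
const mode_info *lookup_mode(fact_mode m)
{
    for (size_t i = 0; i < sizeof MODE_TABLE / sizeof MODE_TABLE[0]; ++i)
        if (MODE_TABLE[i].mode == m)
            return &MODE_TABLE[i];
    return NULL;
}
```

Any field that is false in the table corresponds to a permutation the caller has to pass in.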
>
> Please see the comments at lines 258-307 in SRC/pdgssvx.c for details
> on which data structures are inputs and which are outputs.
> The Users' Guide also explains this.
>
> In EXAMPLE/ directory, I have various examples of these usage situations,
> see EXAMPLE/README.
>
> I am a little puzzled why the default in PETSc is set to SamePattern.
>
> Sherry
>
>
> On Tue, Oct 25, 2016 at 9:18 AM, Hong <hzhang at mcs.anl.gov> wrote:
>
>> Sherry,
>>
>> We set '-mat_superlu_dist_fact SamePattern' as the default in
>> petsc/superlu_dist on 12/6/15 (see the attached email below).
>>
>> However, Anton must set 'SamePattern_SameRowPerm' to avoid a crash in his
>> code. Checking
>> http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
>> I see a detailed description of using SamePattern_SameRowPerm, which
>> requires more from the user than SamePattern. I guess these flags are used
>> for efficiency: the library sets a default, and users then switch flags for
>> their own applications. The default setting should not cause a crash. If a
>> crash does occur, a meaningful error message would help.
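One way to get such an error message is to validate the inputs before factoring. A hedged C sketch: check_fact_inputs() and its arguments are hypothetical, and the rules it encodes are the ones stated in Sherry's reply (SamePattern needs Pc; SamePattern_SameRowPerm needs Pr and Pc).

```c
#include <stdbool.h>
#include <stdio.h>

/* Local mirror of three SuperLU_DIST fact_t names (illustration only). */
typedef enum { DOFACT, SamePattern, SamePattern_SameRowPerm } fact_mode;

/* Hypothetical pre-check: if the requested mode needs a permutation that
   was never computed, report it up front instead of handing unset data
   to the solver. Returns 0 on success, or -1 with a message in msg. */
int check_fact_inputs(fact_mode m, bool have_pc, bool have_pr,
                      char *msg, size_t len)
{
    if (m != DOFACT && !have_pc) {
        snprintf(msg, len, "Fact mode requires the column permutation Pc "
                           "from a previous factorization; use DOFACT first");
        return -1;
    }
    if (m == SamePattern_SameRowPerm && !have_pr) {
        snprintf(msg, len, "SamePattern_SameRowPerm requires the row "
                           "permutation Pr from a previous factorization");
        return -1;
    }
    if (len > 0)
        msg[0] = '\0'; /* no error */
    return 0;
}
```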
>>
>> Do you have a suggestion for how we should set the default for this
>> flag in PETSc?
>>
>> Hong
>>
>> -------------------
>> Hong <hzhang at mcs.anl.gov>
>> 12/7/15
>> to Danyang, petsc-maint, PETSc, Xiaoye
>> Danyang:
>>
>> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is
>> how I figured it out.
>>
>> 1. Reading ex52f.F, I see '-superlu_default' =
>> '-pc_factor_mat_solver_package superlu_dist'; the latter enables runtime
>> options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the
>> tests below.
>> ...
>> 5. Using a_flow_check_1.bin, I am able to reproduce the error you reported:
>> all packages give correct results except superlu_dist:
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs \
>>   matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check \
>>   -loop_folder matrix_and_rhs_bin -pc_type lu \
>>   -pc_factor_mat_solver_package superlu_dist
>> Norm of error  2.5970E-12 iterations     1
>>  -->Test for matrix          168
>> Norm of error  1.3936E-01 iterations    34
>>  -->Test for matrix          169
>>
>> I guess the error might come from reuse of the matrix factor. Replacing the
>> default -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
>> -mat_superlu_dist_fact SamePattern, I get
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs \
>>   matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check \
>>   -loop_folder matrix_and_rhs_bin -pc_type lu \
>>   -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_fact SamePattern
>>
>> Norm of error  2.5970E-12 iterations     1
>>  -->Test for matrix          168
>> ...
>> Sherry may be able to tell you why SamePattern_SameRowPerm causes the
>> difference here.
>> Based on the above experiments, I would set the following defaults:
>> '-mat_superlu_diagpivotthresh 0.0' in the petsc/superlu interface;
>> '-mat_superlu_dist_fact SamePattern' in the petsc/superlu_dist interface.
>>
>> Hong
>>
>> On Tue, Oct 25, 2016 at 10:38 AM, Hong <hzhang at mcs.anl.gov> wrote:
>>
>>> Anton,
>>> I guess that when you reuse the matrix and its symbolic factor with updated
>>> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to
>>> confirm it.
>>>
>>> I'll check the petsc/superlu_dist interface to set this flag for this case.
>>>
>>> Hong
>>>
>>>
>>> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov <popov at uni-mainz.de> wrote:
>>>
>>>> Hong,
>>>>
>>>> All the problems go away and the valgrind output is clean if I specify this:
>>>>
>>>> -mat_superlu_dist_fact SamePattern_SameRowPerm
>>>> What does SamePattern_SameRowPerm actually mean?
>>>> Row permutations are for large diagonal, column permutations are for
>>>> sparsity, right?
>>>> Will it skip subsequent matrix permutations for large diagonal even if
>>>> matrix values change significantly?
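This last question can be illustrated with a toy 2x2 example. It is a hedged C sketch, not SuperLU_DIST code: large_diag_swap() is a crude, hypothetical stand-in for the LargeDiag row permutation, and min_pivot() factors without further row exchanges, modeling the static pivoting SuperLU_DIST uses. Both matrices share the same (dense) pattern, but the row order chosen for the old values leaves a tiny pivot for the new ones, which is what skipping the pivot ordering step can cost.

```c
/* Absolute value without <math.h>, so no libm linking is needed. */
static double absd(double x) { return x < 0.0 ? -x : x; }

/* Toy stand-in for a LargeDiag row permutation on a 2x2 matrix:
   returns 1 if swapping the rows gives a larger |diagonal product|. */
int large_diag_swap(double a[2][2])
{
    double keep = absd(a[0][0] * a[1][1]);
    double swap = absd(a[1][0] * a[0][1]);
    return swap > keep;
}

/* Smallest |pivot| of an LU factorization with no further row exchanges
   (static pivoting), after applying the given row order. */
double min_pivot(double a[2][2], int swap)
{
    const double *r0 = swap ? a[1] : a[0];
    const double *r1 = swap ? a[0] : a[1];
    double u11 = r0[0];
    if (u11 == 0.0)
        return 0.0;                           /* zero pivot */
    double u22 = r1[1] - r1[0] * r0[1] / u11; /* Schur complement */
    double p11 = absd(u11), p22 = absd(u22);
    return p11 < p22 ? p11 : p22;
}
```

With a_old = {{1e-8, 1}, {1, 1e-8}} and a_new = {{1, 1e-8}, {1e-8, 1}}, reusing the stale choice large_diag_swap(a_old) on a_new gives a smallest pivot of 1e-8, while recomputing the swap for a_new gives a smallest pivot of about 1.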
>>>>
>>>> Surprisingly, everything works even with:
>>>>
>>>> -mat_superlu_dist_colperm PARMETIS
>>>> -mat_superlu_dist_parsymbfact TRUE
>>>>
>>>> Thanks,
>>>> Anton
>>>>
>>>> On 10/24/2016 09:06 PM, Hong wrote:
>>>>
>>>> Anton:
>>>>>
>>>>> If replacing superlu_dist with mumps, does your code work?
>>>>>
>>>>> yes
>>>>>
>>>>
>>>> You may use mumps in your code, or test different options for
>>>> superlu_dist:
>>>>
>>>>   -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>>>>   -mat_superlu_dist_rowperm <LargeDiag> Row permutation (choose one of) LargeDiag NATURAL (None)
>>>>   -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column permutation (choose one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
>>>>   -mat_superlu_dist_replacetinypivot: <FALSE> Replace tiny pivots (None)
>>>>   -mat_superlu_dist_parsymbfact: <FALSE> Parallel symbolic factorization (None)
>>>>   -mat_superlu_dist_fact <SamePattern> Sparsity pattern for repeated matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm (None)
>>>>
>>>> The options inside <> are defaults. You may try others. This might help
>>>> narrow down the bug.
>>>>
>>>> Hong
>>>>
>>>>>
>>>>> Hong
>>>>>>
>>>>>> On 10/24/2016 05:47 PM, Hong wrote:
>>>>>>
>>>>>> Barry,
>>>>>> Your change indeed fixed the error in his test code.
>>>>>> As Satish tested, ex16 runs smoothly on your branch.
>>>>>>
>>>>>> I do not understand why, on the maint or master branch, ex16 crashes
>>>>>> inside superlu_dist but not with mumps.
>>>>>>
>>>>>>
>>>>>> I also confirm that ex16 runs fine with the latest fix, but unfortunately
>>>>>> my code does not.
>>>>>>
>>>>>> This is to be expected, since my code preallocates once at
>>>>>> the beginning, so there is no way it can be affected by multiple
>>>>>> preallocations. Subsequently I only do matrix assembly, which makes sure
>>>>>> the structure doesn't change (it is set to raise an error otherwise).
>>>>>>
>>>>>> Summary: we don't have a simple test code to debug the superlu issue
>>>>>> anymore.
>>>>>>
>>>>>> Anton
>>>>>>
>>>>>> Hong
>>>>>>
>>>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay <balay at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>>>>
>>>>>>> >
>>>>>>> > > [Or perhaps Hong is using a different test code and is observing bugs
>>>>>>> > > with superlu_dist interface..]
>>>>>>> >
>>>>>>> >    She states that her test does a NEW MatCreate() for each matrix
>>>>>>> > load (I cut and pasted it in the email I just sent). The bug I fixed was
>>>>>>> > only related to using the SAME matrix from one MatLoad() in another
>>>>>>> > MatLoad().
>>>>>>>
>>>>>>> Ah - ok.. Sorry - wasn't thinking clearly :(
>>>>>>>
>>>>>>> Satish
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>