[petsc-users] SuperLU_dist issue in 3.7.4

Hong hzhang at mcs.anl.gov
Tue Oct 25 11:18:34 CDT 2016


Sherry,

We set '-mat_superlu_dist_fact SamePattern' as the default in
petsc/superlu_dist on 12/6/15 (see the attached email below).

However, Anton must set 'SamePattern_SameRowPerm' to avoid a crash in his
code. Checking
http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
I see a detailed description of using SamePattern_SameRowPerm, which requires
more from the user than SamePattern. I guess these flags are used for
efficiency. The library sets a default, then lets users switch it for their
own applications. The default setting should not cause a crash. If a crash
occurs, giving a meaningful error message would be helpful.
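
To make the difference between the two flags concrete, here is a toy sketch
in Python. It is not SuperLU_DIST code: the brute-force search only stands in
for the LargeDiag row permutation, and the matrices a1 and a2 and the helper
names are made up for illustration. Going by the documentation linked above,
SamePattern reuses the column permutation from the previous factorization,
while SamePattern_SameRowPerm also reuses the row permutation and the scaling
factors, on the assumption that the new values are similar to the old ones.
The sketch shows what can happen when that assumption does not hold.

```python
from itertools import permutations
from math import prod


def large_diag_row_perm(a):
    """Brute-force stand-in for a 'large diagonal' row permutation.

    Returns perm such that row perm[i] of `a` is moved to position i,
    chosen to maximize the product of the diagonal magnitudes.
    Only suitable for tiny matrices (hypothetical helper).
    """
    n = len(a)
    return max(permutations(range(n)),
               key=lambda p: prod(abs(a[p[i]][i]) for i in range(n)))


def diagonal_after(a, perm):
    """Diagonal of `a` after its rows are reordered by `perm`."""
    return [a[perm[i]][i] for i in range(len(a))]


# Two matrices with the SAME sparsity pattern but different values.
a1 = [[4.0, 1.0, 0.0],
      [1.0, 5.0, 2.0],
      [0.0, 2.0, 6.0]]
a2 = [[0.001, 3.0, 0.0],
      [7.0, 0.002, 2.0],
      [0.0, 2.0, 6.0]]

perm1 = large_diag_row_perm(a1)  # computed once, for the first matrix
perm2 = large_diag_row_perm(a2)  # recomputed for the second matrix

# Reusing perm1 for a2 (the SamePattern_SameRowPerm situation in this toy
# model) leaves tiny entries on the diagonal.
print("stale row perm:", perm1, "diagonal:", diagonal_after(a2, perm1))
# Recomputing the row permutation (the SamePattern situation) does not.
print("fresh row perm:", perm2, "diagonal:", diagonal_after(a2, perm2))
```

With these made-up values the stale permutation is the identity and leaves
0.001 and 0.002 on the diagonal of a2, while the recomputed permutation swaps
the first two rows and puts 7.0 and 3.0 there.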

Do you have a suggestion for how we should set the default in petsc for this flag?

Hong

-------------------
Hong <hzhang at mcs.anl.gov>
12/7/15
to Danyang, petsc-maint, PETSc, Xiaoye
Danyang :

Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is how
I figured it out.

1. Reading ex52f.F, I see '-superlu_default' =
'-pc_factor_mat_solver_package superlu_dist'; the latter enables runtime
options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the
tests below.
...
5.
Using a_flow_check_1.bin, I am able to reproduce the error you reported:
all packages give correct results except superlu_dist:
./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
-loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
superlu_dist
Norm of error  2.5970E-12 iterations     1
 -->Test for matrix          168
Norm of error  1.3936E-01 iterations    34
 -->Test for matrix          169

I guess the error might come from reuse of matrix factor. Replacing default
-mat_superlu_dist_fact <SamePattern_SameRowPerm> with
-mat_superlu_dist_fact SamePattern, I get

./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
-loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
superlu_dist -mat_superlu_dist_fact SamePattern

Norm of error  2.5970E-12 iterations     1
 -->Test for matrix          168
...
Sherry may tell you why SamePattern_SameRowPerm causes the difference here.
Based on the above experiments, I would set the following as defaults:
'-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
'-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
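
A toy illustration of how reusing a row permutation across matrices with
different values can hurt accuracy. This is a sketch in Python, not
SuperLU_DIST code and not a diagnosis of the ex52f case above. It assumes, as
a simplification, that the numerical factorization does no pivoting of its
own, so the only protection against small pivots is the row permutation fixed
beforehand. The function solve_no_pivot and the 2x2 system are hypothetical.

```python
def solve_no_pivot(a, b, row_perm):
    """Solve a x = b by Gaussian elimination WITHOUT pivoting.

    Rows are first reordered by a fixed, precomputed permutation:
    row row_perm[i] of the input becomes row i of the working system.
    """
    n = len(a)
    # Augmented matrix [A | b] with the fixed row permutation applied.
    m = [list(a[i]) + [b[i]] for i in row_perm]
    # Forward elimination, always using the current diagonal entry as pivot.
    for k in range(n):
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# New matrix values: the (0, 0) entry has become tiny.
# The exact solution is very close to x = [1, 1].
a_new = [[1e-20, 1.0],
         [1.0, 1.0]]
b_new = [1.0, 2.0]

stale_perm = [0, 1]  # was a good choice for an earlier, diagonally dominant matrix
fresh_perm = [1, 0]  # recomputed for a_new: brings the large entry onto the diagonal

x_stale = solve_no_pivot(a_new, b_new, stale_perm)
x_fresh = solve_no_pivot(a_new, b_new, fresh_perm)
print("stale row permutation:", x_stale)
print("fresh row permutation:", x_fresh)
```

In double precision the stale permutation loses the first component entirely
(it comes out as 0.0 instead of about 1.0), while the recomputed permutation
recovers both components.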

Hong

On Tue, Oct 25, 2016 at 10:38 AM, Hong <hzhang at mcs.anl.gov> wrote:

> Anton,
> I guess, when you reuse matrix and its symbolic factor with updated
> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to
> confirm it.
>
> I'll check petsc/superlu-dist interface to set this flag for this case.
>
> Hong
>
>
> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov <popov at uni-mainz.de> wrote:
>
>> Hong,
>>
>> All the problems are gone and the output is valgrind-clean if I specify this:
>>
>> -mat_superlu_dist_fact SamePattern_SameRowPerm
>> What does SamePattern_SameRowPerm actually mean?
>> Row permutations are for large diagonal, column permutations are for
>> sparsity, right?
>> Will it skip subsequent matrix permutations for large diagonal even if
>> matrix values change significantly?
>>
>> Surprisingly everything works even with:
>>
>> -mat_superlu_dist_colperm PARMETIS
>> -mat_superlu_dist_parsymbfact TRUE
>>
>> Thanks,
>> Anton
>>
>> On 10/24/2016 09:06 PM, Hong wrote:
>>
>> Anton:
>>>
>>> If replacing superlu_dist with mumps, does your code work?
>>>
>>> yes
>>>
>>
>> You may use mumps in your code, or test different options for
>> superlu_dist:
>>
>>   -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>>   -mat_superlu_dist_rowperm <LargeDiag> Row permutation (choose one of)
>> LargeDiag NATURAL (None)
>>   -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column permutation (choose
>> one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
>>   -mat_superlu_dist_replacetinypivot: <FALSE> Replace tiny pivots (None)
>>   -mat_superlu_dist_parsymbfact: <FALSE> Parallel symbolic factorization
>> (None)
>>   -mat_superlu_dist_fact <SamePattern> Sparsity pattern for repeated
>> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm
>> (None)
>>
>> The options inside <> are defaults. You may try others. This might help
>> narrow down the bug.
>>
>> Hong
>>
>>>
>>> Hong
>>>>
>>>> On 10/24/2016 05:47 PM, Hong wrote:
>>>>
>>>> Barry,
>>>> Your change indeed fixed the error of his testing code.
>>>> As Satish tested, on your branch, ex16 runs smoothly.
>>>>
>>>> I do not understand why on maint or master branch, ex16 crashes inside
>>>> superlu_dist, but not with mumps.
>>>>
>>>>
>>>> I also confirm that ex16 runs fine with latest fix, but unfortunately
>>>> not my code.
>>>>
>>>> This is something to be expected, since my code preallocates once in
>>>> the beginning. So there is no way it can be affected by multiple
>>>> preallocations. Subsequently I only do matrix assembly, that makes sure
>>>> structure doesn't change (set to get error otherwise).
>>>>
>>>> Summary: we don't have a simple test code to debug superlu issue
>>>> anymore.
>>>>
>>>> Anton
>>>>
>>>> Hong
>>>>
>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay <balay at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>> On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>>
>>>>> >
>>>>> > > [Or perhaps Hong is using a different test code and is observing
>>>>> bugs
>>>>> > > with superlu_dist interface..]
>>>>> >
>>>>> >    She states that her test does a NEW MatCreate() for each matrix
>>>>> load (I cut and pasted it in the email I just sent). The bug I fixed was
>>>>> only related to using the SAME matrix from one MatLoad() in another
>>>>> MatLoad().
>>>>>
>>>>> Ah - ok.. Sorry - wasn't thinking clearly :(
>>>>>
>>>>> Satish
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>