[petsc-users] SuperLU_dist issue in 3.7.4

Zhang, Hong hzhang at mcs.anl.gov
Fri Oct 21 21:28:51 CDT 2016


It is not a problem with calling MatLoad() twice. The file has one matrix, but it is loaded twice.

If I replace pc with ksp, the code runs fine.
The error occurs when PCSetUp_LU() is called with SAME_NONZERO_PATTERN.
I'll look into it further later.

Hong
________________________________________
From: Zhang, Hong
Sent: Friday, October 21, 2016 8:18 PM
To: Barry Smith; petsc-users
Subject: RE: [petsc-users] SuperLU_dist issue in 3.7.4

I am investigating it. The file has two matrices. The code takes the following steps:

PCCreate(PETSC_COMM_WORLD, &pc);

MatCreate(PETSC_COMM_WORLD,&A);
MatLoad(A,fd);
PCSetOperators(pc,A,A);
PCSetUp(pc);

MatCreate(PETSC_COMM_WORLD,&A);
MatLoad(A,fd);
PCSetOperators(pc,A,A);
PCSetUp(pc);  // crashes here with superlu_dist on np=2; no crash with mumps, superlu, or superlu_dist on np=1

Hong

________________________________________
From: Barry Smith [bsmith at mcs.anl.gov]
Sent: Friday, October 21, 2016 5:59 PM
To: petsc-users
Cc: Zhang, Hong
Subject: Re: [petsc-users] SuperLU_dist issue in 3.7.4

> On Oct 21, 2016, at 5:16 PM, Satish Balay <balay at mcs.anl.gov> wrote:
>
> The issue with this test code is - using MatLoad() twice [with the
> same object - without destroying it]. Not sure if that's supposed to
> work.

   If the file has two matrices in it then yes, a second call to MatLoad() with the same matrix should just load the second matrix from the file correctly. Perhaps we need a test in our test suite just to make sure that works.

  Barry



>
> Satish
>
> On Fri, 21 Oct 2016, Hong wrote:
>
>> I can reproduce the error on a Linux machine with petsc-maint. It crashes
>> at the 2nd solve, on both processors:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x00007f051dc835bd in pdgsequ (A=0x1563910, r=0x176dfe0, c=0x178f7f0,
>>    rowcnd=0x7fffcb8dab30, colcnd=0x7fffcb8dab38, amax=0x7fffcb8dab40,
>>    info=0x7fffcb8dab4c, grid=0x1563858)
>>    at /sandbox/hzhang/petsc/arch-linux-gcc-gfortran/externalpackages/git.superlu_dist/SRC/pdgsequ.c:182
>> 182                 c[jcol] = SUPERLU_MAX( c[jcol], fabs(Aval[j]) * r[irow] );
>>
>> The version of superlu_dist:
>> commit 0b5369f304507f1c7904a913f4c0c86777a60639
>> Author: Xiaoye Li <xsli at lbl.gov>
>> Date:   Thu May 26 11:33:19 2016 -0700
>>
>>    rename 'struct pair' to 'struct superlu_pair'.
>>
>> Hong
>>
>> On Fri, Oct 21, 2016 at 5:36 AM, Anton Popov <popov at uni-mainz.de> wrote:
>>
>>>
>>> On 10/19/2016 05:22 PM, Anton Popov wrote:
>>>
>>> I looked at each item valgrind complained about in your email dated Oct. 11.
>>> Those reports are really superficial; I don't see anything wrong with
>>> the lines singled out (mostly uninitialized variables). I did a few
>>> tests with the latest version on GitHub; all went fine.
>>>
>>> Perhaps you can print the matrix that caused the problem, so I can run it
>>> using your matrix.
>>>
>>> Sherry
>>>
>>> Hi Sherry,
>>>
>>> I finally figured out a minimal setup (attached) that reproduces the
>>> problem.
>>>
>>> I use petsc-maint:
>>>
>>> git clone -b maint https://bitbucket.org/petsc/petsc.git
>>>
>>> and configure it in debug mode without optimization using the options:
>>>
>>> --download-superlu_dist=1 \
>>> --download-superlu_dist-commit=origin/maint \
>>>
>>> Compile the test, assuming PETSC_DIR points to the described petsc
>>> installation:
>>>
>>> make ex16
>>>
>>> Run with:
>>>
>>> mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu -pc_factor_mat_solver_package superlu_dist
>>>
>>> Matrix partitioning between the processors will be completely the same as
>>> in our code (hard-coded).
>>>
>>> I factorize the same matrix twice with the same PC object. Remarkably it
>>> runs fine for the first time, but fails for the second.
>>>
>>> Thank you very much for looking into this problem.
>>>
>>> Cheers,
>>> Anton
>>>
>>
>


