[petsc-users] SuperLU_dist issue in 3.7.4

Zhang, Hong hzhang at mcs.anl.gov
Fri Oct 21 20:18:40 CDT 2016


I am investigating it. The file has two matrices. The code takes the following steps:

PCCreate(PETSC_COMM_WORLD, &pc);

MatCreate(PETSC_COMM_WORLD,&A);
MatLoad(A,fd); 
PCSetOperators(pc,A,A);
PCSetUp(pc);

MatCreate(PETSC_COMM_WORLD,&A);
MatLoad(A,fd); 
PCSetOperators(pc,A,A);
PCSetUp(pc);  // crashes here with superlu_dist on np=2; no crash with mumps, superlu, or superlu_dist on np=1

Hong

________________________________________
From: Barry Smith [bsmith at mcs.anl.gov]
Sent: Friday, October 21, 2016 5:59 PM
To: petsc-users
Cc: Zhang, Hong
Subject: Re: [petsc-users] SuperLU_dist issue in 3.7.4

> On Oct 21, 2016, at 5:16 PM, Satish Balay <balay at mcs.anl.gov> wrote:
>
> The issue with this test code is that it uses MatLoad() twice [with the
> same object, without destroying it]. Not sure if that's supposed to
> work.

   If the file has two matrices in it, then yes, a second call to MatLoad() with the same matrix should just load the second matrix from the file correctly. Perhaps we need a test in our test suite just to make sure that works.
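The expected behavior (each load on the same open file consumes the next matrix, and the reused object ends up holding only that matrix) can be illustrated with a toy analogue in plain C. This is a minimal sketch under stated assumptions: the record layout and the helpers write_record and read_next_record are hypothetical, and this is neither PETSc's binary format nor how MatLoad() is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy format (NOT PETSc's binary format): each record is an
   int n followed by n*n doubles holding a dense n x n matrix. */

/* Append one matrix record to the stream. Returns 0 on success, -1 on error. */
int write_record(FILE *fp, int n, const double *vals)
{
  assert(fp != NULL && vals != NULL && n > 0);
  size_t count = (size_t)n * (size_t)n;
  if (fwrite(&n, sizeof n, 1, fp) != 1) return -1;
  if (fwrite(vals, sizeof *vals, count, fp) != count) return -1;
  return 0;
}

/* Read the NEXT record from the stream into *vals, reusing the same buffer
   across calls (the analogue of reusing one Mat object). The buffer is
   resized to the new record, so nothing from the previous record is kept.
   Returns n, or -1 at end of file or on error. */
int read_next_record(FILE *fp, double **vals)
{
  assert(fp != NULL && vals != NULL);
  int n;
  if (fread(&n, sizeof n, 1, fp) != 1 || n <= 0) return -1;
  size_t count = (size_t)n * (size_t)n;
  double *tmp = realloc(*vals, count * sizeof *tmp);
  if (tmp == NULL) return -1;
  *vals = tmp;
  if (fread(*vals, sizeof **vals, count, fp) != count) return -1;
  return n;
}
```

Calling read_next_record() twice on the same stream returns the first matrix and then the second; a third call reports end of file. A regression test of the kind suggested above would check the same property for two MatLoad() calls on one viewer.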

  Barry



>
> Satish
>
> On Fri, 21 Oct 2016, Hong wrote:
>
>> I can reproduce the error on a Linux machine with petsc-maint. It crashes
>> at the second solve, on both processors:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x00007f051dc835bd in pdgsequ (A=0x1563910, r=0x176dfe0, c=0x178f7f0,
>>    rowcnd=0x7fffcb8dab30, colcnd=0x7fffcb8dab38, amax=0x7fffcb8dab40,
>>    info=0x7fffcb8dab4c, grid=0x1563858)
>>    at /sandbox/hzhang/petsc/arch-linux-gcc-gfortran/externalpackages/git.superlu_dist/SRC/pdgsequ.c:182
>> 182                 c[jcol] = SUPERLU_MAX( c[jcol], fabs(Aval[j]) * r[irow] );
>>
>> The version of superlu_dist:
>> commit 0b5369f304507f1c7904a913f4c0c86777a60639
>> Author: Xiaoye Li <xsli at lbl.gov>
>> Date:   Thu May 26 11:33:19 2016 -0700
>>
>>    rename 'struct pair' to 'struct superlu_pair'.
>>
>> Hong
>>
>> On Fri, Oct 21, 2016 at 5:36 AM, Anton Popov <popov at uni-mainz.de> wrote:
>>
>>>
>>> On 10/19/2016 05:22 PM, Anton Popov wrote:
>>>
>>> I looked at each item valgrind complained about in your email dated
>>> Oct. 11. Those reports are really superficial; I don't see anything wrong
>>> with the lines singled out (mostly uninitialized variables). I did a few
>>> tests with the latest version on GitHub, and all went fine.
>>>
>>> Perhaps you can print the matrix that caused the problem, and I can run
>>> it using your matrix.
>>>
>>> Sherry
>>>
>>> Hi Sherry,
>>>
>>> I finally figured out a minimalistic setup (attached) that reproduces the
>>> problem.
>>>
>>> I use petsc-maint:
>>>
>>> git clone -b maint https://bitbucket.org/petsc/petsc.git
>>>
>>> and configure it in debug mode, without optimization, using the options:
>>>
>>> --download-superlu_dist=1 \
>>> --download-superlu_dist-commit=origin/maint \
>>>
>>> Compile the test, assuming PETSC_DIR points to the described petsc
>>> installation:
>>>
>>> make ex16
>>>
>>> Run with:
>>>
>>> mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu \
>>> -pc_factor_mat_solver_package superlu_dist
>>>
>>> The matrix partitioning between the processors is exactly the same as
>>> in our code (hard-coded).
>>>
>>> I factorize the same matrix twice with the same PC object. Remarkably, it
>>> runs fine the first time but fails the second.
>>>
>>> Thank you very much for looking into this problem.
>>>
>>> Cheers,
>>> Anton
>>>
>>
>


