[petsc-users] SuperLU_dist issue in 3.7.4

Anton Popov popov at uni-mainz.de
Tue Oct 11 08:26:15 CDT 2016


On 10/10/2016 07:11 PM, Satish Balay wrote:
> That's from petsc-3.5
>
> Anton - please post the stack trace you get with  --download-superlu_dist-commit=origin/maint

I guess this is it:

[0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 /home/anton/LIB/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: [0] PCSetUp_LU line 101 /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: [0] PCSetUp line 930 /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c

According to the line numbers it crashes within 
MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx.
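
As a rough illustration of how the frames above are read, here is a small 
sketch. It is not part of PETSc: the helper name parse_frames and the sample 
TRACE string (the first two frames copied from the trace) are made up for 
this example. PETSc lists the most recent call first, so the first parsed 
entry is the routine that was active when the crash occurred. The helper 
also tolerates a source path that has wrapped onto the following line.

```python
import re

# Frame header as printed in the trace: "[rank]PETSC ERROR: [rank] <function> line <n> <path>"
FRAME = re.compile(r"^\[(\d+)\]PETSC ERROR: \[\d+\] (\S+) line (\d+)\s*(\S*)")

# Sample input (illustrative): two frames from the trace, with wrapped paths.
TRACE = """
[0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421
/home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282
/home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
"""

def parse_frames(trace):
    """Return (rank, function, line, path) tuples, innermost frame first."""
    joined = []
    for raw in trace.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") or not joined:
            # A new "[rank]PETSC ERROR: ..." record starts here.
            joined.append(line)
        else:
            # Continuation line: a path that wrapped onto its own line.
            joined[-1] += " " + line
    frames = []
    for line in joined:
        m = FRAME.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), int(m.group(3)), m.group(4)))
    return frames

if __name__ == "__main__":
    innermost = parse_frames(TRACE)[0]
    print("innermost frame:", innermost[1], "at line", innermost[2])
```

The line numbers refer to the PETSc interface file superlu_dist.c, not to 
the SuperLU_DIST sources themselves.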

Surprisingly, this happens only on the second SNES iteration, not on 
the first.

I'm trying to reproduce this behavior with the PETSc KSP and SNES examples. 
However, everything I've tried so far with SuperLU_DIST works fine.

I'm also running our code under Valgrind to make sure it's clean.

Anton
>
> Satish
>
>
> On Mon, 10 Oct 2016, Xiaoye S. Li wrote:
>
>> Which version of superlu_dist does this capture?   I looked at the original
>> error  log, it pointed to pdgssvx: line 161.  But that line is in comment
>> block, not the program.
>>
>> Sherry
>>
>>
>> On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov <popov at uni-mainz.de> wrote:
>>
>>>
>>> On 10/07/2016 05:23 PM, Satish Balay wrote:
>>>
>>>> On Fri, 7 Oct 2016, Kong, Fande wrote:
>>>>
>>>>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay <balay at mcs.anl.gov> wrote:
>>>>>
>>>>>> On Fri, 7 Oct 2016, Anton Popov wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> are there any news about fixing buggy behavior of SuperLU_DIST, exactly
>>>>>>> what is described here:
>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html&d=CwIBAg&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ?
>>>>>>>
>>>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works
>>>>>>> fine with 3.5.4.
>>>>>>>
>>>>>>> Do I still have to stick to maint branch, and what are the chances for
>>>>>>> these fixes to be included in 3.7.5?
>>>>>>>
>>>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing
>>>>>> issues with it - its best to debug and figure out the cause.
>>>>>>
>>>>> This bug is indeed inside of superlu_dist, and we started having this
>>>>> issue from PETSc-3.6.x. I think superlu_dist developers should have fixed
>>>>> this bug. We forgot to update superlu_dist??  This is not a thing users
>>>>> could debug and fix.
>>>>>
>>>>> I have many people in INL suffering from this issue, and they have to
>>>>> stay with PETSc-3.5.4 to use superlu_dist.
>>>>>
>>>> To verify if the bug is fixed in latest superlu_dist - you can try
>>>> [assuming you have git - either from petsc-3.7/maint/master]:
>>>>
>>>> --download-superlu_dist --download-superlu_dist-commit=origin/maint
>>>>
>>>>
>>>> Satish
>>>
>>> Hi Satish,
>>>
>>> I did this:
>>>
>>> git clone -b maint https://bitbucket.org/petsc/petsc.git petsc
>>>
>>> --download-superlu_dist
>>> --download-superlu_dist-commit=origin/maint (not sure this is needed,
>>> since I'm already in maint)
>>>
>>> The problem is still there.
>>>
>>> Cheers,
>>> Anton
>>>


