[petsc-users] Tough to reproduce petsctablefind error

Chris Hewson chris at resfrac.com
Mon Jul 20 14:25:26 CDT 2020


Chris is using Haswell, what MPI are you using? I trust you are not using
Moose.
- Yes, I am using Haswell; the MPI is MPICH v3.3a2 on Ubuntu 18.04. I am not
using MOOSE.

*Chris Hewson*
Senior Reservoir Simulation Engineer
ResFrac
+1.587.575.9792


On Mon, Jul 20, 2020 at 1:14 PM Mark Adams <mfadams at lbl.gov> wrote:

> This is indeed a nasty bug, but having two separate cases should be useful.
>
> Chris is using Haswell, what MPI are you using? I trust you are not using
> Moose.
>
> Fande what machine/MPI are you using?
>
> On Mon, Jul 20, 2020 at 3:04 PM Chris Hewson <chris at resfrac.com> wrote:
>
>> Hi Mark,
>>
>> Chris: It sounds like you just have one matrix that you give to MUMPS.
>> You seem to be creating a matrix in the middle of your run. Are you doing
>> dynamic adaptivity?
>> - I have two separate matrices that I give to MUMPS, but since this is
>> happening in the production build of my code, I can't determine with
>> certainty which call to MUMPS, KSPBCGS, or UMFPACK it is happening in.
>>
>> I do destroy and recreate matrices in the middle of my runs, but that
>> happens multiple times (and presumably in the same way) before the fault
>> occurs. I also check the matrix sizes and the data I am sending to PETSc,
>> and those checks all pass; yet at some point a size mismatch appears
>> somewhere. Understandably, this is not a lot to go on. I am not doing
>> dynamic adaptivity; rather, the mesh changes its size.
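
(For reference, a minimal sketch of the kind of pre-insertion size/ownership
check described above, written against the public PETSc C API of that era.
The function name and the arguments "A", "nrows" and "rows" are placeholders
for illustration, not anything from Chris's code.)

#include <petscmat.h>

/* Sketch: verify row indices against the global size and count how many
   fall outside this rank's ownership range before calling MatSetValues(). */
PetscErrorCode CheckRowIndices(Mat A, PetscInt nrows, const PetscInt rows[])
{
  PetscErrorCode ierr;
  PetscInt       rstart, rend, M, N, i, noff = 0;

  ierr = MatGetSize(A, &M, &N);CHKERRQ(ierr);                   /* global dimensions */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr); /* rows owned by this rank */
  for (i = 0; i < nrows; i++) {
    if (rows[i] < 0 || rows[i] >= M)
      SETERRQ2(PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
               "Row %D outside global range [0,%D)", rows[i], M);
    if (rows[i] < rstart || rows[i] >= rend) noff++; /* stashed and sent at assembly */
  }
  ierr = PetscPrintf(PETSC_COMM_SELF, "%D of %D rows are off-process\n", noff, nrows);CHKERRQ(ierr);
  return 0;
}
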
>>
>> I agree with Fande that the most frustrating part is that it's not
>> reproducible, but I am not 100% sure the problem lies within the PETSc
>> code base either.
>>
>> Current working theories are:
>> 1. Some sort of MPI problem with the sending of one of the matrix elements
>> (using MPICH version 3.3a2).
>> 2. The memory behind some static pointers gets corrupted, although then I
>> would expect a garbage number rather than something that could plausibly
>> make sense.
>>
>> *Chris Hewson*
>> Senior Reservoir Simulation Engineer
>> ResFrac
>> +1.587.575.9792
>>
>>
>> On Mon, Jul 20, 2020 at 12:41 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>>
>>>
>>> On Mon, Jul 20, 2020 at 2:36 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> Just to be clear, I do not think it is related to GAMG or PtAP. It is a
>>>> communication issue:
>>>>
>>>
>>> Your stack trace was from PtAP, but Chris's problem is not.
>>>
>>>
>>>>
>>>> I reran the same code, and I just got:
>>>>
>>>> [252]PETSC ERROR: --------------------- Error Message
>>>> --------------------------------------------------------------
>>>> [252]PETSC ERROR: Petsc has generated inconsistent data
>>>> [252]PETSC ERROR: Received vector entry 4469094877509280860 out of
>>>> local range [255426072,256718616)]
>>>>
>>>
>>> OK, now this (4469094877509280860) is clearly garbage. That is the
>>> important thing. I have to think your MPI is buggy.
>>>
>>>
>>>
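
(For context, the error quoted above comes from a receive-side sanity check:
after the assembly communication, every global index a rank receives must lie
inside its own ownership range. Conceptually the check looks like the
simplified sketch below, which is not the actual PETSc source. An index like
4469094877509280860 is far beyond any plausible global size, which is why
corruption of the data before or during the MPI transfer is suspected.)

#include <petscvec.h>

/* Simplified sketch of the receive-side check: each index received during
   vector assembly must fall in [rstart, rend) on the receiving rank. */
PetscErrorCode CheckReceivedEntries(Vec v, PetscInt n, const PetscInt idx[])
{
  PetscErrorCode ierr;
  PetscInt       rstart, rend, i;

  ierr = VecGetOwnershipRange(v, &rstart, &rend);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    if (idx[i] < rstart || idx[i] >= rend)
      SETERRQ3(PETSC_COMM_SELF, PETSC_ERR_PLIB,
               "Received vector entry %D out of local range [%D,%D)",
               idx[i], rstart, rend);
  }
  return 0;
}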