[petsc-users] Tough to reproduce petsctablefind error

Fande Kong fdkong.jd at gmail.com
Mon Jul 20 13:36:19 CDT 2020


Hi Mark,

Just to be clear, I do not think it is related to GAMG or PtAP. It is a
communication issue:

I reran the same code, and just got:

[252]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[252]PETSC ERROR: Petsc has generated inconsistent data
[252]PETSC ERROR: Received vector entry 4469094877509280860 out of local
range [255426072,256718616)]
[252]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
for trouble shooting.
[252]PETSC ERROR: Petsc Release Version 3.13.3, unknown
[252]PETSC ERROR: ../../griffin-opt on a arch-moose named r5i4n13 by kongf
Mon Jul 20 12:16:47 2020
[252]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no
--with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1
--download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1
--download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1
--with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0
--with-64-bit-indices --download-mumps=0
[252]PETSC ERROR: #1 VecAssemblyEnd_MPI_BTS() line 324 in
/home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/impls/mpi/pbvec.c
[252]PETSC ERROR: #2 VecAssemblyEnd() line 171 in
/home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/interface/vector.c
[cli_252]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 252
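
For reference, the failing check sits on the receive side of the off-process
stash exchange that vector assembly performs. Below is a minimal sketch of
that code path (the vector size and index are hypothetical, and this is not
our actual code); the point is where the received indices get range-checked:

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            x;
  PetscInt       row = 0;    /* hypothetical index, possibly off-process */
  PetscScalar    v   = 1.0;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, 1000000);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);

  /* Entries owned by another rank are stashed locally here ...          */
  ierr = VecSetValues(x, 1, &row, &v, ADD_VALUES);CHKERRQ(ierr);

  /* ... and shipped during assembly.  VecAssemblyEnd_MPI_BTS() checks
     every received index against the local ownership range, so an index
     corrupted in transit (like 4469094877509280860 above) fails there. */
  ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(x);CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}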


Thanks,

Fande,

On Mon, Jul 20, 2020 at 12:24 PM Mark Adams <mfadams at lbl.gov> wrote:

> OK, so this is happening in MatProductNumeric_PtAP, the Galerkin triple
> product A_c = P^T A P. This must be in constructing the coarse grid.
>
> GAMG roughly wants to coarsen at a rate of 30:1, but that needs to be
> verified. At that rate, your index is about the size of the first coarse
> grid. I'm trying to figure out whether the index is valid, but the largest
> key allowed is 740521, which is about what I would guess for the size of
> the second coarse grid.
>
> So it kind of looks like a "fine" grid index has ended up in a "coarse"
> grid (the 2nd or 3rd coarse grid).
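>
> Working that estimate out explicitly (assuming the ~30:1 rate holds and
> using the 1,428,284,880 fine-grid unknowns reported below):
>
>   level 1:  1.43e9 / 30  ~  4.8e7   (about the size of the bad key, 45226154)
>   level 2:  4.8e7  / 30  ~  1.6e6
>   level 3:  1.6e6  / 30  ~  5.3e4
>
> so the largest allowed key, 740521, sits between the expected sizes of the
> second and third coarse grids.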
>
> But Chris is not using GAMG.
>
> Chris: It sounds like you just have one matrix that you give to MUMPS. You
> seem to be creating a matrix in the middle of your run. Are you doing
> dynamic adaptivity?
>
> I think we generate unique tags for each operation, but it sounds like a
> message may be getting mixed up in some way.
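>
> To illustrate that failure mode with a hypothetical two-module example
> (this is not PETSc's actual code): MPI matches messages only by (source,
> tag, communicator), so two exchanges sharing a tag on the same communicator
> can hand a receive the wrong payload. This is exactly what per-operation
> tags, e.g. from PetscCommGetNewTag(), are meant to prevent:
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>   int         rank, mesh = 111, solver = 222, got = -1;
>   MPI_Request reqs[2];
>
>   MPI_Init(&argc, &argv);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   if (rank == 0) {          /* two unrelated modules both pick tag 0 */
>     MPI_Isend(&mesh,   1, MPI_INT, 1, 0, MPI_COMM_WORLD, &reqs[0]);
>     MPI_Isend(&solver, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &reqs[1]);
>     MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
>   } else if (rank == 1) {
>     /* Suppose the solver module posts this receive: nothing in the
>        envelope distinguishes the messages, so it matches the mesh
>        message first and the solver silently consumes mesh data. */
>     MPI_Recv(&got, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>     printf("solver module received %d (expected 222)\n", got);
>     MPI_Recv(&got, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>   }
>   MPI_Finalize();
>   return 0;
> }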
>
>
>
> On Mon, Jul 20, 2020 at 12:35 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>
>> Hi Mark,
>>
>> Thanks for your reply.
>>
>> On Mon, Jul 20, 2020 at 7:13 AM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Fande,
>>> do you know if your key 45226154 was out of range in the real matrix?
>>>
>>
>> I do not know, since it happened while building the AMG hierarchy. The
>> size of the original system is 1,428,284,880.
>>
>>
>>> What size integers do you use?
>>>
>>
>> We are using 64-bit indices via "--with-64-bit-indices".
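>>
>> (For scale: a 32-bit PetscInt tops out at 2^31 - 1 = 2,147,483,647, so the
>> 1,428,284,880 rows themselves would still fit, but aggregate quantities
>> such as the total nonzero count overflow with even two nonzeros per row,
>> hence the 64-bit build.)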
>>
>>
>> I am trying to track down the cause of this issue by running more
>> simulations with different configurations.
>>
>> Thanks,
>>
>> Fande,
>>
>>
>> Thanks,
>>> Mark
>>>
>>> On Mon, Jul 20, 2020 at 1:17 AM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>
>>>> Trace could look like this:
>>>>
>>>> [640]PETSC ERROR: --------------------- Error Message
>>>> --------------------------------------------------------------
>>>>
>>>> [640]PETSC ERROR: Argument out of range
>>>>
>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed
>>>> 740521
>>>>
>>>> [640]PETSC ERROR: See
>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>>> shooting.
>>>>
>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown
>>>>
>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by
>>>> wangy2 Sun Jul 19 17:14:28 2020
>>>>
>>>> [640]PETSC ERROR: Configure options --download-hypre=1
>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1
>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1
>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1
>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11
>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices
>>>> --download-mumps=0
>>>>
>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h
>>>>
>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c
>>>>
>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>>>
>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>
>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901
>>>> in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c
>>>>
>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line
>>>> 3180 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c
>>>>
>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>>
>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>>
>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>
>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>
>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c
>>>>
>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c
>>>>
>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c
>>>>
>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>
>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>
>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>
>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c
>>>>
>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in
>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c
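>>>>
>>>> For context, PetscTableFind() is a hash-table lookup that here maps
>>>> global column indices (shifted to be 1-based) to local ones, and the
>>>> table is created with a hard upper bound on the key. A minimal sketch
>>>> (hypothetical sizes, not the actual MatSetUpMultiply_MPIAIJ code) of
>>>> how the error above fires:
>>>>
>>>> #include <petscctable.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>   PetscTable     t;
>>>>   PetscInt       loc;
>>>>   PetscErrorCode ierr;
>>>>
>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>>>>   /* sized for a coarse grid: 740521 is the largest key allowed */
>>>>   ierr = PetscTableCreate(64, 740521, &t);CHKERRQ(ierr);
>>>>   ierr = PetscTableAdd(t, 12346, 1, INSERT_VALUES);CHKERRQ(ierr);
>>>>   ierr = PetscTableFind(t, 12346, &loc);CHKERRQ(ierr);    /* fine */
>>>>   /* a fine-grid-sized key against a coarse-grid-sized table errors
>>>>      exactly as in the trace above */
>>>>   ierr = PetscTableFind(t, 45226154, &loc);CHKERRQ(ierr);
>>>>   ierr = PetscTableDestroy(&t);CHKERRQ(ierr);
>>>>   ierr = PetscFinalize();
>>>>   return ierr;
>>>> }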
>>>>
>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong <fdkong.jd at gmail.com>
>>>> wrote:
>>>>
>>>>> I am not entirely sure what is happening, but we encountered similar
>>>>> issues recently. They were not reproducible: they might occur at
>>>>> different stages, and the errors could be weirder than just the "ctable
>>>>> stuff." Our code is Valgrind clean, since every PR in MOOSE has to pass
>>>>> rigorous Valgrind checks before it reaches the devel branch. The errors
>>>>> happened when we used MVAPICH.
>>>>>
>>>>> We switched to HPE MPT (a vendor-installed MPI), and then everything
>>>>> was smooth. Could you try a different MPI? It is better to try one
>>>>> provided by your system.
>>>>>
>>>>> We have not gotten to the bottom of this problem yet, but we at least
>>>>> know it is somehow MPI-related.
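>>>>>
>>>>> For example (install paths hypothetical), either point PETSc's
>>>>> configure at a system MPI or let PETSc build its own:
>>>>>
>>>>>   ./configure --with-mpi-dir=/opt/hpe/mpt/mpt-2.21 ...
>>>>>   ./configure --download-mpich ...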
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Fande,
>>>>>
>>>>>
>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson <chris at resfrac.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am hitting a bug in PETSc that produces the error message:
>>>>>>
>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in
>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than
>>>>>> largest key allowed 5693
>>>>>>
>>>>>> This is with petsc-3.13.2, compiled with MPICH, with -O3, debugging
>>>>>> turned off, and tuned for the Haswell architecture. The error occurs
>>>>>> either before or during a KSPBCGS solve/setup, or during a MUMPS
>>>>>> factorization solve (I haven't been able to replicate the issue with
>>>>>> the same set of instructions, etc.).
>>>>>>
>>>>>> This is a terrible way to ask a question, I know, and not very helpful
>>>>>> from your side, but it is all I have from a user's run, and I can't
>>>>>> reproduce it on my end (either with the optimized build or with
>>>>>> debugging turned on). It happens only after the code has run for quite
>>>>>> some time, and only somewhat rarely.
>>>>>>
>>>>>> More than likely I am using a static variable (the code is written in
>>>>>> C++) that I'm not updating when the matrix size changes, or something
>>>>>> silly like that, but any help or guidance on this would be appreciated.
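>>>>>>
>>>>>> As a hypothetical sketch of that failure mode (names invented, not my
>>>>>> actual code), a cached size that is never refreshed after the matrix
>>>>>> is rebuilt produces exactly this kind of out-of-range key:
>>>>>>
>>>>>> #include <petscmat.h>
>>>>>>
>>>>>> static PetscInt cached_n = 0;   /* set once, never invalidated */
>>>>>>
>>>>>> PetscErrorCode FillLastColumn(Mat A, PetscInt row)
>>>>>> {
>>>>>>   PetscScalar    v = 1.0;
>>>>>>   PetscInt       col;
>>>>>>   PetscErrorCode ierr;
>>>>>>
>>>>>>   if (!cached_n) {ierr = MatGetSize(A, &cached_n, NULL);CHKERRQ(ierr);}
>>>>>>   /* stale if A was later rebuilt smaller: col can now exceed the new
>>>>>>      global size, and the bad index surfaces later, e.g. in the table
>>>>>>      lookups during MatAssemblyEnd() */
>>>>>>   col  = cached_n - 1;
>>>>>>   ierr = MatSetValues(A, 1, &row, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr);
>>>>>>   return 0;
>>>>>> }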
>>>>>>
>>>>>> *Chris Hewson*
>>>>>> Senior Reservoir Simulation Engineer
>>>>>> ResFrac
>>>>>> +1.587.575.9792
>>>>>>
>>>>>