[petsc-users] Tough to reproduce petsctablefind error

Mark Adams mfadams at lbl.gov
Mon Jul 20 13:24:05 CDT 2020


OK, so this is happening in MatProductNumeric_PtAP. This must be in
constructing the coarse grid.

GAMG sort of wants to coarsen at a rate of 30:1, but that needs to be
verified. At that rate, your index is about the size of the first coarse
grid. I'm trying to figure out if the index is valid. But the max index
is 740521, which is about what I would guess is the size of the
second coarse grid.

So it kinda looks like it has a "fine" grid index in the "coarse" grid (2nd
- 3rd coarse grids).
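
As a rough check of that guess, here is a minimal sketch of the arithmetic. It assumes a uniform 30:1 coarsening rate (which, as noted, still needs to be verified) and takes the fine-grid size from the 1,428,284,880 figure Fande quotes below; the function name is made up for illustration and is not a PETSc call.

```python
def coarse_grid_sizes(fine_size, rate=30, levels=3):
    """Estimate successive coarse-grid sizes, assuming a uniform coarsening rate."""
    sizes = []
    n = fine_size
    for _ in range(levels):
        n //= rate  # integer division: each level is roughly 1/rate of the previous one
        sizes.append(n)
    return sizes


fine_size = 1_428_284_880   # size of the original system reported in the thread
bad_key = 45_226_154        # key from the PetscTableFind error
max_key = 740_521           # largest key allowed, from the same error

sizes = coarse_grid_sizes(fine_size)
print(sizes)  # [47609496, 1586983, 52899]

# The offending key is close to the estimated first coarse grid size,
# while the allowed maximum falls between the second and third estimates.
print(bad_key / sizes[0])
print(sizes[2] < max_key < sizes[1])
```

Under these assumptions the bad key is within about 5% of the first coarse grid estimate, and the allowed maximum sits between the second and third estimates, which is consistent with the guess above but does not prove anything about the actual hierarchy.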

But Chris is not using GAMG.

Chris: It sounds like you just have one matrix that you give to MUMPS. You
seem to be creating a matrix in the middle of your run. Are you doing
dynamic adaptivity?

I think we generate unique tags for each operation but it sounds like maybe
a message is getting mixed up in some way.



On Mon, Jul 20, 2020 at 12:35 PM Fande Kong <fdkong.jd at gmail.com> wrote:

> Hi Mark,
>
> Thanks for your reply.
>
> On Mon, Jul 20, 2020 at 7:13 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Fande,
>> do you know if your 45226154 was out of range in the real matrix?
>>
>
> I do not know since it was in building the AMG hierarchy.  The size of the
> original system is 1,428,284,880
>
>
>> What size integers do you use?
>>
>
> We are using 64-bit via "--with-64-bit-indices"
>
>
> I am trying to catch the cause of this issue by running more simulations
> with different configurations.
>
> Thanks,
>
> Fande,
>
>
> Thanks,
>> Mark
>>
>> On Mon, Jul 20, 2020 at 1:17 AM Fande Kong <fdkong.jd at gmail.com> wrote:
>>
>>> Trace could look like this:
>>>
>>> [640]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>>
>>> [640]PETSC ERROR: Argument out of range
>>>
>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521
>>>
>>> [640]PETSC ERROR: See
>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>>
>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown
>>>
>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by
>>> wangy2 Sun Jul 19 17:14:28 2020
>>>
>>> [640]PETSC ERROR: Configure options --download-hypre=1
>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1
>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1
>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1
>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11
>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices
>>> --download-mumps=0
>>>
>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h
>>>
>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c
>>>
>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>>
>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>
>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901
>>> in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c
>>>
>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180
>>> in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c
>>>
>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>
>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>
>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>
>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>
>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c
>>>
>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c
>>>
>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c
>>>
>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>
>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>
>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>
>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c
>>>
>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in
>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c
>>>
>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>
>>>> I am not entirely sure what is happening, but we encountered similar
>>>> issues recently.  It was not reproducible. It might occur at different
>>>> stages, and the errors could be weird ones other than the "ctable stuff." Our code was
>>>> Valgrind clean since every PR in moose needs to go through rigorous
>>>> Valgrind checks before it reaches the devel branch.  The errors happened
>>>> when we used mvapich.
>>>>
>>>> We changed to use HPE-MPT (a vendor-installed MPI), then everything was
>>>> smooth.  Could you try a different MPI? It is better to try a
>>>> system-provided one.
>>>>
>>>> We did not get to the bottom of this problem yet, but we at least know
>>>> this is kind of MPI-related.
>>>>
>>>> Thanks,
>>>>
>>>> Fande,
>>>>
>>>>
>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson <chris at resfrac.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am hitting a bug in PETSc that returns the string:
>>>>>
>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in
>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than
>>>>> largest key allowed 5693
>>>>>
>>>>> This is using petsc-3.13.2, compiled and run with mpich, with -O3,
>>>>> debugging turned off, and tuned to the Haswell architecture. It
>>>>> occurs either before or during a KSPBCGS solve/setup or during a MUMPS
>>>>> factorization solve (I haven't been able to replicate this issue with the
>>>>> same set of instructions etc.).
>>>>>
>>>>> This is a terrible way to ask a question, I know, and not very helpful
>>>>> from your side, but this is what I have from a user's run and can't
>>>>> reproduce on my end (either with the optimization compilation or with
>>>>> debugging turned on). This happens when the code has run for quite some
>>>>> time and is happening somewhat rarely.
>>>>>
>>>>> More than likely I am using a static variable (code is written in c++)
>>>>> that I'm not updating when the matrix size is changing or something silly
>>>>> like that, but any help or guidance on this would be appreciated.
>>>>>
>>>>> *Chris Hewson*
>>>>> Senior Reservoir Simulation Engineer
>>>>> ResFrac
>>>>> +1.587.575.9792
>>>>>
>>>>