[petsc-users] Tough to reproduce petsctablefind error

Junchao Zhang junchao.zhang at gmail.com
Thu Jul 23 18:57:53 CDT 2020


On Mon, Jul 20, 2020 at 7:05 AM Barry Smith <bsmith at petsc.dev> wrote:

>
>     Is there a comprehensive MPI test suite (perhaps from MPICH)? Is
> there any way to run this full test suite under the problematic MPI and see
> if it detects any problems?
>
>      If so, could someone add it to the FAQ in the debugging section?
>
MPICH does have a test suite. It lives in the test/mpi subdirectory of the
mpich tarball <http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz>.
It annoyed me because it is not user-friendly. It can be helpful for catching
bugs at very small scale, but if, say, I want to test allreduce on 1024 ranks
with 100 doubles, I have to hack the test suite.
Anyway, the instructions are below.
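
For that kind of spot check, a minimal standalone test can be easier than
hacking the suite. Here is a sketch (my own, not part of the MPICH suite;
the expected-value check assumes MPI_SUM over rank-dependent values):

/* allred.c: spot-check MPI_Allreduce on many ranks with 100 doubles.
 * Build: mpicc -O2 -o allred allred.c
 * Run:   mpiexec -n 1024 ./allred
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  enum { N = 100 };
  double sendbuf[N], recvbuf[N];
  int rank, size, i, nerrs = 0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  for (i = 0; i < N; i++) sendbuf[i] = (double)rank + i;
  MPI_Allreduce(sendbuf, recvbuf, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

  /* the sum over ranks r of (r + i) is size*(size-1)/2 + size*i,
     which is exact in double at this scale */
  for (i = 0; i < N; i++) {
    double expect = 0.5*(double)size*(double)(size - 1) + (double)size*i;
    if (recvbuf[i] != expect) nerrs++;
  }
  if (nerrs) fprintf(stderr, "[%d] %d mismatches\n", rank, nerrs);
  else if (rank == 0) printf("MPI_Allreduce on %d ranks: OK\n", size);
  MPI_Finalize();
  return nerrs ? 1 : 0;
}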

For the purposes of PETSc, under test/mpi one can configure it with
$ ./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi
--enable-threads=funneled --enable-fortran=f77,f90 --enable-fast
--disable-spawn --disable-cxx --disable-ft-tests  // oddly, I had to set CXX
even though I disabled cxx!
$ make -k -j8  // -k keeps going past compilation errors, e.g., when building
tests for MPICH extensions that are not in the MPI standard while your MPI
is OpenMPI.
$ // edit testlist: remove the lines mpi_t, rma, f77, impls. Those are
sub-dirs containing tests for MPI routines PETSc does not rely on.
$ make testings or directly './runtests -tests=testlist'
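
If you would rather not edit testlist by hand, a one-liner like the following
should do the same filtering (assuming GNU grep; the output file name is
arbitrary):
$ grep -Ev '^(mpi_t|rma|f77|impls)' testlist > testlist.petsc
$ ./runtests -tests=testlist.petsc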

On a batch system:
$ export MPITEST_BATCHDIR=`pwd`/btest   // specify a batch dir, say btest
$ ./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist   // use 1024
ranks if a test does not specify the number of processes.
$ // This copies the test binaries to the batch dir and generates a
script runtests.batch there.  Edit the script to fit your batch system,
submit it as a job, and wait for it to finish.
$ cd btest && ../checktests --ignorebogus
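
The submission step is site-specific. On a SLURM cluster, a hypothetical
one-liner (task count and wall time are made up; adjust to your system)
might look like:
$ sbatch --ntasks=1024 --time=01:00:00 --wrap "cd $MPITEST_BATCHDIR && bash runtests.batch"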


PS: Fande, the fact that changing the MPI fixed your problem does not
necessarily mean the old MPI has bugs. It is complicated; it could be a PETSc
bug. You need to provide us code that reproduces your error. It does not
matter if the code is big.


>     Thanks
>
>       Barry
>
>
> On Jul 20, 2020, at 12:16 AM, Fande Kong <fdkong.jd at gmail.com> wrote:
>
> Trace could look like this:
>
> [640]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [640]PETSC ERROR: Argument out of range
> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521
> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown
> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by
> wangy2 Sun Jul 19 17:14:28 2020
> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no
> --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1
> --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1
> --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0
> --with-64-bit-indices --download-mumps=0
> [640]PETSC ERROR: #1 PetscTableFind() line 132 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h
> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c
> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c
> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c
> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180
> in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c
> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
> [640]PETSC ERROR: #9 MatPtAP() line 9199 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c
> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c
> [640]PETSC ERROR: #13 PCSetUp() line 898 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c
> [640]PETSC ERROR: #14 KSPSetUp() line 376 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
> [640]PETSC ERROR: #16 KSPSolve() line 853 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c
> [640]PETSC ERROR: #18 SNESSolve() line 4519 in
> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c
>
> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>
>> I am not entirely sure what is happening, but we encountered similar
>> issues recently.  They were not reproducible: they might occur at different
>> stages, and the errors could be weird things other than the "ctable" one.
>> Our code was Valgrind-clean, since every PR in MOOSE has to go through
>> rigorous Valgrind checks before it reaches the devel branch.  The errors
>> happened when we used MVAPICH.
>>
>> We switched to HPE MPT (a vendor-supplied MPI), and then everything was
>> smooth.  Could you try a different MPI? It is best to try one carried by
>> the system.
>>
>> We have not gotten to the bottom of this problem yet, but we at least know
>> it is somehow MPI-related.
>>
>> Thanks,
>>
>> Fande,
>>
>>
>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson <chris at resfrac.com> wrote:
>>
>>> Hi,
>>>
>>> I am hitting a bug in PETSc that produces the following error message:
>>>
>>> [7]PETSC ERROR: PetscTableFind() line 132 in
>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than
>>> largest key allowed 5693
>>>
>>> This is with petsc-3.13.2, compiled and run with MPICH, with -O3,
>>> debugging turned off, and tuned to the Haswell architecture. The error
>>> occurs either before or during a KSPBCGS solve/setup or during a MUMPS
>>> factorization solve (I haven't been able to replicate the issue with the
>>> same set of instructions, etc.).
>>>
>>> This is a terrible way to ask a question, I know, and not very helpful
>>> from your side, but this is what I have from a user's run and cannot
>>> reproduce on my end (either with the optimized build or with debugging
>>> turned on). It happens after the code has run for quite some time and
>>> occurs somewhat rarely.
>>>
>>> More than likely I am using a static variable (the code is written in
>>> C++) that I'm not updating when the matrix size changes, or something
>>> silly like that, but any help or guidance on this would be appreciated.
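>>>
>>> For illustration, the kind of pitfall I mean might look like this (a
>>> minimal sketch with made-up names, not the actual code; error checking
>>> omitted for brevity):
>>>
>>> #include <petscmat.h>
>>>
>>> static PetscInt cached_local_rows = -1; /* survives across solves */
>>>
>>> PetscErrorCode FillMatrix(Mat A)
>>> {
>>>   PetscInt    mlocal, rstart, rend, i;
>>>   PetscScalar one = 1.0;
>>>
>>>   MatGetLocalSize(A, &mlocal, NULL);
>>>   if (cached_local_rows < 0) cached_local_rows = mlocal; /* set once,
>>>      never refreshed when A is re-created with a different size */
>>>   MatGetOwnershipRange(A, &rstart, &rend);
>>>   /* loops sized by the stale value can generate out-of-range indices,
>>>      which surface later as the PetscTableFind "key is greater than
>>>      largest key allowed" error */
>>>   for (i = 0; i < cached_local_rows; i++) {
>>>     PetscInt row = rstart + i, col = row;
>>>     MatSetValues(A, 1, &row, 1, &col, &one, INSERT_VALUES);
>>>   }
>>>   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>>   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>>   return 0;
>>> }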
>>>
>>> *Chris Hewson*
>>> Senior Reservoir Simulation Engineer
>>> ResFrac
>>> +1.587.575.9792
>>>
>>
>