[petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

Junchao Zhang junchao.zhang at gmail.com
Tue Nov 15 14:42:17 CST 2022


Mark,
Do you have a reproducer using petsc examples?

On Tue, Nov 15, 2022, 12:49 PM Mark Adams <mfadams at lbl.gov> wrote:

> Junchao, this is the same problem that I have been having, right?
>
> On Tue, Nov 15, 2022 at 11:56 AM Fackler, Philip via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> I built petsc with:
>>
>> $ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug
>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0
>> --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices
>> --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos
>> --download-kokkos-kernels
>>
>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all
>>
>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install
>>
>>
>> Then I build xolotl in a separate build directory (after checking out the
>> "feature-petsc-kokkos" branch) with:
>>
>> $ cmake -DCMAKE_BUILD_TYPE=Debug
>> -DKokkos_DIR=$HOME/build/petsc/debug/install
>> -DPETSC_DIR=$HOME/build/petsc/debug/install <xolotl-src>
>>
>> $ make -j4 SystemTester
>>
>>
>> Then, from the xolotl build directory, run (for example):
>>
>> $ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v
>>
>> Note that this test case will use the parameter file
>> '<xolotl-src>/benchmarks/params_system_NE_4.txt', which contains the
>> command-line arguments for petsc in its "petscArgs=..." line. If you look at
>> '<xolotl-src>/test/system/SystemTester.cpp', you'll see that all the system
>> test cases follow the same naming convention, with their corresponding
>> parameter files under '<xolotl-src>/benchmarks'.
>>
>> The failure happens with the NE_4 case (which is 2D) and the PSI_3 case
>> (which is 1D).
>>
>> Let me know if this is still unclear.
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Tuesday, November 15, 2022 00:16
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie
>> <sblondel at utk.edu>
>> *Subject:* [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with
>> COO interface crashes in some cases
>>
>> Hi, Philip,
>>   Can you send me instructions to build Xolotl so that I can reproduce the error?
>> --Junchao Zhang
>>
>>
>> On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>> In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use
>> the COO interface for preallocating and setting values in the Jacobian
>> matrix. I have found that with some of our test cases, using more than one
>> MPI rank results in a crash. Way down in the preconditioner code in petsc a
>> Mat gets computed that has "null" for the "productsymbolic" member of its
>> "ops". It's pretty far removed from where we compute the Jacobian entries,
>> so I haven't been able (so far) to track it back to an error in my code.
>> I'd appreciate some help with this from someone who is more familiar with
>> the petsc guts so we can figure out what I'm doing wrong. (I'm assuming
>> it's a bug in Xolotl.)
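(For readers unfamiliar with PETSc's COO assembly path: the full list of (i, j) coordinates is given once up front, and on each assembly only the values are passed, with entries for repeated coordinates summed together. Here is a standalone Python sketch of that coordinate-format semantics, for illustration only; it does not use PETSc itself:)

```python
# Standalone illustration of COO-style assembly (no PETSc involved):
# repeated (i, j) coordinates are summed, mirroring how values passed on
# each assembly combine at the coordinates declared during preallocation.
def coo_assemble(n, coo_i, coo_j, coo_v):
    """Build a dense n x n matrix from COO triplets, summing duplicates."""
    A = [[0.0] * n for _ in range(n)]
    for i, j, v in zip(coo_i, coo_j, coo_v):
        A[i][j] += v
    return A

# A tiny 2x2 Jacobian-like matrix assembled from triplets;
# note the duplicate (1, 1) entry, whose contributions are summed.
i_idx = [0, 0, 1, 1, 1]
j_idx = [0, 1, 0, 1, 1]
vals  = [2.0, -1.0, -1.0, 1.0, 1.0]
print(coo_assemble(2, i_idx, j_idx, vals))  # → [[2.0, -1.0], [-1.0, 2.0]]
```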
>>
>> Note that this is using the kokkos backend for Mat and Vec in petsc, but
>> with a serial-only build of kokkos and kokkos-kernels. So it's a CPU-only
>> run with multiple MPI ranks.
>>
>> Here's a paste of the error output showing the relevant parts of the call
>> stack:
>>
>> [ERROR] [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [ERROR] [0]PETSC ERROR: No support for this operation for this object type
>> [ERROR] [0]PETSC ERROR: No method productsymbolic for Mat of type (null)
>> [ERROR] [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [ERROR] [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
>> [ERROR] [0]PETSC ERROR: Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
>> [ERROR] [0]PETSC ERROR: Configure options PETSC_DIR=/home/4pf/repos/petsc
>> PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc
>> --with-cxx=mpicxx --with-fc=0 --with-cudac=0
>> --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices
>> --with-shared-libraries
>> --with-kokkos-dir=/home/4pf/build/kokkos/serial/install
>> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
>> [ERROR] [0]PETSC ERROR: #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918
>> [ERROR] [0]PETSC ERROR: #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138
>> [ERROR] [0]PETSC ERROR: #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793
>> [ERROR] [0]PETSC ERROR: #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820
>> [ERROR] [0]PETSC ERROR: #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897
>> [ERROR] [0]PETSC ERROR: #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769
>> [ERROR] [0]PETSC ERROR: #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639
>> [ERROR] [0]PETSC ERROR: #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994
>> [ERROR] [0]PETSC ERROR: #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406
>> [ERROR] [0]PETSC ERROR: #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825
>> [ERROR] [0]PETSC ERROR: #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
>> [ERROR] [0]PETSC ERROR: #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246
>> [ERROR] [0]PETSC ERROR: #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441
>> [ERROR] [0]PETSC ERROR: #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380
>> [ERROR] [0]PETSC ERROR: #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152
>> [ERROR] [0]PETSC ERROR: #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273
>> [ERROR] [0]PETSC ERROR: #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899
>> [ERROR] [0]PETSC ERROR: #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
>> [ERROR] [0]PETSC ERROR: #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210
>> [ERROR] [0]PETSC ERROR: #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689
>> [ERROR] [0]PETSC ERROR: #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791
>> [ERROR] [0]PETSC ERROR: #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445
>> [ERROR] [0]PETSC ERROR: #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836
>> [ERROR] PetscSolver::solve: TSSolve failed.
>> Aborting.
>>
>> (Rank 1 printed an identical error message and stack trace; its interleaved copy is omitted here.)
>>
>>
>>
>> Thanks for the help,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>>
>>

