[petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

Fackler, Philip facklerpw at ornl.gov
Tue Nov 15 10:55:26 CST 2022


I built petsc with:

$ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0 --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos --download-kokkos-kernels

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install


Then I built Xolotl in a separate build directory (after checking out the "feature-petsc-kokkos" branch) with:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DKokkos_DIR=$HOME/build/petsc/debug/install -DPETSC_DIR=$HOME/build/petsc/debug/install <xolotl-src>

$ make -j4 SystemTester


Then, from the xolotl build directory, run (for example):

$ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v

Note that this test case uses the parameter file '<xolotl-src>/benchmarks/params_system_NE_4.txt', which holds the command-line arguments for PETSc in its "petscArgs=..." line. If you look at '<xolotl-src>/test/system/SystemTester.cpp', you will see that all the system test cases follow the same naming convention, with their corresponding parameter files under '<xolotl-src>/benchmarks'.
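
As a rough sketch of that naming convention, something like the following would show which PETSc options a given case runs with. The helper names `params_file_for` and `show_petsc_args` are made up for illustration, and the sketch assumes the "petscArgs=" key sits at the start of a line in the parameter file:

```shell
# Hypothetical helper: map a SystemTester case name to its parameter file.
#   $1 = case name, e.g. "System/NE_4"
#   $2 = path to the xolotl source tree
params_file_for() {
    # Strip the leading "System/" and apply the params_system_<case>.txt pattern.
    printf '%s/benchmarks/params_system_%s.txt\n' "$2" "${1#System/}"
}

# Hypothetical helper: print the "petscArgs=..." line for a case.
# Assumes the key starts at the beginning of a line in the parameter file.
show_petsc_args() {
    grep '^petscArgs=' "$(params_file_for "$1" "$2")"
}
```

For example, `show_petsc_args System/NE_4 <xolotl-src>` should print the options used by the failing 2D case.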

The failure happens with the NE_4 case (which is 2D) and the PSI_3 case (which is 1D).
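
To compare both failing cases against a single-rank baseline, a loop along these lines may be convenient. This is only a sketch: the test name "System/PSI_3" is inferred from the naming convention rather than copied from SystemTester.cpp, and `repro_commands` is a hypothetical helper that only prints the commands so they can be reviewed before anything is run:

```shell
# Hypothetical helper: print one SystemTester command per (case, rank count) pair.
# "System/PSI_3" is an assumption based on the naming convention.
repro_commands() {
    for case_name in System/NE_4 System/PSI_3; do
        for ranks in 1 2; do
            echo "mpirun -n $ranks ./test/system/SystemTester -t $case_name -- -v"
        done
    done
}
```

From the xolotl build directory, `repro_commands | sh -x` would then run the four commands in order, echoing each one before it executes.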

Let me know if this is still unclear.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Tuesday, November 15, 2022 00:16
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>
Subject: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases

Hi, Philip,
  Can you give me instructions for building Xolotl so that I can reproduce the error?
--Junchao Zhang


On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users <petsc-users at mcs.anl.gov> wrote:
In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to the COO interface for preallocating and setting values in the Jacobian matrix. With some of our test cases, using more than one MPI rank results in a crash. Deep in the preconditioner code in PETSc, a Mat gets computed that has "null" for the "productsymbolic" member of its "ops". That is pretty far removed from where we compute the Jacobian entries, so I haven't yet been able to track it back to an error in my code. I'd appreciate help from someone more familiar with the PETSc internals so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.)

Note that this uses the Kokkos backend for Mat and Vec in PETSc, but with a serial-only build of kokkos and kokkos-kernels. So it is a CPU-only run with multiple MPI ranks.

Here's a paste of the error output showing the relevant parts of the call stack:

[ERROR] [0]PETSC ERROR:
[ERROR] --------------------- Error Message --------------------------------------------------------------
[ERROR] [1]PETSC ERROR:
[ERROR] --------------------- Error Message --------------------------------------------------------------
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [1]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [0]PETSC ERROR:
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918
[ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138
[ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793
[ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820
[ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897
[ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769
[ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639
[ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994
[ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406
[ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825
[ERROR] #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246
[ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441
[ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380
[ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152
[ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273
[ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899
[ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210
[ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689
[ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791
[ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445
[ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836
[ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836
[ERROR] PetscSolver::solve: TSSolve failed.
[ERROR] PetscSolver::solve: TSSolve failed.
Aborting.
Aborting.



Thanks for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory