[petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

Fackler, Philip facklerpw at ornl.gov
Tue Nov 22 11:56:39 CST 2022


Great! Thank you!

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Tuesday, November 22, 2022 12:02
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>
Subject: Re: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases




On Tue, Nov 22, 2022 at 10:14 AM Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>> wrote:
Yes, that one is. I haven't updated the tests. So just build the SystemTester target or the xolotl target.
OK, I see. I reproduced the petsc error and am looking into it.  Thanks a lot.


Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: Monday, November 21, 2022 15:36
To: Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>; Blondel, Sophie <sblondel at utk.edu<mailto:sblondel at utk.edu>>
Subject: Re: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases




On Mon, Nov 21, 2022 at 9:31 AM Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>> wrote:
Not sure why. I'm using the same compiler. But you can try constructing the object explicitly on that line:

idPairs.push_back(core::RowColPair{i, i});
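
For illustration, a minimal standalone sketch of the issue (RowColPair here is a hypothetical stand-in for xolotl::core::RowColPair, assumed to be a plain aggregate):

#include <vector>

// Hypothetical stand-in for xolotl::core::RowColPair.
struct RowColPair
{
    long row;
    long col;
};

int main()
{
    std::vector<RowColPair> idPairs;
    long i = 0;
    // Some compiler front ends fail to match push_back against a bare
    // braced list for an aggregate, reporting "no instance of overloaded
    // function ... matches the argument list":
    //
    //     idPairs.push_back({i, i});
    //
    // Naming the type constructs the temporary explicitly, which
    // sidesteps the overload-deduction problem:
    idPairs.push_back(RowColPair{i, i});
    return 0;
}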

With your change, I got further but hit another error:
   /home/jczhang/xolotl/test/core/diffusion/Diffusion2DHandlerTester.cpp(79): error: class "xolotl::core::diffusion::Diffusion2DHandler" has no member "initializeOFill"

It seems all these problems are tied to the feature-petsc-kokkos branch rather than to the compiler. When I switched to origin/stable, I could build xolotl.


Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: Sunday, November 20, 2022 13:25
To: Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>; Blondel, Sophie <sblondel at utk.edu<mailto:sblondel at utk.edu>>
Subject: Re: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases



On Tue, Nov 15, 2022 at 10:55 AM Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>> wrote:
I built petsc with:

$ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0 --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos --download-kokkos-kernels

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install


Then I build xolotl in a separate build directory (after checking out the "feature-petsc-kokkos" branch) with:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DKokkos_DIR=$HOME/build/petsc/debug/install -DPETSC_DIR=$HOME/build/petsc/debug/install <xolotl-src>

$ make -j4 SystemTester
Hi, Philip, I tried multiple times and still failed to build xolotl. I installed boost-1.74 and HDF5, and used gcc-11.3.

make -j4 SystemTester
...
[  9%] Building CXX object xolotl/core/CMakeFiles/xolotlCore.dir/src/diffusion/DiffusionHandler.cpp.o
/home/jczhang/xolotl/xolotl/core/src/diffusion/DiffusionHandler.cpp(55): error: no instance of overloaded function "std::vector<_Tp, _Alloc>::push_back [with _Tp=xolotl::core::RowColPair, _Alloc=std::allocator<xolotl::core::RowColPair>]" matches the argument list
            argument types are: ({...})
            object type is: std::vector<xolotl::core::RowColPair, std::allocator<xolotl::core::RowColPair>>

1 error detected in the compilation of "/home/jczhang/xolotl/xolotl/core/src/diffusion/DiffusionHandler.cpp".



Then, from the xolotl build directory, run (for example):

$ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v

Note that this test case will use the parameter file '<xolotl-src>/benchmarks/params_system_NE_4.txt', which has the command-line arguments for petsc in its "petscArgs=..." line. If you look at '<xolotl-src>/test/system/SystemTester.cpp', you'll see that all the system test cases follow the same naming convention, with their corresponding parameter files under '<xolotl-src>/benchmarks'.
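
(As an illustration only, not the actual contents of that file: a parameter file in this scheme carries the PETSc command-line options on a single key=value line, along the lines of

petscArgs=-ts_monitor -pc_type fieldsplit

presumably alongside the other xolotl parameters on their own lines.)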

The failure happens with the NE_4 case (which is 2D) and the PSI_3 case (which is 1D).

Let me know if this is still unclear.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: Tuesday, November 15, 2022 00:16
To: Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>; Blondel, Sophie <sblondel at utk.edu<mailto:sblondel at utk.edu>>
Subject: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases

Hi, Philip,
  Can you send me instructions for building Xolotl so I can reproduce the error?
--Junchao Zhang


On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO interface for preallocating and setting values in the Jacobian matrix. I have found that, with some of our test cases, using more than one MPI rank results in a crash. Way down in petsc's preconditioner code, a Mat gets computed that has "null" for the "productsymbolic" member of its "ops" table. That's pretty far removed from where we compute the Jacobian entries, so I haven't (so far) been able to track it back to an error in my code. I'd appreciate some help from someone more familiar with the petsc guts so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.)

Note that this is using the kokkos backend for Mat and Vec in petsc, but with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only multiple MPI rank run.
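
For readers unfamiliar with the COO interface, here is a minimal self-contained sketch of the pattern described above (not Xolotl's actual code; the matrix size, index arrays, and values are placeholders):

#include <petscmat.h>

int main(int argc, char** argv)
{
    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    Mat J;
    PetscCall(MatCreate(PETSC_COMM_WORLD, &J));
    PetscCall(MatSetSizes(J, PETSC_DECIDE, PETSC_DECIDE, 4, 4));
    PetscCall(MatSetFromOptions(J)); // e.g. -mat_type aijkokkos

    // Hand the full sparsity pattern to petsc once, as (i, j) pairs...
    PetscInt coo_i[] = {0, 1, 2, 3};
    PetscInt coo_j[] = {0, 1, 2, 3};
    PetscCall(MatSetPreallocationCOO(J, 4, coo_i, coo_j));

    // ...then, at each Jacobian evaluation, supply the values in the
    // same order as the preallocated (i, j) pairs.
    PetscScalar v[] = {1.0, 2.0, 3.0, 4.0};
    PetscCall(MatSetValuesCOO(J, v, INSERT_VALUES));

    PetscCall(MatView(J, PETSC_VIEWER_STDOUT_WORLD));
    PetscCall(MatDestroy(&J));
    PetscCall(PetscFinalize());
    return 0;
}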

Here's a paste of the error output showing the relevant parts of the call stack:

[ERROR] [0]PETSC ERROR:
[ERROR] --------------------- Error Message --------------------------------------------------------------
[ERROR] [1]PETSC ERROR:
[ERROR] --------------------- Error Message --------------------------------------------------------------
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [1]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [0]PETSC ERROR:
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918
[ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138
[ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793
[ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820
[ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897
[ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769
[ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639
[ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994
[ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406
[ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825
[ERROR] #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246
[ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441
[ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380
[ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152
[ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273
[ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899
[ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210
[ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689
[ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791
[ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445
[ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836
[ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836
[ERROR] PetscSolver::solve: TSSolve failed.
[ERROR] PetscSolver::solve: TSSolve failed.
Aborting.
Aborting.



Thanks for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory