[petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
Junchao Zhang
junchao.zhang at gmail.com
Tue Aug 15 08:59:20 CDT 2023
On Tue, Aug 15, 2023 at 8:55 AM Vanella, Marcos (Fed) <
marcos.vanella at nist.gov> wrote:
> Hi Junchao, thank you for your observations and for taking the time to look
> at this. So if I don't configure PETSc with the --with-cuda flag and still
> select HYPRE as the preconditioner, do I still get hypre to run on the GPU? I
> thought I needed that flag to get the solvers to run on the V100 card.
>
No. To have hypre run on the CPU, you need to configure petsc/hypre without
--with-cuda; otherwise, you need --with-cuda and always have to use flags
like -vec_type cuda, etc. I admit this is not user-friendly and should be
fixed by the petsc and hypre developers.
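For reference, the in-code equivalent of those runtime flags is roughly the
sketch below (untested here, just to illustrate; VECCUDA and MATAIJCUSPARSE
are only available in a --with-cuda build, and the SetFromOptions calls still
let -vec_type/-mat_type on the command line override these defaults):

program vec_mat_types
#include <petsc/finclude/petscvec.h>
#include <petsc/finclude/petscmat.h>
  use petscvec
  use petscmat
  implicit none
  Vec            x
  Mat            A
  PetscInt       n
  PetscErrorCode ierr

  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
  n = 100

  ! Vector: the in-code analogue of -vec_type cuda
  call VecCreate(PETSC_COMM_WORLD, x, ierr)
  call VecSetSizes(x, PETSC_DECIDE, n, ierr)
  call VecSetType(x, VECCUDA, ierr)          ! VECSTANDARD for a CPU-only run
  call VecSetFromOptions(x, ierr)

  ! Matrix: the in-code analogue of -mat_type aijcusparse
  call MatCreate(PETSC_COMM_WORLD, A, ierr)
  call MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr)
  call MatSetType(A, MATAIJCUSPARSE, ierr)   ! MATAIJ for a CPU-only run
  call MatSetFromOptions(A, ierr)

  call VecDestroy(x, ierr)
  call MatDestroy(A, ierr)
  call PetscFinalize(ierr)
end program vec_mat_types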
>
>
> I'll remove the hardwired paths on the link flags, thanks for that!
>
> Marcos
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Monday, August 14, 2023 7:01 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Satish Balay <
> balay at mcs.anl.gov>; McDermott, Randall J. (Fed) <
> randall.mcdermott at nist.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Marcos,
> These are my findings. I successfully ran the test in the end.
>
> $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view
> Starting FDS ...
> ...
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to
> enable PETSc device support, for example, in some cases, -vec_type cuda
>
> Now I get why you met errors with "CPU runs". You configured and built
> hypre with petsc. Since you added --with-cuda, petsc configured hypre
> with its GPU support. However, hypre has the limitation that if it is
> configured with GPU support, you must pass it GPU vectors. Hence the
> error. In other words, if you remove --with-cuda, you should be able to
> run the above command.
>
>
> $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type
> aijcusparse -vec_type cuda
>
> Starting FDS ...
>
> MPI Process 0 started on hong-gce-workstation
> MPI Process 1 started on hong-gce-workstation
>
> Reading FDS input file ...
>
> At line 3014 of file ../../Source/read.f90
> Fortran runtime warning: An array temporary was created
> At line 3461 of file ../../Source/read.f90
> Fortran runtime warning: An array temporary was created
> WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any
> unassigned SPEC variables in the input were assigned the properties of
> nitrogen.
> At line 3014 of file ../../Source/read.f90
> ..
>
> Fire Dynamics Simulator
>
> ...
> STOP: FDS completed successfully (CHID: test)
>
> I guess there were link problems in your makefile. Actually, in the first
> try, I failed with
>
> mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter
> -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace
> -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none
> -fall-intrinsics -fbounds-check -cpp
> -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14
> 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\""
> -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC
> -I"/home/jczhang/petsc/include/"
> -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o
> fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o
> func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o
> part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o
> init.o dump.o read.o divg.o main.o -Wl,-rpath
> -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags
> -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi
> -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib
> -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack
> -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig
> -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64
> -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas
> -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++
> -L/usr/lib64 -lX11
> /usr/bin/ld: cannot find -lflapack: No such file or directory
> /usr/bin/ld: cannot find -lfblas: No such file or directory
> collect2: error: ld returned 1 exit status
> make: *** [../makefile:357: ompi_gnu_linux_db] Error 1
>
> That is because you hardwired many link flags in your fds/Build/makefile.
> Then I changed LFLAGS_PETSC to
> LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib
> -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc
>
> and everything worked. Could you also try it?
>
> --Junchao Zhang
>
>
> On Mon, Aug 14, 2023 at 4:53 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Attached is the test.fds test case. Thanks!
> ------------------------------
> *From:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Sent:* Monday, August 14, 2023 5:45 PM
> *To:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Satish Balay <balay at mcs.anl.gov>
> *Cc:* McDermott, Randall J. (Fed) <randall.mcdermott at nist.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> All right Junchao, thank you for looking at this!
>
> So, I checked out the main branch in /dir_to_petsc/petsc and set up the
> petsc env variables:
>
> # PETSc dir and arch, set MYSYSTEM to nisaba for FDS:
> export PETSC_DIR=/dir_to_petsc/petsc
> export PETSC_ARCH=arch-linux-c-dbg
> export MYSYSTEM=nisaba
>
> and configured the library with:
>
> $ ./configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2"
> FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes
> --with-shared-libraries=0 --download-suitesparse --download-hypre
> --download-fblaslapack --with-cuda
>
> Then I made and checked the PETSc build.
>
> Then for FDS:
>
> 1. Clone my fds repo in a ~/fds_dir you make, and check out the FireX
> branch:
>
> $ cd ~/fds_dir
> $ git clone https://github.com/marcosvanella/fds.git
> $ cd fds
> $ git checkout FireX
>
>
> 2. With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a
> debug target for fds (this is with CUDA-enabled OpenMPI compiled with gcc,
> in my case gcc-11.2, plus PETSc):
>
> $ cd Build/ompi_gnu_linux_db
> $ ./make_fds.sh
>
> You should see compilation lines like this, with the WITH_PETSC
> preprocessor variable being defined:
>
> Building ompi_gnu_linux_db
> mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter
> -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace
> -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none
> -fall-intrinsics -fbounds-check -cpp
> -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug
> 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\""
> -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC*
> -I"/home/mnv/Software/petsc/include/"
> -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90
> mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter
> -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace
> -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none
> -fall-intrinsics -fbounds-check -cpp
> -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug
> 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\""
> -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC*
> -I"/home/mnv/Software/petsc/include/"
> -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90
> mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter
> -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace
> -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none
> -fall-intrinsics -fbounds-check -cpp
> -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug
> 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\""
> -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC*
> -I"/home/mnv/Software/petsc/include/"
> -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90
> mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter
> -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace
> -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none
> -fall-intrinsics -fbounds-check -cpp
> -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug
> 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\""
> -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC*
> -I"/home/mnv/Software/petsc/include/"
> -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90
> ...
> ...
>
> If you are compiling on a Power9 node you might come across this error
> right off the bat:
>
> ../../Source/prec.f90:34:8:
>
> 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A
> very small number 16 byte accuracy
> | 1
> Error: Kind -3 not supported for type REAL at (1)
>
> which means that, for some reason, gcc on the Power9 does not like the quad
> precision kind defined in this manner. A way around it is to add the
> intrinsic Fortran 2008 module iso_fortran_env:
>
> use, intrinsic :: iso_fortran_env
>
> in the fds/Source/prec.f90 file and change the quad precision kind
> definition to:
>
> INTEGER, PARAMETER :: QB = REAL128
>
> We are investigating why this is happening. It is not related to PETSc in
> the code; everything passed to PETSc calls is integers and double
> precision reals.
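> Put together, the changed lines in prec.f90 look roughly like this (just a
> sketch; the module name and surrounding code are illustrative, not copied
> from the actual file):
>
> MODULE PRECISION_PARAMETERS   ! illustrative module name
>    USE, INTRINSIC :: ISO_FORTRAN_ENV
>    IMPLICIT NONE
>    ! Quad (16 byte) precision kind taken from the F2008 named constant:
>    INTEGER, PARAMETER :: QB = REAL128
>    REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A very small number, 16 byte accuracy
> END MODULE PRECISION_PARAMETERS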
>
> After the code compiles you get the executable in
> ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db
>
> You can then run the attached 2-mesh case as:
>
> $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db
> test.fds -log_view
>
> and change the PETSc ksp/pc runtime flags, etc. The default is PCG + HYPRE,
> which is what I was testing on CPU. This is the result I get from the
> previous submission in an interactive job on Enki (similar with batch
> submissions, gmres ksp, gamg pc):
>
>
> Starting FDS ...
>
> MPI Process 1 started on enki11.adlp
> MPI Process 0 started on enki11.adlp
>
> Reading FDS input file ...
>
> WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any
> unassigned SPEC variables in the input were assigned the properties of
> nitrogen.
> At line 3014 of file ../../Source/read.f90
> Fortran runtime warning: An array temporary was created
> At line 3014 of file ../../Source/read.f90
> Fortran runtime warning: An array temporary was created
> At line 3461 of file ../../Source/read.f90
> Fortran runtime warning: An array temporary was created
> At line 3461 of file ../../Source/read.f90
> Fortran runtime warning: An array temporary was created
> WARNING: DEVC Device is not within any mesh.
>
> Fire Dynamics Simulator
>
> Current Date : August 14, 2023 17:26:22
> Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX
> Revision Date : Mon Aug 14 17:07:20 2023 -0400
> Compiler : Gnu gfortran 11.2.1
> Compilation Date : Aug 14, 2023 17:11:05
>
> MPI Enabled; Number of MPI Processes: 2
> OpenMP Enabled; Number of OpenMP Threads: 1
>
> MPI version: 3.1
> MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp
> Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022
>
> Job TITLE :
> Job ID string : test
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x2000397fcd8f in ???
> #1 0x2000397fb657 in ???
> #2 0x2000000604d7 in ???
> #3 0x200039cb9628 in ???
> #0 0x2000397fcd8f in ???
> #1 0x2000397fb657 in ???
> #2 0x2000000604d7 in ???
> #3 0x200039cb9628 in ???
> #4 0x200039c93eb3 in ???
> #5 0x200039364a97 in ???
> #4 0x200039c93eb3 in ???
> #5 0x200039364a97 in ???
> #6 0x20003935f6d3 in ???
> #7 0x20003935f78f in ???
> #8 0x20003935fc6b in ???
> #6 0x20003935f6d3 in ???
> #7 0x20003935f78f in ???
> #8 0x20003935fc6b in ???
> #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec67db in
> _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> at
> /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efc7e3 in
> _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec67db in
> _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> at
> /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efc7e3 in
> _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12 0x11efc7e3 in
> _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #12 0x11efc7e3 in
> _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13 0x11efc7e3 in
> _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14 0x11efc7e3 in
> _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efc7e3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16 0x11efc7e3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17 0x11efc7e3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #13 0x11efc7e3 in
> _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14 0x11efc7e3 in
> _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efc7e3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16 0x11efc7e3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17 0x11efc7e3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18 0x11eda3c7 in
> _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> *#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU*
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:2488
> *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE*
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:4300
> #18 0x11eda3c7 in
> _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> *#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU*
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:2488
> *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE*
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:4300
> #21 0x11e91bc7 in MatSetPreallocationCOO
> at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650
> #21 0x11e91bc7 in MatSetPreallocationCOO
> at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650
> #22 0x1316d5ab in MatConvert_AIJ_HYPRE
> at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648
> #22 0x1316d5ab in MatConvert_AIJ_HYPRE
> at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648
> #23 0x11e3b463 in MatConvert
> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428
> #23 0x11e3b463 in MatConvert
> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428
> #24 0x14072213 in PCSetUp_HYPRE
> at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254
> #24 0x14072213 in PCSetUp_HYPRE
> at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254
> #25 0x1276a9db in PCSetUp
> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069
> #25 0x1276a9db in PCSetUp
> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069
> #26 0x127d923b in KSPSetUp
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415
> #27 0x127e033f in KSPSolve_Private
> #26 0x127d923b in KSPSetUp
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415
> #27 0x127e033f in KSPSolve_Private
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836
> #28 0x127e6f07 in KSPSolve
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082
> #28 0x127e6f07 in KSPSolve
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082
> #29 0x1280d70b in kspsolve_
> at
> /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335
> #29 0x1280d70b in kspsolve_
> at
> /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335
> #30 0x1140858f in __globmat_solver_MOD_glmat_solver
> at ../../Source/pres.f90:3130
> #30 0x1140858f in __globmat_solver_MOD_glmat_solver
> at ../../Source/pres.f90:3130
> #31 0x119faddf in pressure_iteration_scheme
> at ../../Source/main.f90:1449
> #32 0x1196c15f in fds
> at ../../Source/main.f90:688
> #31 0x119faddf in pressure_iteration_scheme
> at ../../Source/main.f90:1449
> #32 0x1196c15f in fds
> at ../../Source/main.f90:688
> #33 0x11a126f3 in main
> at ../../Source/main.f90:6
> #33 0x11a126f3 in main
> at ../../Source/main.f90:6
> --------------------------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited
> on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
> It seems the issue stems from the call to KSPSOLVE, at line 3130 in
> fds/Source/pres.f90.
>
> Well, thank you for taking the time to look at this, and also let me know
> if these threads should be moved to the issue tracker or another venue.
> Best,
> Marcos
>
>
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Monday, August 14, 2023 4:37 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>; PETSc users list <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> I don't see a problem in the matrix assembly.
> If you point me to your repo and show me how to build it, I can try to
> reproduce.
>
> --Junchao Zhang
>
>
> On Mon, Aug 14, 2023 at 2:53 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, I've tried my case using -ksp_type gmres and -pc_type
> asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse, as
> (I understand) is done in ex60. The error is always the same, so it
> seems it is not related to the ksp/pc choice. Indeed, it seems to happen when
> trying to offload the matrix to the GPU:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x2000397fcd8f in ???
> ...
> #8 0x20003935fc6b in ???
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec769b in
> _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> at
> /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efd6a3 in
> _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec769b in
> _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> at
> /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efd6a3 in
> _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12 0x11efd6a3 in
> _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12 0x11efd6a3 in
> _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13 0x11efd6a3 in
> _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14 0x11efd6a3 in
> _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efd6a3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> #13 0x11efd6a3 in
> _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14 0x11efd6a3 in
> _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efd6a3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16 0x11efd6a3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17 0x11efd6a3 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18 0x11edb287 in
> _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> #19 0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU*
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:2488
> #20 0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ*
> ...
> ...
>
> This is the piece of Fortran code that does this within my Poisson
> solver:
>
> ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag blocks nonzeros per row to 5.
> CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,&
>                   7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR)
> CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR)
> DO IROW=1,ZSL%NUNKH_LOCAL
>    DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW)
>       ! PETSc expects zero based indexes: 1, Global I position (zero base), 1, Global J position (zero base)
>       CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,&
>                         ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR)
>    ENDDO
> ENDDO
> CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR)
> CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR)
>
> Note that I allocate d_nz=7 and o_nz=7 per row (more than enough), and
> add the nonzero values one by one. I wonder if there is something related
> to this that the copy to the GPU does not like.
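> (For reference, the same insertion could also be done one row per
> MATSETVALUES call, roughly like the untested sketch below; NNZ_ROW, ROW_GLO
> and the scratch array COLS are illustrative local PetscInt variables, not
> names from the actual code.)
>
> DO IROW=1,ZSL%NUNKH_LOCAL
>    NNZ_ROW = ZSL%NNZ_D_MAT_H(IROW)
>    ROW_GLO = ZSL%UNKH_IND(NM_START)+IROW-1              ! zero-based global row
>    COLS(1:NNZ_ROW) = ZSL%JD_MAT_H(1:NNZ_ROW,IROW)-1     ! zero-based global columns
>    CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ROW_GLO,NNZ_ROW,COLS(1:NNZ_ROW),&
>                      ZSL%D_MAT_H(1:NNZ_ROW,IROW),INSERT_VALUES,PETSC_IERR)
> ENDDO
>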
> Thanks,
> Marcos
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Monday, August 14, 2023 3:24 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* PETSc users list <petsc-users at mcs.anl.gov>; Satish Balay <
> balay at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Yeah, it looks like ex60 was run correctly.
> Double check your code again and if you still run into errors, we can try
> to reproduce on our end.
>
> Thanks.
> --Junchao Zhang
>
>
> On Mon, Aug 14, 2023 at 1:05 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, I compiled and ran ex60 through slurm on our Enki system. The
> batch script for the slurm submission, ex60.log, and GPU stats files are
> attached.
> Nothing stands out as wrong to me but please have a look.
> I'll revisit running the original 2 MPI process + 1 GPU Poisson problem.
> Thanks!
> Marcos
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 5:52 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* PETSc users list <petsc-users at mcs.anl.gov>; Satish Balay <
> balay at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Before digging into the details, could you try to run
> src/ksp/ksp/tests/ex60.c to make sure the environment is OK?
>
> The comment at the end shows how to run it:
> test:
> requires: cuda
> suffix: 1_cuda
> nsize: 4
> args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type
> cusparse
>
> --Junchao Zhang
>
>
> On Fri, Aug 11, 2023 at 4:36 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, thank you for the info. I compiled the main branch of PETSc on
> another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain
> and don't see the Fortran compilation error. It might have been related to
> gcc-9.3.
> I tried the case again, 2 CPUs and one GPU, and now get this error:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x2000397fcd8f in ???
> #1 0x2000397fb657 in ???
> #0 0x2000397fcd8f in ???
> #1 0x2000397fb657 in ???
> #2 0x2000000604d7 in ???
> #2 0x2000000604d7 in ???
> #3 0x200039cb9628 in ???
> #4 0x200039c93eb3 in ???
> #5 0x200039364a97 in ???
> #6 0x20003935f6d3 in ???
> #7 0x20003935f78f in ???
> #8 0x20003935fc6b in ???
> #3 0x200039cb9628 in ???
> #4 0x200039c93eb3 in ???
> #5 0x200039364a97 in ???
> #6 0x20003935f6d3 in ???
> #7 0x20003935f78f in ???
> #8 0x20003935fc6b in ???
> #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec425b in
> _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec425b in
> _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> at
> /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efa263 in
> _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> at
> /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efa263 in
> _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12 0x11efa263 in
> _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13 0x11efa263 in
> _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12 0x11efa263 in
> _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13 0x11efa263 in
> _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at
> /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14 0x11efa263 in
> _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efa263 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #14 0x11efa263 in
> _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efa263 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16 0x11efa263 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17 0x11efa263 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18 0x11ed7e47 in
> _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:2488
> #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:4696
> #16 0x11efa263 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17 0x11efa263 in
> _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18 0x11ed7e47 in
> _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:2488
> #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:4696
> #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/
> mpiaijcusparse.cu:251
> #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/
> mpiaijcusparse.cu:251
> #22 0x133f141f in MatMPIAIJGetLocalMatMerge
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342
> #22 0x133f141f in MatMPIAIJGetLocalMatMerge
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342
> #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368
> #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368
> #24 0x1377e1df in MatProductSymbolic
> at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795
> #24 0x1377e1df in MatProductSymbolic
> at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795
> #25 0x11e4dd1f in MatPtAP
> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934
> #25 0x11e4dd1f in MatPtAP
> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934
> #26 0x130d792f in MatCoarsenApply_MISK_private
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283
> #26 0x130d792f in MatCoarsenApply_MISK_private
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283
> #27 0x130db89b in MatCoarsenApply_MISK
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368
> #27 0x130db89b in MatCoarsenApply_MISK
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368
> #28 0x130bf5a3 in MatCoarsenApply
> at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97
> #28 0x130bf5a3 in MatCoarsenApply
> at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97
> #29 0x141518ff in PCGAMGCoarsen_AGG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524
> #29 0x141518ff in PCGAMGCoarsen_AGG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524
> #30 0x13b3a43f in PCSetUp_GAMG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631
> #30 0x13b3a43f in PCSetUp_GAMG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631
> #31 0x1276845b in PCSetUp
> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069
> #31 0x1276845b in PCSetUp
> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069
> #32 0x127d6cbb in KSPSetUp
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415
> #32 0x127d6cbb in KSPSetUp
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415
> #33 0x127dddbf in KSPSolve_Private
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836
> #33 0x127dddbf in KSPSolve_Private
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836
> #34 0x127e4987 in KSPSolve
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082
> #34 0x127e4987 in KSPSolve
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082
> #35 0x1280b18b in kspsolve_
> at
> /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335
> #35 0x1280b18b in kspsolve_
> at
> /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335
> #36 0x1140945f in __globmat_solver_MOD_glmat_solver
> at ../../Source/pres.f90:3128
> #36 0x1140945f in __globmat_solver_MOD_glmat_solver
> at ../../Source/pres.f90:3128
> #37 0x119f8853 in pressure_iteration_scheme
> at ../../Source/main.f90:1449
> #37 0x119f8853 in pressure_iteration_scheme
> at ../../Source/main.f90:1449
> #38 0x11969bd3 in fds
> at ../../Source/main.f90:688
> #38 0x11969bd3 in fds
> at ../../Source/main.f90:688
> #39 0x11a10167 in main
> at ../../Source/main.f90:6
> #39 0x11a10167 in main
> at ../../Source/main.f90:6
> srun: error: enki12: tasks 0-1: Aborted (core dumped)
>
>
> This was the slurm submission script in this case:
>
> #!/bin/bash
> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
> #SBATCH -J test
> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
> #SBATCH --partition=debug
> #SBATCH --ntasks=2
> #SBATCH --nodes=1
> #SBATCH --cpus-per-task=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:1
>
> export OMP_NUM_THREADS=1
>
> # PETSc dir and arch:
> export PETSC_DIR=/home/mnv/Software/petsc
> export PETSC_ARCH=arch-linux-c-dbg
>
> # SYSTEM name:
> export MYSYSTEM=enki
>
> # modules
> module load cuda/11.7
> module load gcc/11.2.1/toolset
> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>
> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
> srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2
> /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db
> test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg
>
> The configure.log for the PETSc build is attached. Another clue to what
> is happening is that even when setting the matrices/vectors to be mpi (-vec_type
> mpi -mat_type mpiaij) and not requesting a GPU, I get a GPU error:
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: GPU error
> [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100
> (cudaErrorNoDevice) : no CUDA-capable device is detected
> [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the
> program crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100
> (cudaErrorNoDevice) : no CUDA-capable device is detected
> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the
> program crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command
> line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command
> line
> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad
> GIT Date: 2023-08-11 15:13:02 +0000
> [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad
> GIT Date: 2023-08-11 15:13:02 +0000
> [0]PETSC ERROR:
> /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db
> on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023
> [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2"
> FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2"
> --with-debugging=yes --with-shared-libraries=0 --download-suitesparse
> --download-hypre --download-fblaslapack --with-cuda
> ...
>
> I would have expected not to see GPU errors printed out, given I did
> not request CUDA matrices/vectors. The case ran anyway; I assume it
> defaulted to the CPU solver.
> Let me know if you have any ideas as to what is happening. Thanks,
> Marcos
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 3:35 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>; PETSc users list <
> petsc-users at mcs.anl.gov>; Satish Balay <balay at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Marcos,
> We do not have good petsc/gpu documentation, but see
> https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires:
> cuda" in petsc tests and you will find examples using GPU.
> For the Fortran compile errors, attach your configure.log and Satish
> (Cc'ed) or others should know how to fix them.
>
> Thanks.
> --Junchao Zhang
>
>
> On Fri, Aug 11, 2023 at 2:22 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, thanks for the explanation. Is there some development
> documentation on the GPU work? I'm interested in learning about it.
> I checked out the main branch and configured petsc. When compiling with
> gcc/gfortran I come across this error:
>
> ....
> CUDAC
> arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o
> CUDAC.dep
> arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o
> FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o
> FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61:
>
> 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z)
> | 1
> *Error: Symbol ‘pcasmcreatesubdomains2d’ at (1) already has an explicit
> interface*
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13:
>
> 38 | import tIS
> | 1
> Error: IMPORT statement at (1) only permitted in an INTERFACE body
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80:
>
> 39 | PetscInt a ! PetscInt
> |
> 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80:
>
> 40 | PetscInt b ! PetscInt
> |
> 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80:
>
> 41 | PetscInt c ! PetscInt
> |
> 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80:
>
> 42 | PetscInt d ! PetscInt
> |
> 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80:
>
> 43 | PetscInt e ! PetscInt
> |
> 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80:
>
> 44 | PetscInt f ! PetscInt
> |
> 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80:
>
> 45 | PetscInt g ! PetscInt
> |
> 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30:
>
> 46 | IS h ! IS
> | 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30:
>
> 47 | IS i ! IS
> | 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43:
>
> 48 | PetscErrorCode z
> | 1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10:
>
> 49 | end subroutine PCASMCreateSubdomains2D
> | 1
> Error: Expecting END INTERFACE statement at (1)
> make[3]: *** [gmakefile:225:
> arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1
> make[3]: *** Waiting for unfinished jobs....
> CC
> arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o
> CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o
> CUDAC
> arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o
> CUDAC.dep
> arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o
> make[3]: Leaving directory '/home/mnv/Software/petsc'
> make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs]
> Error 2
> make[2]: Leaving directory '/home/mnv/Software/petsc'
> **************************ERROR*************************************
> Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log
> Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to
> petsc-maint at mcs.anl.gov
> ********************************************************************
> make[1]: *** [makefile:45: all] Error 1
> make: *** [GNUmakefile:9: all] Error 2
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 3:04 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Hi, Marcos,
> I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack.
> We recently refactored the COO code and got rid of that function. So could
> you try petsc/main?
> We map MPI processes to GPUs in a round-robin fashion. We query the
> number of visible CUDA devices (g) and assign device (rank%g) to the
> MPI process (rank). In that sense, the work distribution is totally
> determined by your MPI work partition (i.e., by yourself).
> On clusters, this MPI-process-to-GPU binding is usually done by a job
> scheduler like Slurm. You need to check your cluster's user guide to see
> how to bind MPI processes to GPUs. If the job scheduler has done that, the
> number of CUDA devices visible to a process might just appear to be 1,
> making petsc's own mapping moot.
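> In code form, the default assignment is essentially mod(rank, g); a
> minimal standalone illustration of that rule (not the actual PETSc source):
>
> program gpu_round_robin
>   use mpi
>   implicit none
>   integer :: ierr, rank, ngpus, dev
>   call MPI_Init(ierr)
>   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>   ngpus = 2                 ! number of CUDA devices visible to this process;
>                             ! a scheduler may restrict this, often to 1
>   dev = mod(rank, ngpus)    ! round-robin: device (rank % g) for MPI rank
>   print '(a,i0,a,i0)', 'rank ', rank, ' -> cuda device ', dev
>   call MPI_Finalize(ierr)
> end program gpu_round_robin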
>
> Thanks.
> --Junchao Zhang
>
>
> On Fri, Aug 11, 2023 at 12:43 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, thank you for replying. I compiled petsc in debug mode and
> this is what I get for the case:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an
> illegal memory access was encountered
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x15264731ead0 in ???
> #1 0x15264731dc35 in ???
> #2 0x15264711551f in ???
> #3 0x152647169a7c in ???
> #4 0x152647115475 in ???
> #5 0x1526470fb7f2 in ???
> #6 0x152647678bbd in ???
> #7 0x15264768424b in ???
> #8 0x1526476842b6 in ???
> #9 0x152647684517 in ???
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224
> #11 0x55bb46342ebb in
> _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_
> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316
> #12 0x55bb46342ebb in
> _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_
> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544
> #13 0x55bb46342ebb in
> _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_
> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669
> #14 0x55bb46317bc5 in
> _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_
> at /usr/local/cuda/include/thrust/detail/sort.inl:115
> #15 0x55bb46317bc5 in
> _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_
> at /usr/local/cuda/include/thrust/detail/sort.inl:305
> #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:4452
> #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/
> mpiaijcusparse.cu:173
> #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/
> mpiaijcusparse.cu:222
> #19 0x55bb468e01cf in MatSetPreallocationCOO
> at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606
> #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547
> #21 0x55bb469015e5 in MatProductSymbolic
> at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803
> #22 0x55bb4694ade2 in MatPtAP
> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897
> #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283
> #24 0x55bb4696eb67 in MatCoarsenApply_MISK
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368
> #25 0x55bb4695bd91 in MatCoarsenApply
> at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97
> #26 0x55bb478294d8 in PCGAMGCoarsen_AGG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524
> #27 0x55bb471d1cb4 in PCSetUp_GAMG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631
> #28 0x55bb464022cf in PCSetUp
> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994
> #29 0x55bb4718b8a7 in KSPSetUp
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406
> #30 0x55bb4718f22e in KSPSolve_Private
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824
> #31 0x55bb47192c0c in KSPSolve
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070
> #32 0x55bb463efd35 in kspsolve_
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320
> #33 0x55bb45e94b32 in ???
> #34 0x55bb46048044 in ???
> #35 0x55bb46052ea1 in ???
> #36 0x55bb45ac5f8e in ???
> #37 0x1526470fcd8f in ???
> #38 0x1526470fce3f in ???
> #39 0x55bb45aef55d in ???
> #40 0xffffffffffffffff in ???
> --------------------------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited
> on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
> BTW, I'm curious: if I set n MPI processes, each of them building a part
> of the linear system, and g GPUs, how does PETSc distribute those n pieces
> of the system matrix and rhs among the g GPUs? Does it use some
> load-balancing algorithm? Where can I read about this?
> Thank you and best regards. I can also point you to my code repo on GitHub
> if you want to take a closer look.
>
> Best Regards,
> Marcos
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 10:52 AM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Hi, Marcos,
> Could you build petsc in debug mode and then copy and paste the whole
> error stack message?
>
> Thanks
> --Junchao Zhang
>
>
> On Thu, Aug 10, 2023 at 5:51 PM Vanella, Marcos (Fed) via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hi, I'm trying to run a parallel matrix/vector build and linear solve
> with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix
> build and solve are successful on CPUs only. I'm using CUDA 11.5,
> CUDA-enabled OpenMPI, and gcc 9.3. When I run the job with the GPU enabled
> I get the following error:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress:
> an illegal memory access was encountered*
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an
> illegal memory access was encountered
>
> Program received signal SIGABRT: Process abort signal.
>
> I'm new to submitting jobs in slurm that also use GPU resources, so I
> might be doing something wrong in my submission script. This is it:
>
> #!/bin/bash
> #SBATCH -J test
> #SBATCH -e /home/Issues/PETSc/test.err
> #SBATCH -o /home/Issues/PETSc/test.log
> #SBATCH --partition=batch
> #SBATCH --ntasks=2
> #SBATCH --nodes=1
> #SBATCH --cpus-per-task=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:1
>
> export OMP_NUM_THREADS=1
> module load cuda/11.5
> module load openmpi/4.1.1
>
> cd /home/Issues/PETSc
> mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type
> mpicuda -mat_type mpiaijcusparse -pc_type gamg
>
> If anyone has any suggestions on how to troubleshoot this, please let me
> know.
> Thanks!
> Marcos
>
>
>
>