[petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

Thu Aug 24 11:40:31 CDT 2023

   PETSc uses the non-MPI_F08 Fortran modules, so I am guessing that when you also use the MPI_F08 module the compiler sees two sets of interfaces for the same functions, hence the error. I am not sure it is portable to use PETSc with the F08 Fortran modules in the same program or routine.

> On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Thank you Matt and Junchao. I've been testing further with nvhpc on Summit. You might have an idea of what is going on here. 
> These are my modules:
> 
> Currently Loaded Modules:
>   1) lsf-tools/2.0   3) darshan-runtime/3.4.0-lite   5) DefApps       7) spectrum-mpi/10.4.0.3-20210112   9) nsight-systems/2021.3.1.54
>   2) hsi/5.0.2.p5    4) xalt/1.2.1                   6) nvhpc/22.11   8) nsight-compute/2021.2.1         10) cuda/11.7.1
> 
> I configured and compiled petsc with these options:
> 
> ./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda
> 
> without issues. The MPI checks did not run, as this was done on the login node.
> 
> Then I started getting ambiguous-interface errors related to MPI routines (similar to what I saw with pgi and gcc on Summit). I was able to write a simple piece of code that reproduces this. It happens when a module (TEST_MOD) contains a USE PETSC statement and the main program (MAIN) that uses that module contains a USE MPI_F08 statement, even though TEST_MOD declares PRIVATE and MAIN imports only TEST1 from it.
> 
> MODULE TEST_MOD
> ! In this module we use PETSC.
> USE PETSC
> !USE MPI
> IMPLICIT NONE
> PRIVATE
> PUBLIC :: TEST1
> 
> CONTAINS
> SUBROUTINE TEST1(A)
> IMPLICIT NONE
> REAL, INTENT(INOUT) :: A
> INTEGER :: IERR
> A=0.
> ENDSUBROUTINE TEST1
> 
> ENDMODULE TEST_MOD
> 
> 
> PROGRAM MAIN
> 
> ! Assume in main we use some MPI_F08 features.
> USE MPI_F08
> USE TEST_MOD, ONLY : TEST1
> IMPLICIT NONE
> INTEGER :: MY_RANK,IERR=0
> INTEGER :: PNAMELEN=0
> INTEGER :: PROVIDED
> INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED
> REAL :: A=0.
> CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)
> CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
> CALL TEST1(A)
> CALL MPI_FINALIZE(IERR)
> 
> ENDPROGRAM MAIN
> 
> With the USE PETSC statement left in TEST_MOD, this is what I get when trying to compile this code:
> 
> vanellam@login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include"  mpitest.f90
> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34)
> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37)
>   0 inform,   0 warnings,   2 severes, 0 fatal for main
> 
> Now, if I replace USE PETSC with USE MPI in module TEST_MOD, compilation succeeds. If I keep USE PETSC in the module and change the statement in MAIN to USE MPI, compilation also succeeds. So the problem seems to come from combining the PETSC and MPI_F08 modules. My guess is that it is related to spectrum-mpi, as I haven't had issues compiling FDS+PETSc with Open MPI on other systems.
> 
> Please let me know if you have any ideas on what might be going on. I'll move to Polaris and try with MPICH too.
> 
> Thanks!
> Marcos
> 
> 
> From: Junchao Zhang <junchao.zhang at gmail.com>
> Sent: Tuesday, August 22, 2023 5:25 PM
> To: Matthew Knepley <knepley at gmail.com>
> Cc: Vanella, Marcos (Fed) <marcos.vanella at nist.gov>; PETSc users list <petsc-users at mcs.anl.gov>; Guan, Collin X. (Fed) <collin.guan at nist.gov>
> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>  
> Marcos,
>   yes, refer to the example script Matt mentioned for Summit.  Feel free to turn on/off options in the file.  In my experience, gcc is easier to use.
>   Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node).  The key point is: The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank.
>   So you can try the helper script set_affinity_gpu_polaris.sh to set CUDA_VISIBLE_DEVICES manually. In other words, put the script on your PATH and then run your job with
>       srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
> 
>   Then, check again with nvidia-smi to see if GPU memory is evenly allocated.
> --Junchao Zhang
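A minimal sketch of such a helper, written in the spirit of set_affinity_gpu_polaris.sh but not a copy of it. It assumes srun exports SLURM_LOCALID (the node-local task index) and falls back to Open MPI's OMPI_COMM_WORLD_LOCAL_RANK; the script name gpu_affinity.sh and the NUM_GPUS_PER_NODE variable are hypothetical. Each rank is given exactly one visible GPU, chosen round-robin, so with 8 ranks and 4 GPUs per node every GPU serves 2 ranks.

```bash
#!/bin/bash
# gpu_affinity.sh (hypothetical name): give each MPI rank exactly one
# visible GPU, then run the real application with its arguments unchanged.

# Map a node-local rank to a GPU index, round-robin over the node's GPUs.
pick_gpu() {
  local local_rank=$1
  local num_gpus=$2
  echo $(( local_rank % num_gpus ))
}

# Assumption: srun sets SLURM_LOCALID; Open MPI's mpirun sets
# OMPI_COMM_WORLD_LOCAL_RANK. Fall back to 0 if neither is present.
local_rank=${SLURM_LOCALID:-${OMPI_COMM_WORLD_LOCAL_RANK:-0}}
# Assumption: 4 GPUs per node unless NUM_GPUS_PER_NODE says otherwise.
num_gpus=${NUM_GPUS_PER_NODE:-4}

gpu=$(pick_gpu "$local_rank" "$num_gpus")
export CUDA_VISIBLE_DEVICES=$gpu

# Replace this shell with the wrapped command, if one was given.
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
```

Usage, again with the hypothetical name: srun -N 2 -n 16 ./gpu_affinity.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda. Because each rank sees only one device, that device appears as device 0 inside the rank, which should keep ranks from creating contexts on the physical GPU 0 unless they were assigned to it.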
> 
> 
> On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <knepley at gmail.com> wrote:
> On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <petsc-users at mcs.anl.gov> wrote:
> Hi Junchao, neither the Slurm command scontrol show job_id -dd nor CUDA_VISIBLE_DEVICES tells me which MPI process is associated with which GPU on the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using Slurm I would like to hear it.
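One way to check the rank-to-GPU mapping from inside the job is to have every task report what it sees. A minimal sketch, assuming srun exports SLURM_PROCID (global rank) and SLURM_LOCALID (node-local rank) to each task; the function name report_binding is hypothetical. CUDA_VISIBLE_DEVICES only differs between ranks if something (Slurm GPU binding or a wrapper script) sets it per task; if every rank prints the same full list, or "unset", all ranks can see all GPUs on the node.

```bash
#!/bin/bash
# report_binding (hypothetical): print one line describing where this task
# runs and which GPUs it may use, from variables srun exports to each task.
report_binding() {
  printf 'rank=%s local=%s host=%s CUDA_VISIBLE_DEVICES=%s\n' \
    "${SLURM_PROCID:-unknown}" "${SLURM_LOCALID:-unknown}" "$(uname -n)" \
    "${CUDA_VISIBLE_DEVICES:-unset}"
}

report_binding
```

Saved as an executable script, it could be launched with srun -N 2 -n 16 ./report_binding.sh in place of the application, producing one line per rank.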
> 
> I've been trying to compile the code + PETSc on Summit, but have been having all sorts of issues related to spectrum-mpi and the different compilers provided there (I tried gcc, nvhpc, pgi, and xl; some of them don't handle Fortran 2018, others give errors about repeated MPI definitions, etc.). 
> 
> The PETSc configure examples are in the repository:
> 
>    https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads
> 
>     Thanks,
> 
>       Matt
>  
> I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? 
> 
> Thanks!
> 
> I configured the library with --with-cuda, and when compiling I get an error from CUDAC:
> 
> CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>      THRUST_COMPILER_DEPRECATION(Clang 7.0);
>      ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
> #  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>                                      ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
> #  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>                                        ^
> <scratch space>:141:6: note: expanded from here
>  GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>      ^
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
> In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>      CUB_COMPILER_DEPRECATION(Clang 7.0);
>      ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'
>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'
> #  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>                                   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'
> #  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>                                     ^
> <scratch space>:198:6: note: expanded from here
>  GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>      ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here
> 
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>      THRUST_COMPILER_DEPRECATION(Clang 7.0);
>      ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
> #  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>                                      ^
> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
> #  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>                                        ^
> <scratch space>:149:6: note: expanded from here
>  GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>      ^
> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
> In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>      CUB_COMPILER_DEPRECATION(Clang 7.0);
>      ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'
>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'
> #  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>                                   ^
> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'
> #  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>                                     ^
> <scratch space>:208:6: note: expanded from here
>  GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>      ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here
> 
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(a); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(a); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(len); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(t); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(s); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(flg); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(n); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(s); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(n); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(t); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(a); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(b); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(a); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(b); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(tmp); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(haystack);
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(needle);
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(tmp); 
>   ^
> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume'
> ; __builtin_assume(t); 
>   ^
> fatal error: too many errors emitted, stopping now [-ferror-limit=]
> 20 errors generated.
> Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.
> gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1
> gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2
> **************************ERROR*************************************
>   Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log
>   Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov
> ********************************************************************
> 
> 
>  
> From: Junchao Zhang <junchao.zhang at gmail.com>
> Sent: Monday, August 21, 2023 4:17 PM
> To: Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> Cc: PETSc users list <petsc-users at mcs.anl.gov>; Guan, Collin X. (Fed) <collin.guan at nist.gov>
> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>  
> That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, could you share the output of your job so we can search for CUDA_VISIBLE_DEVICES and see how the GPUs were allocated?
> 
> --Junchao Zhang
> 
> 
> On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <marcos.vanella at nist.gov> wrote:
> OK, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of all 8 MPI processes but only working on 2 of them? 
> The output says it has allocated 2.4 GB.
> Best,
> Marcos
> From: Junchao Zhang <junchao.zhang at gmail.com>
> Sent: Monday, August 21, 2023 3:29 PM
> To: Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> Cc: PETSc users list <petsc-users at mcs.anl.gov>; Guan, Collin X. (Fed) <collin.guan at nist.gov>
> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>  
> Hi Marcos,
>   If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node.
>   The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation).   So your job script and output are all good.
> 
>   Thanks.
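The PID count can be checked without reading the table by eye. A minimal sketch, assuming the CSV produced by nvidia-smi --query-compute-apps=pid,gpu_bus_id,used_memory --format=csv,noheader; the function name summarize_gpu_procs is hypothetical. It reports the number of distinct PIDs and how many processes hold a context on each GPU, identified by bus ID.

```bash
#!/bin/bash
# summarize_gpu_procs (hypothetical): read "pid, gpu_bus_id, used_memory"
# records on stdin and print the number of distinct PIDs, followed by the
# number of process records per GPU (per-GPU lines come in no fixed order).
summarize_gpu_procs() {
  awk -F', *' '
    NF >= 2 {
      seen[$1] = 1      # distinct PIDs across all GPUs
      per_gpu[$2]++     # process records listed for this GPU
    }
    END {
      n = 0
      for (p in seen) n++
      print "unique_pids=" n
      for (g in per_gpu) print g " procs=" per_gpu[g]
    }'
}
```

For the nvidia-smi table Marcos posted further down in this thread, the counts would be 8 distinct PIDs, 8 processes on GPU 0 (bus 00000004:04:00.0) and 2 on each of the other three GPUs; every PID listed on GPUs 1 to 3 also appears on GPU 0.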
> 
> On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <marcos.vanella at nist.gov> wrote:
> Hi Junchao, something I'm noticing when running with CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) in multi-CPU, multi-GPU calculations is that GPU 0 on the node seems to be holding the submatrices of all the MPI processes on the node. This is the output of nvidia-smi on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:
> 
> Mon Aug 21 14:36:07 2023       
> +---------------------------------------------------------------------------------------+
> | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
> |-----------------------------------------+----------------------+----------------------+
> | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
> |                                         |                      |               MIG M. |
> |=========================================+======================+======================|
> |   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
> | N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
> |   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
> | N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
> |   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
> | N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
> |   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
> | N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
> |                                         |                      |                  N/A |
> +-----------------------------------------+----------------------+----------------------+
>                                                                                          
> +---------------------------------------------------------------------------------------+
> | Processes:                                                                            |
> |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
> |        ID   ID                                                             Usage      |
> |=======================================================================================|
> |    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> |    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
> |    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
> |    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
> |    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> |    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
> |    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
> |    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
> |    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> |    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> |    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> |    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> |    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> |    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
> +---------------------------------------------------------------------------------------+
> 
> 
> You can see that GPU 0 is attached to all 8 MPI processes, each using about 300 MB on it, whereas GPUs 1, 2, and 3 each serve 2 MPI processes. I'm wondering whether this is expected or whether I need to change something in my submission script or runtime parameters.
> This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node):
> 
> #!/bin/bash
> # ../../Utilities/Scripts/qfds.sh -p 2  -T db -d test.fds
> #SBATCH -J test 
> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
> #SBATCH --partition=gpu
> #SBATCH --ntasks=16
> #SBATCH --ntasks-per-node=8
> #SBATCH --cpus-per-task=1
> #SBATCH --nodes=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:4
> 
> export OMP_NUM_THREADS=1
> # modules
> module load cuda/11.7
> module load gcc/11.2.1/toolset
> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
> 
> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
> 
> srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
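Where the site's Slurm supports it, srun's --gpu-bind option may do the rank-to-GPU assignment without a wrapper script; whether it takes effect depends on the Slurm version and on how the GPUs are configured as GRES, so treat this as an assumption to check against the local srun man page. The helper below (gpu_map, a hypothetical name) only builds the comma-separated list that --gpu-bind=map_gpu: expects, placing consecutive node-local tasks on the same GPU.

```bash
#!/bin/bash
# gpu_map (hypothetical): build a list for srun --gpu-bind=map_gpu:<list>
# that puts consecutive node-local tasks on the same GPU, e.g.
# 8 tasks on 4 GPUs -> 0,0,1,1,2,2,3,3.
gpu_map() {
  local tasks=$1
  local gpus=$2
  local per_gpu=$(( (tasks + gpus - 1) / gpus ))  # ceiling division
  local list=""
  local t=0
  while [ "$t" -lt "$tasks" ]; do
    # Append the GPU index for task t, comma-separated after the first.
    list="${list}${list:+,}$(( t / per_gpu ))"
    t=$(( t + 1 ))
  done
  echo "$list"
}
```

With 8 tasks and 4 GPUs per node, the srun line above would become srun -N 2 -n 16 --gpu-bind=map_gpu:$(gpu_map 8 4) /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda. A round-robin list (0,1,2,3,...) would balance the load equally well; the block order is just one choice.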
>                                    
> Thank you for the advice,
> Marcos
> 
>  
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/