[petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
Junchao Zhang
junchao.zhang at gmail.com
Tue Aug 22 16:25:19 CDT 2023
Marcos,
Yes, refer to the example configure script Matt mentioned for Summit. Feel free
to turn options on or off in that file. In my experience, gcc is the easiest to use.
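For context, here is a minimal sketch of the sort of options such a configure
script typically sets (illustrative only, not the exact contents of that file),
assuming the gcc and CUDA modules are loaded and the Spectrum MPI compiler
wrappers (mpicc/mpicxx/mpifort) are available:

  ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort \
              --with-cuda=1 --with-cudac=nvcc --with-debugging=0 \
              COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
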
Also, I found
https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus,
which might be similar to your machine (4 GPUs per node). The key point is
that the Cray MPI on Polaris does not currently support binding MPI ranks to
GPUs; for applications that need this, it can instead be handled by a small
helper script that sets CUDA_VISIBLE_DEVICES appropriately for each MPI rank.
So you can try a helper script like set_affinity_gpu_polaris.sh to set
CUDA_VISIBLE_DEVICES manually. In other words, put the script on your PATH,
make it executable, and then run your job with
srun -N 2 -n 16 set_affinity_gpu_polaris.sh
/home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux
test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
Then, check again with nvidia-smi to see if GPU memory is evenly
allocated.
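For reference, such a wrapper is only a few lines. A minimal sketch (assuming
Slurm exports SLURM_LOCALID for the node-local rank; the actual ALCF script may
differ, and other launchers use other variables, e.g. OMPI_COMM_WORLD_LOCAL_RANK
under OpenMPI's mpirun):

  #!/bin/bash
  # Round-robin each MPI rank onto one of the GPUs visible on this node.
  num_gpus=$(nvidia-smi -L | wc -l)
  local_rank=${SLURM_LOCALID:-0}
  export CUDA_VISIBLE_DEVICES=$(( local_rank % num_gpus ))
  echo "rank ${SLURM_PROCID:-?} local rank ${local_rank} -> GPU ${CUDA_VISIBLE_DEVICES}"
  exec "$@"   # launch the real executable with its arguments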
--Junchao Zhang
On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <knepley at gmail.com> wrote:
> On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Hi Junchao, on our system neither slurm's scontrol show job_id -dd nor
>> looking at CUDA_VISIBLE_DEVICES tells me which MPI process is associated
>> with which GPU on the node. I can see this with nvidia-smi, but if you have
>> any other suggestion using slurm I would like to hear it.
>>
>>
>> I've been trying to compile the code + PETSc on Summit, but have been
>> running into all sorts of issues related to spectrum-mpi and the different
>> compilers they provide (I tried gcc, nvhpc, pgi, and xl; some of them don't
>> handle Fortran 2018, others give errors about repeated MPI definitions, etc.).
>>
>
> The PETSc configure examples are in the repository:
>
>
> https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads
>
> Thanks,
>
> Matt
>
>
>> I also wanted to ask: do you know if it is possible to compile PETSc with
>> the xl/16.1.1-10 suite?
>>
>> Thanks!
>>
>> I configured the library with --with-cuda, and when compiling I get a
>> compilation error from CUDAC:
>>
>> CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/
>> curand2.cu:1:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6:
>> warning: Thrust requires at least Clang 7.0. Define
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
>> [-W#pragma-messages]
>> THRUST_COMPILER_DEPRECATION(Clang 7.0);
>> ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3:
>> note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>> THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>> ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38:
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
>> # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning
>> #msg)
>> ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40:
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
>> # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>> ^
>> <scratch space>:141:6: note: expanded from here
>> GCC warning "Thrust requires at least Clang 7.0. Define
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>> ^
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/
>> curand2.cu:2:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning:
>> CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT
>> to suppress this message. [-W#pragma-messages]
>> CUB_COMPILER_DEPRECATION(Clang 7.0);
>> ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note:
>> expanded from macro 'CUB_COMPILER_DEPRECATION'
>> CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>> ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note:
>> expanded from macro 'CUB_COMP_DEPR_IMPL'
>> # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>> ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note:
>> expanded from macro 'CUB_COMP_DEPR_IMPL0'
>> # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>> ^
>> <scratch space>:198:6: note: expanded from here
>> GCC warning "CUB requires at least Clang 7.0. Define
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68):
>> warning #1835-D: attribute "warn_unused_result" does not apply here
>>
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/
>> curand2.cu:1:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6:
>> warning: Thrust requires at least Clang 7.0. Define
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
>> [-W#pragma-messages]
>> THRUST_COMPILER_DEPRECATION(Clang 7.0);
>> ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3:
>> note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>> THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>> ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38:
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
>> # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning
>> #msg)
>> ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40:
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
>> # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>> ^
>> <scratch space>:149:6: note: expanded from here
>> GCC warning "Thrust requires at least Clang 7.0. Define
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>> ^
>> In file included from
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/
>> curand2.cu:2:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
>> In file included from
>> /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning:
>> CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT
>> to suppress this message. [-W#pragma-messages]
>> CUB_COMPILER_DEPRECATION(Clang 7.0);
>> ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note:
>> expanded from macro 'CUB_COMPILER_DEPRECATION'
>> CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>> ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note:
>> expanded from macro 'CUB_COMP_DEPR_IMPL'
>> # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>> ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note:
>> expanded from macro 'CUB_COMP_DEPR_IMPL0'
>> # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>> ^
>> <scratch space>:208:6: note: expanded from here
>> GCC warning "CUB requires at least Clang 7.0. Define
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68):
>> warning #1835-D: attribute "warn_unused_result" does not apply here
>>
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(len);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(t);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(s);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(flg);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(n);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(s);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(n);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(t);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(b);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(b);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(tmp);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(haystack);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(needle);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(tmp);
>> ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3:
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(t);
>> ^
>> fatal error: too many errors emitted, stopping now [-ferror-limit=]
>> 20 errors generated.
>> Error while processing
>> /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.
>> gmake[3]: *** [gmakefile:209:
>> arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1
>> gmake[2]: ***
>> [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28:
>> libs] Error 2
>> **************************ERROR*************************************
>> Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log
>> Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to
>> petsc-maint at mcs.anl.gov
>> ********************************************************************
>>
>>
>>
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Monday, August 21, 2023 4:17 PM
>> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
>> *Cc:* PETSc users list <petsc-users at mcs.anl.gov>; Guan, Collin X. (Fed) <
>> collin.guan at nist.gov>
>> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
>> processes and 1 GPU
>>
>> That is a good question. Looking at
>> https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you
>> could share the output of your job so we can search for CUDA_VISIBLE_DEVICES
>> and see how the GPUs were allocated.
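>>
>> For example, something along these lines (just a sketch; the exact
>> environment variables depend on your Slurm/MPI setup) would print what each
>> rank sees:
>>
>> srun -N 2 -n 16 bash -c 'echo "host=$(hostname) rank=$SLURM_PROCID local=$SLURM_LOCALID CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'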
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <
>> marcos.vanella at nist.gov> wrote:
>>
>> OK, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of
>> all 8 MPI processes but only doing compute work for 2 of them?
>> The nvidia-smi output says it has allocated about 2.4 GB.
>> Best,
>> Marcos
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Monday, August 21, 2023 3:29 PM
>> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
>> *Cc:* PETSc users list <petsc-users at mcs.anl.gov>; Guan, Collin X. (Fed) <
>> collin.guan at nist.gov>
>> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
>> processes and 1 GPU
>>
>> Hi, Marcos,
>> If you look at the PIDs in the nvidia-smi output, you will only find 8
>> unique PIDs, which is expected since you allocated 8 MPI ranks per node.
>> The duplicate PIDs are usually threads spawned by the MPI runtime (for
>> example, progress threads in the MPI implementation). So your job script
>> and output are all good.
>>
>> Thanks.
>>
>> On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <
>> marcos.vanella at nist.gov> wrote:
>>
>> Hi Junchao, something I'm noticing when running with CUDA-enabled linear
>> solvers (CG+HYPRE, CG+GAMG) is that, for multi-CPU/multi-GPU calculations,
>> GPU 0 in the node seems to be taking all the sub-matrices corresponding to
>> all the MPI processes in the node. This is the output of the nvidia-smi
>> command on a node with 8 MPI processes (each advancing the same number of
>> unknowns in the calculation) and 4 V100 GPUs:
>>
>> Mon Aug 21 14:36:07 2023
>> +---------------------------------------------------------------------------------------+
>> | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
>> |-----------------------------------------+----------------------+----------------------+
>> | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
>> | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
>> |                                         |                      |               MIG M. |
>> |=========================================+======================+======================|
>> |   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
>> | N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
>> | N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
>> | N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
>> | N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>>
>> +---------------------------------------------------------------------------------------+
>> | Processes:                                                                             |
>> |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
>> |        ID   ID                                                             Usage      |
>> |=========================================================================================|
>> |    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> |    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       308MiB |
>> |    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       308MiB |
>> |    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       308MiB |
>> |    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> |    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       308MiB |
>> |    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       308MiB |
>> |    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       308MiB |
>> |    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> |    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> |    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> |    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> |    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> |    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux       318MiB |
>> +---------------------------------------------------------------------------------------+
>>
>>
>> You can see that GPU 0 is connected to all 8 MPI processes, each taking
>> about 300 MB on it, whereas GPUs 1, 2 and 3 are each working with 2 MPI
>> processes. I'm wondering if this is expected or whether there are changes I
>> need to make to my submission script or runtime parameters.
>> This is the script in this case (2 nodes, 8 MPI processes/node, 4
>> GPUs/node):
>>
>> #!/bin/bash
>> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
>> #SBATCH -J test
>> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
>> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
>> #SBATCH --partition=gpu
>> #SBATCH --ntasks=16
>> #SBATCH --ntasks-per-node=8
>> #SBATCH --cpus-per-task=1
>> #SBATCH --nodes=2
>> #SBATCH --time=01:00:00
>> #SBATCH --gres=gpu:4
>>
>> export OMP_NUM_THREADS=1
>> # modules
>> module load cuda/11.7
>> module load gcc/11.2.1/toolset
>> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>>
>> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
>>
>> srun -N 2 -n 16
>> /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux
>> test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
>>
>> Thank you for the advice,
>> Marcos
>>
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>