<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
OK, thanks Junchao. So is GPU 0 actually allocating memory for all 8 MPI processes' meshes, but only doing compute work for 2 of them?</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
The nvidia-smi output shows it has allocated about 2.4 GB.</div>
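<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
For reference, this is how I can double-check which process owns memory on which GPU (a sketch using nvidia-smi's query interface):</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<pre style="font-family:'Courier New',monospace">
# List every compute process on each GPU with its memory footprint;
# gpu_uuid identifies the GPU each allocation lives on.
nvidia-smi --query-compute-apps=gpu_uuid,pid,used_memory --format=csv
</pre>
</div>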
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Best,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Marcos<br>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Junchao Zhang <junchao.zhang@gmail.com><br>
<b>Sent:</b> Monday, August 21, 2023 3:29 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <marcos.vanella@nist.gov><br>
<b>Cc:</b> PETSc users list <petsc-users@mcs.anl.gov>; Guan, Collin X. (Fed) <collin.guan@nist.gov><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi, Marcos,
<div>  If you look at the PIDs in the nvidia-smi output, you will find only 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node.</div>
<div>  The duplicate PID entries usually come from threads spawned by the MPI runtime (for example, progress threads in the MPI implementation). So your job script and output are fine.<br>
<div><br>
</div>
</div>
<div>  Thanks.</div>
</div>
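<div>  If you want to confirm that the duplicate entries are threads of a single rank rather than extra processes, here is a quick check you could run on the compute node (a sketch; 214627 is one of the duplicated PIDs in your output):</div>
<div>
<pre style="font-family:'Courier New',monospace">
# Show all threads (SPID column) belonging to one rank's process;
# rows beyond the main thread are typically MPI runtime threads.
ps -T -p 214627
</pre>
</div>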
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div class="x_msg-2525567993800845248">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Hi Junchao, something I'm noticing when running with CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) in multi-CPU, multi-GPU calculations is that GPU 0 in the node seems to be taking the sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<pre style="font-family:'Courier New',monospace">
Mon Aug 21 14:36:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
| N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
| N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
| N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
| N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
+---------------------------------------------------------------------------------------+
</pre>
</div>
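<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
The per-process figures on GPU 0 add up to its total in the memory column: 2 × 318 MiB + 6 × 308 MiB = 2484 MiB, which matches the ~2488 MiB (about 2.4 GB) reported for GPU 0, while GPUs 1, 2 and 3 each carry only 2 × 318 MiB, roughly the 638 MiB shown.</div>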
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
You can see that GPU 0 is connected to all 8 MPI processes, each taking about 300 MiB on it, whereas GPUs 1, 2 and 3 are each working with only 2 MPI processes. I'm wondering if this is expected, or whether there are changes I need to make to my submission script or runtime parameters (see the binding sketch after the script below).</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<pre style="font-family:'Courier New',monospace">
#!/bin/bash
# ../../Utilities/Scripts/qfds.sh -p 2  -T db -d test.fds
#SBATCH -J test
#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
#SBATCH --partition=gpu
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4

export OMP_NUM_THREADS=1
# modules
module load cuda/11.7
module load gcc/11.2.1/toolset
module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7

cd /home/mnv/Firemodels_fork/fds/Issues/PETSc

srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
</pre>
</div>
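<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
In case explicit rank-to-GPU binding turns out to be what's needed, this is the kind of wrapper I would try (only a sketch; it assumes srun exports SLURM_LOCALID for each task, and the round-robin mapping is my guess):</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<pre style="font-family:'Courier New',monospace">
#!/bin/bash
# gpu_bind.sh -- hypothetical wrapper: pin each local MPI rank to one GPU.
# With 8 ranks/node and 4 GPUs/node, local ranks 0-7 map to GPUs 0,1,2,3,0,1,2,3.
export CUDA_VISIBLE_DEVICES=$(( SLURM_LOCALID % 4 ))
# Run the real command with the restricted GPU visibility.
exec "$@"
</pre>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
launched by replacing the srun line above with: srun -N 2 -n 16 ./gpu_bind.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</div>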
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Thank you for the advice,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Marcos<br>
</div>
<br>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>