<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Thank you, Barry. I will dial back the use of MPI_F08 in our source code and try compiling again. I haven't found much information about using MPI and MPI_F08 in different modules, other than the following thread from several years ago:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<a href="https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5" id="LPlnk450070">https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5</a></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
It looks like this has been fixed for Open MPI with newer gfortran versions, since I don't have issues with that MPI library/compiler combination. The same holds for Open MPI with ifort.<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
What I find quite interesting is this: I assumed that a PRIVATE statement in a module acts as a backstop, so that any entity not explicitly listed in a PUBLIC statement stays hidden from code that uses the module, including entities from upstream modules made
visible through USE. That does not seem to be the case here.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
Best,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
Marcos<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Barry Smith &lt;bsmith@petsc.dev&gt;<br>
<b>Sent:</b> Thursday, August 24, 2023 12:40 PM<br>
<b>To:</b> Vanella, Marcos (Fed) &lt;marcos.vanella@nist.gov&gt;<br>
<b>Cc:</b> PETSc users list &lt;petsc-users@mcs.anl.gov&gt;; Guan, Collin X. (Fed) &lt;collin.guan@nist.gov&gt;<br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div style="line-break:after-white-space">
<div><br>
</div>
PETSc uses the non-<span style="font-size:16px; font-family:Calibri,Arial,Helvetica,sans-serif">MPI_F08 Fortran modules, so I am guessing that when you also use the </span><span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px">MPI_F08 modules
the compiler sees two sets of interfaces for the same functions, hence the error. I am not sure if it is portable to use PETSc with the </span><span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px">F08 Fortran modules in the same program
or routine.</span>
<div><font face="Calibri, Arial, Helvetica, sans-serif" size="3"><br>
</font></div>
<div><font face="Calibri, Arial, Helvetica, sans-serif" size="3"><br>
</font>
<div><span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px"><br>
</span></div>
<div><br>
<div><br>
<blockquote type="cite">
<div>On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users &lt;petsc-users@mcs.anl.gov&gt; wrote:</div>
<br class="x_Apple-interchange-newline">
<div>
<div class="x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Thank you, Matt and Junchao. I've been testing further with nvhpc on Summit. You might have an idea of what is going on here.</div>
<div class="x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
These are my modules:</div>
<div class="x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Currently Loaded Modules:
<div class="x_ContentPasted0"> 1) lsf-tools/2.0 3) darshan-runtime/3.4.0-lite 5) DefApps <span class="x_Apple-converted-space"> </span><b>7) spectrum-mpi/10.4.0.3-20210112</b><span class="x_Apple-converted-space"> </span> 9) nsight-systems/2021.3.1.54</div>
2) hsi/5.0.2.p5 4) xalt/1.2.1 <b><span class="x_Apple-converted-space"> </span>6) nvhpc/22.11<span class="x_Apple-converted-space"> </span></b> 8) nsight-compute/2021.2.1 <span class="x_Apple-converted-space"> </span><b>10) cuda/11.7.1</b></div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
I configured and compiled petsc with these options:</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
without issues. The MPI checks did not go through because this was done on the login node.</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted3 x_ContentPasted4" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Then I started getting ambiguous interface errors related to MPI routines, similar to what I saw with pgi and gcc on Summit. I was able to write a simple piece of code that reproduces this. It has to do with having a <span style="font-family:'Courier New',monospace">USE
PETSC</span> statement in a module (<span style="font-family:'Courier New',monospace">TEST_MOD</span>) and a <span style="font-family:'Courier New',monospace">USE MPI_F08</span> statement in
the main program (<span style="font-family:'Courier New',monospace">MAIN</span>) that uses that module, even though the <span style="font-family:'Courier New',monospace">PRIVATE</span> statement
has been used in that module (<span style="font-family:'Courier New',monospace">TEST_MOD</span>).</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted2" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<span style="font-family:'Courier New',monospace"><b>MODULE TEST_MOD</b></span>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">! In this module we use PETSC.</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace"><b>USE PETSC</b></span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">!USE MPI</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">IMPLICIT NONE</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace"><b>PRIVATE</b></span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace"><b>PUBLIC :: TEST1</b></span></div>
<div><br class="x_ContentPasted2">
</div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">CONTAINS</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">SUBROUTINE TEST1(A)</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">IMPLICIT NONE</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">REAL, INTENT(INOUT) :: A</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">INTEGER :: IERR</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">A=0.</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">ENDSUBROUTINE TEST1</span></div>
<div><br class="x_ContentPasted2">
</div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace"><b>ENDMODULE TEST_MOD</b></span></div>
<div><br class="x_ContentPasted2">
</div>
<div><br class="x_ContentPasted2">
</div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace"><b>PROGRAM MAIN</b></span></div>
<div><br class="x_ContentPasted2">
</div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">! Assume in main we use some MPI_F08 features.</span></div>
<div class="x_ContentPasted2"><b><span style="font-family:'Courier New',monospace">USE MPI_F08</span></b></div>
<div class="x_ContentPasted2"><b><span style="font-family:'Courier New',monospace">USE TEST_MOD, ONLY : TEST1</span></b></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">IMPLICIT NONE</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">INTEGER :: MY_RANK,IERR=0</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">INTEGER :: PNAMELEN=0</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">INTEGER :: PROVIDED</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">REAL :: A=0.</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">CALL TEST1(A)</span></div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace">CALL MPI_FINALIZE(IERR)</span></div>
<div><br class="x_ContentPasted2">
</div>
<div class="x_ContentPasted2"><span style="font-family:'Courier New',monospace"><b>ENDPROGRAM MAIN</b></span></div>
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Leaving the USE PETSC statement in TEST_MOD, this is what I get when trying to compile this code:</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted5" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<span style="font-family:'Courier New',monospace">vanellam@login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90</span>
<div class="x_ContentPasted5"><span style="font-family:'Courier New',monospace">NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34)</span></div>
<div class="x_ContentPasted5"><span style="font-family:'Courier New',monospace">NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37)</span></div>
<span style="font-family:'Courier New',monospace"> 0 inform, 0 warnings, 2 severes, 0 fatal for main</span><br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Now, if I replace <span style="font-family:'Courier New',monospace">USE PETSC</span> with <span style="font-family:'Courier New',monospace">USE
MPI</span> in the module TEST_MOD, compilation proceeds correctly. If I leave the USE PETSC statement in the module and change the statement in main to USE MPI, compilation also goes through. So it seems to be something
related to using the PETSC and MPI_F08 modules together. My take is that it is related to spectrum-mpi, as I haven't had issues compiling FDS+PETSc with openmpi on other systems.</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Please let me know if you have any ideas about what might be going on. I'll move to Polaris and try with mpich too.</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Thanks!</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1 x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Marcos<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0 x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div id="x_appendonsend" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
</div>
<hr tabindex="-1" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; display:inline-block; width:934.90625px">
<div id="x_divRplyFwdMsg" dir="ltr" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
<font face="Calibri, sans-serif" style="font-size:11pt"><b>From:</b> Junchao Zhang &lt;<a href="mailto:junchao.zhang@gmail.com">junchao.zhang@gmail.com</a>&gt;<br>
<b>Sent:</b> Tuesday, August 22, 2023 5:25 PM<br>
<b>To:</b> Matthew Knepley &lt;<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>&gt;<br>
<b>Cc:</b> Vanella, Marcos (Fed) &lt;<a href="mailto:marcos.vanella@nist.gov">marcos.vanella@nist.gov</a>&gt;; PETSc users list &lt;<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>&gt;; Guan, Collin X. (Fed)
&lt;<a href="mailto:collin.guan@nist.gov">collin.guan@nist.gov</a>&gt;<br>
<b>Subject:</b><span class="x_Apple-converted-space"> </span>Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
<div dir="ltr">Marcos,
<div> yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use.</div>
<div> Also, I found <a href="https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus">https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus</a>,
which might be similar to your machine (4 GPUs per node). The key point is:<span class="x_Apple-converted-space"> </span><span style="font-family:proxima-nova,sans-serif; font-size:19px">The Cray MPI on Polaris does not currently support binding MPI ranks
to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set </span>CUDA_VISIBLE_DEVICES<span class="x_Apple-converted-space"> </span><span style="font-family:proxima-nova,sans-serif; font-size:19px">for
each MPI rank.</span></div>
<div> So you can try the helper script <span style="background-color:rgb(245,245,245); color:rgb(54,70,78); font-family:'Roboto Mono',SFMono-Regular,Consolas,Menlo,monospace; font-size:16.15px; font-variant-ligatures:none">set_affinity_gpu_polaris.sh </span>to
manually set CUDA_VISIBLE_DEVICES. In other words, put the script on your PATH and then run your job with</div>
<div> <span style="font-family:'Courier New',monospace; font-size:16px">srun -N 2 -n 16 </span><span style="background-color:rgb(245,245,245); color:rgb(54,70,78); font-family:'Roboto Mono',SFMono-Regular,Consolas,Menlo,monospace; font-size:16.15px; font-variant-ligatures:none">set_affinity_gpu_polaris.sh </span><span style="font-family:'Courier New',monospace; font-size:16px">/home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux
test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</span><br>
</div>
<div><br>
</div>
<div> Then, check again with <span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px">nvidia-smi to see if GPU memory is evenly allocated.</span></div>
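<div>A minimal sketch of what such a helper script might look like, assuming the job is launched with Slurm's srun so that SLURM_LOCALID holds the node-local rank of each task. The function name pick_gpu and the NUM_GPUS variable are hypothetical, and this is not the ALCF script itself:</div>
<pre>
```shell
#!/bin/bash
# Hypothetical wrapper (not the ALCF script): give each MPI rank one GPU.
# Assumes srun sets SLURM_LOCALID and that NUM_GPUS (default 4) is the
# number of GPUs on each node.

# Print the GPU index for a node-local rank, round-robin over the GPUs.
pick_gpu() {
    local local_rank=$1 num_gpus=$2
    echo $(( local_rank % num_gpus ))
}

# When invoked as a wrapper (a command was passed), restrict this rank to
# a single GPU and replace the shell with the real command.
if [ "$#" -gt 0 ]; then
    CUDA_VISIBLE_DEVICES=$(pick_gpu "${SLURM_LOCALID:-0}" "${NUM_GPUS:-4}")
    export CUDA_VISIBLE_DEVICES
    exec "$@"
fi
```
</pre>
<div>Saved as an executable file on the PATH, a wrapper like this would sit between srun and the application, in the same position as set_affinity_gpu_polaris.sh in the command above.</div>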
<div>
<div>
<div dir="ltr" class="x_x_gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="x_x_gmail_quote">
<div dir="ltr" class="x_x_gmail_attr">On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley &lt;<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>&gt; wrote:<br>
</div>
<blockquote class="x_x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users &lt;<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>&gt; wrote:<br>
</div>
<div class="x_x_gmail_quote">
<blockquote class="x_x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Hi Junchao, neither the slurm <span style="font-family:'Courier New',monospace">scontrol show job_id -dd</span> command nor
looking at <span style="font-family:'Courier New',monospace">CUDA_VISIBLE_DEVICES</span> provides information about which MPI process is associated with which GPU
on the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">I've been trying to compile the code+PETSc on Summit, but have been having all sorts of issues related to spectrum-mpi and the different compilers they provide. I tried gcc, nvhpc,
pgi, and xl: some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>The PETSc configure examples are in the repository:</div>
<div><br>
</div>
<div>  <a href="https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads" target="_blank">https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads</a></div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="x_x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? </div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Thanks!<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">I configured the library --with-cuda and when compiling I get a compilation error with CUDAC:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><span style="font-family:"Courier New",monospace">CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o</span>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/<a href="http://curand2.cu:1/" originalsrc="http://curand2.cu:1/" shash="RWeRqk/MxTwDKSFxz1q1HhOdbQtqlprzSPCpNUxU7RM/JFiRI752G1CQD2b/Acu7kRkI4rBm54cJuCUd4xYkjVNEyKVimJNCoMpCeG7pBsGxVEV9ZCmVCx/ti7v5K9TqdYuXYqN8ArqnAY1SEAPk9mT48Q5Vn8vUZ/CG27Fv9gs=" originalsrc="http://curand2.cu:1/" shash="yxi+KmkrfZwJer3K0IpA3tU0Y2TJmCfhri9VDbUJXkOlvp4rVsRJT3Iaph4huhjPkb5EJ18xYq7Qv5hA4dqoOGzz/gKPIXKv2X9bRPv9FDim85Dmxnn755ZXa3dyXJEnbYa76IAL5+HqW/ITY/vobHlykoHRCdUceFORGHeutLQ=" target="_blank">curand2.cu:1</a>:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:141:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:198:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:149:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace"> CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace"># define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:208:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(len);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(s);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(flg);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(n);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(s);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(n);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(b);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(b);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(tmp);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(haystack);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(needle);</span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(tmp);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);<span class="x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace"> ^</span></div>
<div><span style="font-family:"Courier New",monospace">fatal error: too many errors emitted, stopping now [-ferror-limit=]</span></div>
<div><span style="font-family:"Courier New",monospace">20 errors generated.</span></div>
<div><span style="font-family:"Courier New",monospace">Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.</span></div>
<div><span style="font-family:"Courier New",monospace">gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1</span></div>
<div><span style="font-family:"Courier New",monospace">gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2</span></div>
<div><span style="font-family:"Courier New",monospace">**************************ERROR*************************************</span></div>
<div><span style="font-family:"Courier New",monospace"> Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log</span></div>
<div><span style="font-family:"Courier New",monospace"> Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to<span class="x_Apple-converted-space"> </span><a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a></span></div>
<div><span style="font-family:"Courier New",monospace">********************************************************************</span></div>
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"> <br>
</div>
<div id="x_x_m_1321177721242751015m_3108646833317763144appendonsend"></div>
<hr style="display:inline-block; width:899.78125px">
<div id="x_x_m_1321177721242751015m_3108646833317763144divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt"><b>From:</b><span class="x_Apple-converted-space"> </span>Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b><span class="x_Apple-converted-space"> </span>Monday, August 21, 2023 4:17 PM<br>
<b>To:</b><span class="x_Apple-converted-space"> </span>Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b><span class="x_Apple-converted-space"> </span>PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Guan, Collin X. (Fed) <<a href="mailto:collin.guan@nist.gov" target="_blank">collin.guan@nist.gov</a>><br>
<b>Subject:</b><span class="x_Apple-converted-space"> </span>Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">That is a good question. Looking at <a href="https://slurm.schedmd.com/gres.html#GPU_Management" target="_blank">https://slurm.schedmd.com/gres.html#GPU_Management</a>,
I wonder whether you could share the output of your job so that we can search it for CUDA_VISIBLE_DEVICES and see how the GPUs were allocated.
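<div><br>
</div>
<div>A minimal sketch of such a search, assuming the job output is available as plain text and that the devices are listed as comma-separated numeric ids (the function name visible_devices is hypothetical and is not part of PETSc or Slurm):</div>
<pre>
import re

# Assumption: lines look like "CUDA_VISIBLE_DEVICES=0,1"; UUID-style
# device names are not handled by this sketch.
PATTERN = re.compile(r"CUDA_VISIBLE_DEVICES\s*=\s*([0-9,]+)")


def visible_devices(log_text):
    """Return one list of GPU ids per CUDA_VISIBLE_DEVICES line found."""
    found = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m:
            # Skip empty fragments such as a trailing comma.
            found.append([int(x) for x in m.group(1).split(",") if x])
    return found
</pre>
<div>Each returned list corresponds to one matching line, so comparing the lists across ranks would show whether every rank was given the same set of GPUs.</div>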
<div><br>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">OK, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of all 8 MPI processes but only working on 2 of them? </div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">It says in the script that it has allocated 2.4 GB.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Best,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Marcos<br>
</div>
<div id="x_x_m_1321177721242751015m_3108646833317763144x_m_3869060330462788085appendonsend">
</div>
<hr style="display:inline-block; width:882.21875px">
<div id="x_x_m_1321177721242751015m_3108646833317763144x_m_3869060330462788085divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" style="font-size:11pt"><b>From:</b><span class="x_Apple-converted-space"> </span>Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b><span class="x_Apple-converted-space"> </span>Monday, August 21, 2023 3:29 PM<br>
<b>To:</b><span class="x_Apple-converted-space"> </span>Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b><span class="x_Apple-converted-space"> </span>PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Guan, Collin X. (Fed) <<a href="mailto:collin.guan@nist.gov" target="_blank">collin.guan@nist.gov</a>><br>
<b>Subject:</b><span class="x_Apple-converted-space"> </span>Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi, Marcos,
<div> If you look at the PIDs in the nvidia-smi output, you will find only 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node.</div>
<div> The duplicate PIDs usually belong to threads spawned by the MPI runtime (for example, progress threads in the MPI implementation), so your job script and output are all good.<br>
<div><br>
</div>
</div>
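<div>The unique-PID check described above can be sketched in Python, assuming the process rows have the layout of the nvidia-smi table quoted further down in this thread. The helper name summarize is hypothetical, and the third sample row is made up purely to illustrate a PID that shows up on two GPUs:</div>
<pre>
import re
from collections import defaultdict

# Columns: GPU, GI, CI, PID, Type, Process name, GPU Memory (in MiB).
ROW = re.compile(
    r"^\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+\S+\s+\S+\s+(\d+)MiB\s+\|\s*$"
)

# The first two rows are copied from the quoted output; the third row is
# hypothetical and only illustrates a repeated PID.
SAMPLE = """\
| 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |
| 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB |
| 1 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |
"""


def summarize(text):
    """Return (unique_pids, mib_per_gpu) from nvidia-smi process rows."""
    pids = set()
    mib_per_gpu = defaultdict(int)
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # header, separator, or GPU summary line
        gpu, pid, mib = (int(g) for g in m.groups())
        pids.add(pid)
        mib_per_gpu[gpu] += mib
    return pids, dict(mib_per_gpu)


pids, mib_per_gpu = summarize(SAMPLE)
print(len(pids), "unique PIDs;", mib_per_gpu)
</pre>
<div>Counting the set of PIDs rather than the rows is what separates the number of MPI ranks from the number of table entries.</div>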
<div> Thanks.</div>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Hi Junchao, something I'm noticing when running with CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) is that, for multi-CPU, multi-GPU calculations, GPU 0 in the node
seems to be taking the sub-matrices of all the MPI processes in the node. This is the output of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><span style="font-family:"Courier New",monospace">Mon Aug 21 14:36:07 2023 <span class="x_Apple-converted-space"> </span></span>
<div><span style="font-family:"Courier New",monospace">+---------------------------------------------------------------------------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |</span></div>
<div><span style="font-family:"Courier New",monospace">|-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |</span></div>
<div><span style="font-family:"Courier New",monospace">| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |</span></div>
<div><span style="font-family:"Courier New",monospace">| | | MIG M. |</span></div>
<div><span style="font-family:"Courier New",monospace">|=========================================+======================+======================|</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default |</span></div>
<div><span style="font-family:"Courier New",monospace">| | | N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default |</span></div>
<div><span style="font-family:"Courier New",monospace">| | | N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default |</span></div>
<div><span style="font-family:"Courier New",monospace">| | | N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default |</span></div>
<div><span style="font-family:"Courier New",monospace">| | | N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace"> </span></div>
<div><span style="font-family:"Courier New",monospace">+---------------------------------------------------------------------------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| Processes: |</span></div>
<div><span style="font-family:"Courier New",monospace">| GPU GI CI PID Type Process name GPU Memory |</span></div>
<div><span style="font-family:"Courier New",monospace">| ID ID Usage |</span></div>
<div><span style="font-family:"Courier New",monospace">|=======================================================================================|</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">| 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB |</span></div>
<span style="font-family:"Courier New",monospace">+---------------------------------------------------------------------------------------+</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">You can see that GPU 0 is used by all 8 MPI processes on the node, each taking about 300 MiB on it, whereas GPUs 1, 2, and 3 are each used by only 2 MPI processes. I'm wondering whether this is expected
or whether I need to change my submission script or runtime parameters.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">This is the submission script for this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<div><span style="font-family:"Courier New",monospace">#!/bin/bash</span></div>
<div><span style="font-family:"Courier New",monospace"># ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -J test</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --partition=gpu</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks=16</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks-per-node=8</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --cpus-per-task=1</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --nodes=2</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --time=01:00:00</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --gres=gpu:4</span></div>
<br>
<div><span style="font-family:"Courier New",monospace">export OMP_NUM_THREADS=1</span></div>
<div><span style="font-family:"Courier New",monospace"># modules</span></div>
<div><span style="font-family:"Courier New",monospace">module load cuda/11.7</span></div>
<div><span style="font-family:"Courier New",monospace">module load gcc/11.2.1/toolset</span></div>
<div><span style="font-family:"Courier New",monospace">module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">cd /home/mnv/Firemodels_fork/fds/Issues/PETSc</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</span></div>
</div>
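<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">For reference, the 318MiB entries in the nvidia-smi table above look consistent with a round-robin assignment of node-local ranks to GPUs (GPU index = local rank modulo 4). The sketch below only reproduces that arithmetic. It assumes that PIDs 214626 to 214633 correspond to local ranks 0 to 7 in order, which the output does not confirm, and gpu_for_local_rank is a hypothetical helper used for illustration, not part of FDS, PETSc, or Slurm.</div>
<pre>
```shell
#!/bin/bash
# Hypothetical helper (illustration only): GPU index that a node-local rank
# would get under a plain round-robin assignment.
gpu_for_local_rank() {
  local local_rank=$1 gpus_per_node=$2
  echo $(( local_rank % gpus_per_node ))
}

# 8 MPI processes per node and 4 GPUs per node, as in the job script above.
# Assumption: local ranks 0..7 map to PIDs 214626..214633 in order.
for local_rank in 0 1 2 3 4 5 6 7; do
  printf 'local rank %d uses GPU %d\n' "$local_rank" "$(gpu_for_local_rank "$local_rank" 4)"
done
```
</pre>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Under that assumption, local ranks 0 and 4 land on GPU 0, 1 and 5 on GPU 1, 2 and 6 on GPU 2, and 3 and 7 on GPU 3, which matches the 318MiB entries. It does not account for the additional 308MiB that the other six processes also hold on GPU 0.</div>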
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Thank you for the advice,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Marcos<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"> <br>
</div>
<div id="x_x_m_1321177721242751015m_3108646833317763144x_m_3869060330462788085x_m_-2525567993800845248appendonsend">
</div>
<br>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
<span class="x_x_gmail_signature_prefix">--<span class="x_Apple-converted-space"> </span></span><br>
<div dir="ltr" class="x_x_gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</body>
</html>