<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hi Barry, an update on this. I reverted to using MPI instead of MPI_F08 and was able to compile the code on Summit with nvhpc and spectrum-mpi. I also moved to Polaris, compiled PETSc with gcc, cray-mpich and CUDA, and was able to compile FDS + PETSc right off the
 bat without any source changes (i.e. with both USE PETSC and USE MPI_F08 present in the source).</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
My impression is that there is an underlying issue with spectrum-mpi. </div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
I will run tests with both combinations, PETSc + USE MPI and PETSc + USE MPI_F08, both on GPU, to see whether mixing MPI Fortran interfaces has any effect on hardware use or timings.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Thanks,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Marcos<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Barry Smith &lt;bsmith@petsc.dev&gt;<br>
<b>Sent:</b> Thursday, August 24, 2023 3:07 PM<br>
<b>To:</b> Vanella, Marcos (Fed) &lt;marcos.vanella@nist.gov&gt;<br>
<b>Cc:</b> PETSc users list &lt;petsc-users@mcs.anl.gov&gt;; Guan, Collin X. (Fed) &lt;collin.guan@nist.gov&gt;<br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div style="line-break:after-white-space"><br>
<div><br>
<blockquote type="cite">
<div>On Aug 24, 2023, at 2:00 PM, Vanella, Marcos (Fed) &lt;marcos.vanella@nist.gov&gt; wrote:</div>
<br class="x_Apple-interchange-newline">
<div>
<div class="x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Thank you Barry, I will dial back the MPI_F08 use in our source code and try compiling it. I haven't found much information regarding using MPI and MPI_F08 in different modules other than the following link from several years ago:</div>
<div class="x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<a href="https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5" originalsrc="https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5" shash="nBOsA7cpZXzwKAwTNVS41Z8dgfXX5cfVpPkcb4qfzZGVq4vp+/t6K81ZXObGRfxWvPpNlGWoWxwl5TchSd3LBBBwjys6suIBI2A8SWXqFHxQGK42x6rOr5aM3bRExRjItNWtqZ7z9Rc2GXxtyNgV+yhl5q8Hs7YX4i8omCoIpAU=" id="LPlnk450070">https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5</a></div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Looks like this has been fixed for openmpi and newer gfortran versions because I don't have issues with this MPI lib/compiler combination. Same with openmpi/ifort.<br>
</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
What I find quite interesting is this: I assumed the PRIVATE statement in a module would act as a backstop, exposing only the entities listed in that module's PUBLIC statement and hiding everything else, including entities from upstream modules made
 visible through USE. This does not seem to be the case here.</div>
</div>
</blockquote>
<div><br>
</div>
   I agree; your different tests gave seemingly inconsistent results, which could point to bugs in how the Fortran compiler handles modules.</div>
<div><br>
</div>
<div><br>
<blockquote type="cite">
<div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Best,</div>
<div class="x_elementToProof x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Marcos<br>
</div>
<div class="x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
 <br>
</div>
<div id="x_appendonsend" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
</div>
<hr tabindex="-1" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; display:inline-block; width:934.90625px">
<span style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; float:none; display:inline!important"></span>
<div id="x_divRplyFwdMsg" dir="ltr" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
<font face="Calibri, sans-serif" style="font-size:11pt"><b>From:</b><span class="x_Apple-converted-space"> </span>Barry Smith <<a href="mailto:bsmith@petsc.dev">bsmith@petsc.dev</a>><br>
<b>Sent:</b><span class="x_Apple-converted-space"> </span>Thursday, August 24, 2023 12:40 PM<br>
<b>To:</b><span class="x_Apple-converted-space"> </span>Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b><span class="x_Apple-converted-space"> </span>PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>>; Guan, Collin X. (Fed) <<a href="mailto:collin.guan@nist.gov">collin.guan@nist.gov</a>><br>
<b>Subject:</b><span class="x_Apple-converted-space"> </span>Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; line-break:after-white-space">
<div><br>
</div>
   PETSc uses the non-<span style="font-size:16px; font-family:Calibri,Arial,Helvetica,sans-serif">MPI_F08 Fortran modules, so I am guessing that when you also use the </span><span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px">MPI_F08 modules
 the compiler sees two sets of interfaces for the same functions, hence the error. I am not sure if it is portable to use PETSc with the </span><span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px">F08 Fortran modules in the same program
 or routine.</span>
<div><font face="Calibri, Arial, Helvetica, sans-serif" size="3"><br>
</font></div>
<div><font face="Calibri, Arial, Helvetica, sans-serif" size="3"><br>
</font>
<div><span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px"><br>
</span></div>
<div><br>
<div><br>
<blockquote type="cite">
<div>On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>> wrote:</div>
<br class="x_x_Apple-interchange-newline">
<div>
<div class="x_x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Thank you Matt and Junchao. I've been testing further with nvhpc on summit. You might have an idea on what is going on here. </div>
<div class="x_x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
These are my modules:</div>
<div class="x_x_elementToProof" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Currently Loaded Modules:
<div class="x_x_ContentPasted0">  1) lsf-tools/2.0   3) darshan-runtime/3.4.0-lite   5) DefApps      <span class="x_x_Apple-converted-space"> </span><b>7) spectrum-mpi/10.4.0.3-20210112</b><span class="x_x_Apple-converted-space"> </span>  9) nsight-systems/2021.3.1.54</div>
  2) hsi/5.0.2.p5    4) xalt/1.2.1                  <b><span class="x_x_Apple-converted-space"> </span>6) nvhpc/22.11<span class="x_x_Apple-converted-space"> </span></b>  8) nsight-compute/2021.2.1        <span class="x_x_Apple-converted-space"> </span><b>10)
 cuda/11.7.1</b></div>
<div class="x_x_elementToProof x_x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
I configured and compiled petsc with these options:</div>
<div class="x_x_elementToProof x_x_ContentPasted0" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
without issues. The MPI checks did not run because this was done on a login node.</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted3 x_x_ContentPasted4" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Then I started getting ambiguous interface errors related to MPI routines, similar to what I saw with pgi and gcc on Summit. I was able to write a simple piece of code that reproduces this. It has to do with having a<span class="x_x_Apple-converted-space"> </span><span style="font-family:'Courier New',monospace">USE
 PETSC</span><span class="x_x_Apple-converted-space"> </span>statement in a module (<span style="font-family:'Courier New',monospace">TEST_MOD</span>) and a<span class="x_x_Apple-converted-space"> </span><span style="font-family:'Courier New',monospace">USE
 MPI_F08</span><span class="x_x_Apple-converted-space"> </span>statement in the main program (<span style="font-family:'Courier New',monospace">MAIN</span>) that uses that module, even though the<span class="x_x_Apple-converted-space"> </span><span style="font-family:'Courier New',monospace">PRIVATE</span><span class="x_x_Apple-converted-space"> </span>statement
 is present in the <span style="font-family:'Courier New',monospace">TEST_MOD</span> module.</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted2" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<span style="font-family:"Courier New",monospace"><b>MODULE TEST_MOD</b></span>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">! In this module we use PETSC.</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace"><b>USE PETSC</b></span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">!USE MPI</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">IMPLICIT NONE</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace"><b>PRIVATE</b></span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace"><b>PUBLIC :: TEST1</b></span></div>
<div><br class="x_x_ContentPasted2">
</div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">CONTAINS</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">SUBROUTINE TEST1(A)</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">IMPLICIT NONE</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">REAL, INTENT(INOUT) :: A</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">INTEGER :: IERR</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">A=0.</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">ENDSUBROUTINE TEST1</span></div>
<div><br class="x_x_ContentPasted2">
</div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace"><b>ENDMODULE TEST_MOD</b></span></div>
<div><br class="x_x_ContentPasted2">
</div>
<div><br class="x_x_ContentPasted2">
</div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace"><b>PROGRAM MAIN</b></span></div>
<div><br class="x_x_ContentPasted2">
</div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">! Assume in main we use some MPI_F08 features.</span></div>
<div class="x_x_ContentPasted2"><b><span style="font-family:"Courier New",monospace">USE MPI_F08</span></b></div>
<div class="x_x_ContentPasted2"><b><span style="font-family:"Courier New",monospace">USE TEST_MOD, ONLY : TEST1</span></b></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">IMPLICIT NONE</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">INTEGER :: MY_RANK,IERR=0</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">INTEGER :: PNAMELEN=0</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">INTEGER :: PROVIDED</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">REAL :: A=0.</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">CALL TEST1(A)</span></div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace">CALL MPI_FINALIZE(IERR)</span></div>
<div><br class="x_x_ContentPasted2">
</div>
<div class="x_x_ContentPasted2"><span style="font-family:"Courier New",monospace"><b>ENDPROGRAM MAIN</b></span></div>
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
With the USE PETSC statement left in TEST_MOD, this is what I get when trying to compile this code:</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted5" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<span style="font-family:"Courier New",monospace">vanellam@login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include"  mpitest.f90</span>
<div class="x_x_ContentPasted5"><span style="font-family:"Courier New",monospace">NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34)</span></div>
<div class="x_x_ContentPasted5"><span style="font-family:"Courier New",monospace">NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37)</span></div>
<span style="font-family:"Courier New",monospace">  0 inform,   0 warnings,   2 severes, 0 fatal for main</span><br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Now, if I replace<span class="x_x_Apple-converted-space"> </span><span style="font-family:'Courier New',monospace">USE PETSC</span><span class="x_x_Apple-converted-space"> </span>with<span class="x_x_Apple-converted-space"> </span><span style="font-family:'Courier New',monospace">USE
 MPI</span><span class="x_x_Apple-converted-space"> </span>in the module TEST_MOD, compilation proceeds correctly. If I leave the USE PETSC statement in the module and change the statement in main to USE MPI, compilation also goes through. So it seems to be something
 related to using the PETSC and MPI_F08 modules together. My take is that it is related to spectrum-mpi, as I haven't had issues compiling FDS+PETSc with openmpi on other systems.</div>
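<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">The three combinations described above could be checked in one pass with a small shell loop. This is only a sketch and was not part of the original test: it assumes the reproducer is saved as mpitest.f90 with the USE statements starting in column one, and that PETSC_DIR and PETSC_ARCH point at the PETSc build. The names make_variant, run_matrix and variant.f90 are hypothetical.</div>
<pre>
```shell
#!/bin/bash
# Sketch only: compile the reproducer with each pair of USE statements.
# Assumptions (not from the original message): PETSC_DIR and PETSC_ARCH are
# set, and the USE lines in the source start in column one.

# Print the source file $1 with "USE PETSC" (module) replaced by "USE $2"
# and "USE MPI_F08" (main program) replaced by "USE $3".
make_variant() {
  sed -e "s/^USE PETSC/USE $2/" -e "s/^USE MPI_F08/USE $3/" "$1"
}

# Compile every combination and report whether mpifort accepted it.
run_matrix() {
  local src=$1 pair mod_use main_use
  for pair in "PETSC MPI_F08" "MPI MPI_F08" "PETSC MPI"; do
    mod_use=${pair% *}    # first word: module used in TEST_MOD
    main_use=${pair#* }   # second word: module used in MAIN
    make_variant "$src" "$mod_use" "$main_use" > variant.f90
    if mpifort -c -I"$PETSC_DIR/include" -I"$PETSC_DIR/$PETSC_ARCH/include" \
         variant.f90 -o variant.o > /dev/null 2> /dev/null; then
      echo "TEST_MOD: USE $mod_use, MAIN: USE $main_use -> compiles"
    else
      echo "TEST_MOD: USE $mod_use, MAIN: USE $main_use -> fails"
    fi
  done
}

# Run only when a source file is given, e.g. ./use_matrix.sh mpitest.f90
if [ "$#" -gt 0 ]; then run_matrix "$1"; fi
```
</pre>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Running the same loop on a system with a different MPI library would show whether the failure follows spectrum-mpi or the compiler.</div>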
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Please let me know if you have any ideas about what might be going on. I'll move to Polaris and try with mpich too.</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Thanks!</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1 x_x_ContentPasted6" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
Marcos<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div class="x_x_elementToProof x_x_ContentPasted0 x_x_ContentPasted1" style="font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<br>
</div>
<div id="x_x_appendonsend" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
</div>
<hr tabindex="-1" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; display:inline-block; width:934.90625px">
<span style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; float:none; display:inline!important"></span>
<div id="x_x_divRplyFwdMsg" dir="ltr" style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
<font face="Calibri, sans-serif" style="font-size:11pt"><b>From:</b><span class="x_x_Apple-converted-space"> </span>Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b><span class="x_x_Apple-converted-space"> </span>Tuesday, August 22, 2023 5:25 PM<br>
<b>To:</b><span class="x_x_Apple-converted-space"> </span>Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>><br>
<b>Cc:</b><span class="x_x_Apple-converted-space"> </span>Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov">marcos.vanella@nist.gov</a>>; PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>>; Guan, Collin X.
 (Fed) <<a href="mailto:collin.guan@nist.gov">collin.guan@nist.gov</a>><br>
<b>Subject:</b><span class="x_x_Apple-converted-space"> </span>Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div style="font-family:Helvetica; font-size:18px; font-style:normal; font-variant-caps:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
<div dir="ltr">Marcos,
<div>  yes, refer to the example script Matt mentioned for Summit.  Feel free to turn on/off options in the file.  In my experience, gcc is easier to use.</div>
<div>  Also, I found<span class="x_x_Apple-converted-space"> </span><a href="https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus" originalsrc="https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus" shash="RAlG1feRaYlnLPuvvkT2kXOF57bvJosCEr8CkU16lh2EHL4aV2WmwY03G3zHhxFqX6BPBsrHSd/csSNxlI86N2RTow4Qup8M8EHx1XPomdXD38t3RBCkTyiufQZy7+8NaBl5AjMxbCdOAnNrBbuyDJGDTCVMHEA1TyIpwIHOnvY=">https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus</a>,
 which might be similar to your machine (4 GPUs per node).  The key point is:<span class="x_x_Apple-converted-space"> </span><span style="font-family:proxima-nova,sans-serif; font-size:19px">The Cray MPI on Polaris does not currently support binding MPI ranks
 to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set </span>CUDA_VISIBLE_DEVICES<span class="x_x_Apple-converted-space"> </span><span style="font-family:proxima-nova,sans-serif; font-size:19px">for
 each MPI rank.</span></div>
<div>  So you can try the helper script <span style="background-color:rgb(245,245,245); color:rgb(54,70,78); font-family:'Roboto Mono',SFMono-Regular,Consolas,Menlo,monospace; font-size:16.15px; font-variant-ligatures:none">set_affinity_gpu_polaris.sh </span>to
 manually set CUDA_VISIBLE_DEVICES.  In other words, put the script on your PATH and then run your job with</div>
<div>      <span style="font-family:"Courier New",monospace; font-size:16px">srun -N 2 -n 16 </span><span style="background-color:rgb(245,245,245); color:rgb(54,70,78); font-family:"Roboto Mono",SFMono-Regular,Consolas,Menlo,monospace; font-size:16.15px; font-variant-ligatures:none">set_affinity_gpu_polaris.sh </span><span style="font-family:"Courier New",monospace; font-size:16px">/home/mnv/Firemodels_fork/fds/</span><span style="font-family:"Courier New",monospace; font-size:16px">Build/ompi_gnu_linux/fds_ompi_</span><span style="font-family:"Courier New",monospace; font-size:16px">gnu_linux
 test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</span><br>
</div>
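<div>  For reference, here is a minimal sketch of what such a wrapper can look like on a Slurm machine. It is not the Polaris script itself: the function name pick_gpu and the variable GPUS_PER_NODE are made up for illustration, the default of 4 GPUs per node is an assumption, and it relies on srun setting SLURM_LOCALID (the node-local task index) for each rank.</div>
<pre>
```shell
#!/bin/bash
# Hypothetical wrapper: pins each MPI rank to a single GPU.
# Assumption: ranks are started by Slurm's srun, which sets SLURM_LOCALID
# (the node-local task index). GPUS_PER_NODE is an illustrative variable
# that defaults to 4 here.

# Map a node-local rank to a GPU index, round-robin.
pick_gpu() {
  local local_rank=$1 num_gpus=$2
  echo $(( local_rank % num_gpus ))
}

gpu=$(pick_gpu "${SLURM_LOCALID:-0}" "${GPUS_PER_NODE:-4}")
export CUDA_VISIBLE_DEVICES=$gpu

# Replace this shell with the real program, if one was given.
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
```
</pre>
<div>  With the srun line above (16 ranks on 2 nodes), and if the ranks are spread evenly over the nodes, a round-robin mapping like this would put two ranks on each of the 4 GPUs of a node.</div>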
<div><br>
</div>
<div>  Then, check again with <span style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px">nvidia-smi to see if GPU memory is evenly allocated.</span></div>
<div>
<div>
<div dir="ltr" class="x_x_x_gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="x_x_x_gmail_quote">
<div dir="ltr" class="x_x_x_gmail_attr">On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>
</div>
<blockquote class="x_x_x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>
</div>
<div class="x_x_x_gmail_quote">
<blockquote class="x_x_x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Hi Junchao, neither the Slurm command<span class="x_x_Apple-converted-space"> </span><span style="font-family:"Courier New",monospace">scontrol show job_id -dd</span><span class="x_x_Apple-converted-space"> </span>nor
 looking at<span class="x_x_Apple-converted-space"> </span><span style="font-family:"Courier New",monospace">CUDA_VISIBLE_DEVICES</span><span class="x_x_Apple-converted-space"> </span>tells me which MPI process is associated with which
 GPU on the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using Slurm I would like to hear it.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">I've been trying to compile the code + PETSc on Summit, but have been having all sorts of issues related to spectrum-mpi and the different compilers they provide (I tried gcc, nvhpc,
 pgi and xl; some of them don't handle Fortran 2018, others report repeated MPI definitions, etc.). </div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>The PETSc configure examples are in the repository:</div>
<div><br>
</div>
<div>   <a href="https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads" target="_blank">https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads</a></div>
<div><br>
</div>
<div>    Thanks,</div>
<div><br>
</div>
<div>      Matt</div>
<div> </div>
<blockquote class="x_x_x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? </div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Thanks!<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">I configured the library with --with-cuda, and the build fails with a compilation error at the CUDAC step:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><span style="font-family:"Courier New",monospace">CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o</span>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace">     THRUST_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace">  THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace">                                     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace">                                       ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:141:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace">     CUB_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace">  CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace">                                  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace">                                    ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:198:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace">     THRUST_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace">  THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace">                                     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace">                                       ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:149:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:</span></div>
<div><span style="font-family:"Courier New",monospace">In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]</span></div>
<div><span style="font-family:"Courier New",monospace">     CUB_COMPILER_DEPRECATION(Clang 7.0);</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'</span></div>
<div><span style="font-family:"Courier New",monospace">  CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)</span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)</span></div>
<div><span style="font-family:"Courier New",monospace">                                  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'</span></div>
<div><span style="font-family:"Courier New",monospace">#  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)</span></div>
<div><span style="font-family:"Courier New",monospace">                                    ^</span></div>
<div><span style="font-family:"Courier New",monospace"><scratch space>:208:6: note: expanded from here</span></div>
<div><span style="font-family:"Courier New",monospace"> GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."</span></div>
<div><span style="font-family:"Courier New",monospace">     ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(len);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(s);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(flg);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(n);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(s);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(n);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(b);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(a);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(b);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(tmp);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(haystack);</span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(needle);</span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(tmp);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume'</span></div>
<div><span style="font-family:"Courier New",monospace">; __builtin_assume(t);<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">  ^</span></div>
<div><span style="font-family:"Courier New",monospace">fatal error: too many errors emitted, stopping now [-ferror-limit=]</span></div>
<div><span style="font-family:"Courier New",monospace">20 errors generated.</span></div>
<div><span style="font-family:"Courier New",monospace">Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.</span></div>
<div><span style="font-family:"Courier New",monospace">gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1</span></div>
<div><span style="font-family:"Courier New",monospace">gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2</span></div>
<div><span style="font-family:"Courier New",monospace">**************************ERROR*************************************</span></div>
<div><span style="font-family:"Courier New",monospace">  Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log</span></div>
<div><span style="font-family:"Courier New",monospace">  Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to<span class="x_x_Apple-converted-space"> </span><a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a></span></div>
<div><span style="font-family:"Courier New",monospace">********************************************************************</span></div>
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"> <br>
</div>
<div id="x_x_x_m_1321177721242751015m_3108646833317763144appendonsend"></div>
<hr style="display:inline-block; width:899.78125px">
<div id="x_x_x_m_1321177721242751015m_3108646833317763144divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" style="font-size:11pt"><b>From:</b><span class="x_x_Apple-converted-space"> </span>Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b><span class="x_x_Apple-converted-space"> </span>Monday, August 21, 2023 4:17 PM<br>
<b>To:</b><span class="x_x_Apple-converted-space"> </span>Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b><span class="x_x_Apple-converted-space"> </span>PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Guan, Collin X. (Fed) <<a href="mailto:collin.guan@nist.gov" target="_blank">collin.guan@nist.gov</a>><br>
<b>Subject:</b><span class="x_x_Apple-converted-space"> </span>Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">That is a good question. Looking at <a href="https://slurm.schedmd.com/gres.html#GPU_Management" target="_blank">https://slurm.schedmd.com/gres.html#GPU_Management</a>,
 could you share the output of your job so we can search for CUDA_VISIBLE_DEVICES and see how the GPUs were allocated?
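<div>As a minimal sketch of how that information could be captured per task: the helper below prints the GPUs visible to each rank. The function name <code>report_gpu_binding</code> is hypothetical; SLURM_PROCID and SLURM_LOCALID are the variables srun normally sets for each task, and the sketch assumes Slurm exports CUDA_VISIBLE_DEVICES for the job.</div>

```shell
#!/bin/bash
# Hypothetical helper (not part of the original job script): print which
# GPUs are visible to the calling task. SLURM_PROCID and SLURM_LOCALID are
# set by srun; outside a Slurm job they are reported as "unset".
report_gpu_binding() {
  local host
  host=$(hostname)
  echo "host=${host} rank=${SLURM_PROCID:-unset} local_rank=${SLURM_LOCALID:-unset} CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
}

report_gpu_binding
```

<div>Saved as a script (the file name is an assumption, e.g. gpu_binding.sh) and launched with the same srun -N 2 -n 16 options as the solver, it would emit one line per rank in the job log.</div>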
<div><br>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">OK, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of all 8 MPI processes but only working on 2 of them? </div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">The nvidia-smi output shows about 2.4GB (2488MiB) allocated on it.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Best,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Marcos<br>
</div>
<div id="x_x_x_m_1321177721242751015m_3108646833317763144x_m_3869060330462788085appendonsend">
</div>
<hr style="display:inline-block; width:882.21875px">
<div id="x_x_x_m_1321177721242751015m_3108646833317763144x_m_3869060330462788085divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" style="font-size:11pt"><b>From:</b><span class="x_x_Apple-converted-space"> </span>Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b><span class="x_x_Apple-converted-space"> </span>Monday, August 21, 2023 3:29 PM<br>
<b>To:</b><span class="x_x_Apple-converted-space"> </span>Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b><span class="x_x_Apple-converted-space"> </span>PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Guan, Collin X. (Fed) <<a href="mailto:collin.guan@nist.gov" target="_blank">collin.guan@nist.gov</a>><br>
<b>Subject:</b><span class="x_x_Apple-converted-space"> </span>Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi, Marcos,
<div>  If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node.</div>
<div>  The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in the MPI implementation). So your job script and output are all good.<br>
<div><br>
</div>
</div>
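<div>To check the PID count without reading the table by eye, a minimal sketch follows. The helper name <code>count_unique_pids</code> is hypothetical, and it assumes the process-table layout shown in the quoted nvidia-smi output (PID in the fifth whitespace-separated field, process type in the sixth).</div>

```shell
#!/bin/bash
# Hypothetical helper: count distinct PIDs in the "Processes" table of
# nvidia-smi output read from stdin. Process rows look like
#   |    0   N/A  N/A    214626      C   ...fds_ompi_gnu_linux      318MiB |
# so after whitespace splitting the PID is field 5 and the type is field 6.
# Header rows and per-GPU status rows are skipped by the numeric PID check.
count_unique_pids() {
  awk '$1 == "|" && $5 ~ /^[0-9]+$/ && ($6 == "C" || $6 == "G" || $6 == "C+G") { seen[$5] = 1 }
       END { n = 0; for (p in seen) n++; print n }'
}
```

<div>Usage would be <code>nvidia-smi | count_unique_pids</code> on the compute node; for the table quoted below it should report 8, since the PIDs run from 214626 to 214633.</div>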
<div>  Thanks.</div>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-style:solid; border-left-color:rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Hi Junchao, something I'm noticing when running with CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) is that, for multi-CPU, multi-GPU calculations, GPU 0 in the node
 appears to be taking the sub-matrices of all the MPI processes in the node. This is the output of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><span style="font-family:"Courier New",monospace">Mon Aug 21 14:36:07 2023      <span class="x_x_Apple-converted-space"> </span></span>
<div><span style="font-family:"Courier New",monospace">+---------------------------------------------------------------------------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |</span></div>
<div><span style="font-family:"Courier New",monospace">|-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |</span></div>
<div><span style="font-family:"Courier New",monospace">| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |</span></div>
<div><span style="font-family:"Courier New",monospace">|                                         |                      |               MIG M. |</span></div>
<div><span style="font-family:"Courier New",monospace">|=========================================+======================+======================|</span></div>
<div><span style="font-family:"Courier New",monospace">|   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |</span></div>
<div><span style="font-family:"Courier New",monospace">|                                         |                      |                  N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">|   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |</span></div>
<div><span style="font-family:"Courier New",monospace">|                                         |                      |                  N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">|   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |</span></div>
<div><span style="font-family:"Courier New",monospace">|                                         |                      |                  N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">|   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |</span></div>
<div><span style="font-family:"Courier New",monospace">| N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |</span></div>
<div><span style="font-family:"Courier New",monospace">|                                         |                      |                  N/A |</span></div>
<div><span style="font-family:"Courier New",monospace">+-----------------------------------------+----------------------+----------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">                                                                                         </span></div>
<div><span style="font-family:"Courier New",monospace">+---------------------------------------------------------------------------------------+</span></div>
<div><span style="font-family:"Courier New",monospace">| Processes:                                                                            |</span></div>
<div><span style="font-family:"Courier New",monospace">|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |</span></div>
<div><span style="font-family:"Courier New",monospace">|        ID   ID                                                             Usage      |</span></div>
<div><span style="font-family:"Courier New",monospace">|=======================================================================================|</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<div><span style="font-family:"Courier New",monospace">|    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |</span></div>
<span style="font-family:"Courier New",monospace">+---------------------------------------------------------------------------------------+</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">You can see that GPU 0 is connected to all 8 MPI processes, each taking about 300MB on it, whereas GPUs 1, 2 and 3 are each working with 2 MPI processes. I'm wondering if this is expected
 or if there are changes I need to make to my submission script or runtime parameters.</div>
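<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">For reference, the 318MiB entries in the table above follow a round-robin pattern over the 4 GPUs (PIDs 214626 and 214630 on GPU 0, 214627 and 214631 on GPU 1, and so on). The sketch below reproduces that pattern under the assumption that PIDs were handed out in local-rank order, which the output does not confirm; the function name is hypothetical, and the sketch does not explain the extra 308MiB entries on GPU 0.</div>

```shell
#!/bin/bash
# Hypothetical illustration: round-robin assignment of node-local ranks
# to GPUs, i.e. GPU index = local rank modulo GPUs per node.
gpu_for_local_rank() {
  local local_rank=$1 gpus_per_node=$2
  echo $(( local_rank % gpus_per_node ))
}

# 8 MPI processes per node and 4 GPUs per node, as in this job.
for r in 0 1 2 3 4 5 6 7; do
  echo "local rank ${r} uses GPU $(gpu_for_local_rank "${r}" 4)"
done
```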
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node):</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">
<div><span style="font-family:"Courier New",monospace">#!/bin/bash</span></div>
<div><span style="font-family:"Courier New",monospace"># ../../Utilities/Scripts/qfds.sh -p 2  -T db -d test.fds</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -J test<span class="x_x_Apple-converted-space"> </span></span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --partition=gpu</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks=16</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks-per-node=8</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --cpus-per-task=1</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --nodes=2</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --time=01:00:00</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --gres=gpu:4</span></div>
<br>
<div><span style="font-family:"Courier New",monospace">export OMP_NUM_THREADS=1</span></div>
<div><span style="font-family:"Courier New",monospace"># modules</span></div>
<div><span style="font-family:"Courier New",monospace">module load cuda/11.7</span></div>
<div><span style="font-family:"Courier New",monospace">module load gcc/11.2.1/toolset</span></div>
<div><span style="font-family:"Courier New",monospace">module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">cd /home/mnv/Firemodels_fork/fds/Issues/PETSc</span></div>
<div><br>
</div>
<div></div>
<span style="font-family:"Courier New",monospace">srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda</span>
<div></div>
<span style="font-family:"Courier New",monospace">                                   </span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Thank you for the advice,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">Marcos<br>
</div>
<div id="x_x_x_m_1321177721242751015m_3108646833317763144x_m_3869060330462788085x_m_-2525567993800845248appendonsend">
</div>
<br>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
<span class="x_x_x_gmail_signature_prefix">--<span class="x_x_Apple-converted-space"> </span></span><br>
<div dir="ltr" class="x_x_x_gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</body>
</html>