<div dir="ltr"><div dir="ltr"><br clear="all"><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><br></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 15, 2023 at 9:57 AM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov">marcos.vanella@nist.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg6966861799056424862">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I see. I'm trying to get hypre (or any preconditioner) to run on the GPU, and that is what is giving me trouble. I can run cases with the CPU-only build of PETSc without problems.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I tried running the job both in an interactive session and through Slurm, using the PETSc build configured with --with-cuda and passing the CUDA vector flags at runtime as you did:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<font face="monospace">$ mpirun -n 2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda</font><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
and I still get the error. Provided we both configured PETSc the same way, I suspect something is off in the configuration of my cluster.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Even without passing the "<font face="monospace">-mat_type aijcusparse -vec_type cuda</font>" flags on the command line, I get the same "<span style="font-family:monospace;font-size:12pt;color:rgb(0,0,0)">parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument</span>"
error instead of the one you see ("<font face="monospace">You need to enable PETSc device support</font>").<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I noticed you are using a Kokkos build of PETSc; is this related to your development?</div></div></div></blockquote><div>No, Kokkos is irrelevant here. Were you able to compile your code with the much simpler LFLAGS_PETSC?</div><div><span style="font-family:monospace">LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc</span><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg6966861799056424862"><div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thank you,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Marcos<br>
</div>
<div id="m_9111913385059578167appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Tuesday, August 15, 2023 9:59 AM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>>; McDermott, Randall J. (Fed) <<a href="mailto:randall.mcdermott@nist.gov" target="_blank">randall.mcdermott@nist.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div dir="ltr"><br>
<br>
</div>
<br>
<div>
<div dir="ltr">On Tue, Aug 15, 2023 at 8:55 AM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, thank you for your observations and for taking the time to look at this. So if I configure PETSc without the --with-cuda flag and still select hypre as the preconditioner, will hypre still run on the GPU? I thought I needed that flag to get the solvers to run on the V100 card.<br>
</div>
</div>
</div>
</blockquote>
No. To have hypre run on the CPU, you need to configure PETSc/hypre without --with-cuda; otherwise, you need --with-cuda and must always pass flags like -vec_type cuda. I admit this is not user-friendly and should be fixed by the PETSc and hypre developers.
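<div>A minimal sketch of the rule above, written as hypothetical POSIX shell helpers (they are not part of PETSc or FDS). The configure options are taken from Marcos's configure line in this thread, with the assumption that only --with-cuda differs between the CPU-only and GPU builds, and that only the GPU build needs -mat_type aijcusparse -vec_type cuda at runtime. The COPTFLAGS-style variables are left out because their quoted values would not survive unquoted word splitting.</div>

```shell
#!/bin/sh
# Hypothetical helpers (not part of PETSc or FDS) encoding the rule that a
# hypre built with GPU support needs device vectors, while a CPU-only hypre does not.

# Print PETSc configure options; $1 is "cpu" or "gpu".
petsc_configure_opts() {
  opts="--with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack"
  if [ "$1" = "gpu" ]; then
    # With --with-cuda, PETSc also configures the downloaded hypre with GPU support.
    opts="$opts --with-cuda"
  fi
  printf '%s\n' "$opts"
}

# Print the runtime options the matching build needs; prints nothing for "cpu".
petsc_runtime_opts() {
  if [ "$1" = "gpu" ]; then
    # GPU-enabled hypre expects device data, so request CUDA matrix/vector types.
    printf '%s\n' "-mat_type aijcusparse -vec_type cuda"
  fi
}
```

<div>For example, <font face="monospace">./configure $(petsc_configure_opts cpu)</font> for a CPU-only hypre, or <font face="monospace">mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view $(petsc_runtime_opts gpu)</font> with a --with-cuda build.</div>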
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I'll remove the hardwired paths from the link flags. Thanks for that!</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Marcos <br>
</div>
<div id="m_9111913385059578167x_m_4840539619422332629appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Monday, August 14, 2023 7:01 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>>;
McDermott, Randall J. (Fed) <<a href="mailto:randall.mcdermott@nist.gov" target="_blank">randall.mcdermott@nist.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Marcos,</div>
<div> These are my findings. I successfully ran the test in the end.</div>
<div><br>
</div>
<div><font face="monospace">$ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view<br>
</font></div>
<div><font face="monospace"> Starting FDS ...<br>
</font></div>
<div><font face="monospace">...</font></div>
<div><font face="monospace">[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
[0]PETSC ERROR: Invalid argument<br>
[0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to enable PETSc device support, for example, in some cases, -vec_type cuda</font><br>
</div>
<div><br>
</div>
Now I understand why you got errors with "CPU runs". You configured and built hypre through PETSc. Since you added --with-cuda, PETSc configured hypre with GPU support. However, hypre has a limitation: if it is configured with GPU support, you must pass GPU vectors to it. Hence the error. In other words, if you remove --with-cuda, you should be able to run the command above.
<div><br>
</div>
<div><br>
</div>
<div><font face="monospace">$ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda<br>
<br>
Starting FDS ...<br>
<br>
MPI Process 0 started on hong-gce-workstation<br>
MPI Process 1 started on hong-gce-workstation<br>
<br>
Reading FDS input file ...<br>
<br>
At line 3014 of file ../../Source/read.f90<br>
Fortran runtime warning: An array temporary was created<br>
At line 3461 of file ../../Source/read.f90<br>
Fortran runtime warning: An array temporary was created<br>
WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen.<br>
At line 3014 of file ../../Source/read.f90<br>
..<br>
<br>
Fire Dynamics Simulator</font><br>
<br>
<font face="monospace">...</font></div>
<div><font face="monospace">STOP: FDS completed successfully (CHID: test)</font><br>
</div>
<div><br>
</div>
<div>I suspect there were link problems in your makefile. In fact, my first attempt failed with:</div>
<div><br>
</div>
<div><font face="monospace">mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\"
-DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC -I"/home/jczhang/petsc/include/" -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o
fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o init.o dump.o read.o divg.o main.o -Wl,-rpath
-Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack
-lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lflapack
-lfblas -lstdc++ -L/usr/lib64 -lX11 <br>
/usr/bin/ld: cannot find -lflapack: No such file or directory<br>
/usr/bin/ld: cannot find -lfblas: No such file or directory<br>
collect2: error: ld returned 1 exit status<br>
make: *** [../makefile:357: ompi_gnu_linux_db] Error 1</font><br>
</div>
<div><br>
</div>
<div>That is because you hardwired many link flags (such as -lflapack and -lfblas) in your fds/Build/makefile. I then changed LFLAGS_PETSC to</div>
<div><font face="monospace"> LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc</font><br>
</div>
<div><br>
</div>
<div>and everything worked. Could you also try it?</div>
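<div>A minimal sketch, assuming a POSIX shell: the hypothetical helper below (not part of PETSc or FDS) expands the simplified LFLAGS_PETSC from PETSC_DIR and PETSC_ARCH, so no library paths are hardwired, and it aborts with a message if either variable is unset or empty.</div>

```shell
#!/bin/sh
# Hypothetical helper: build the simplified PETSc link flags from the
# environment instead of hardwiring library paths in the makefile.
petsc_lflags() {
  # ${VAR:?msg} aborts with an error message if VAR is unset or empty.
  libdir="${PETSC_DIR:?PETSC_DIR is not set}/${PETSC_ARCH:?PETSC_ARCH is not set}/lib"
  # Same form as LFLAGS_PETSC: embed the run-time path, add the link path, link PETSc.
  printf '%s\n' "-Wl,-rpath,${libdir} -L${libdir} -lpetsc"
}
```

<div>For example, with PETSC_DIR=/dir_to_petsc/petsc and PETSC_ARCH=arch-linux-c-dbg, <font face="monospace">petsc_lflags</font> prints <font face="monospace">-Wl,-rpath,/dir_to_petsc/petsc/arch-linux-c-dbg/lib -L/dir_to_petsc/petsc/arch-linux-c-dbg/lib -lpetsc</font>.</div>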
<div><br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 14, 2023 at 4:53 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Attached is the test.fds test case. Thanks!<br>
</div>
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Sent:</b> Monday, August 14, 2023 5:45 PM<br>
<b>To:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>>;
<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>><br>
<b>Cc:</b> McDermott, Randall J. (Fed) <<a href="mailto:randall.mcdermott@nist.gov" target="_blank">randall.mcdermott@nist.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
All right Junchao, thank you for looking at this!</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
So, I checked out the <span style="font-family:"Courier New",monospace">main</span> branch in <span style="font-family:"Courier New",monospace">/dir_to_petsc/petsc</span> and set up the PETSc environment variables:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"># PETSc dir and arch; set MYSYSTEM to nisaba for FDS:</span>
<div><span style="font-family:"Courier New",monospace">export PETSC_DIR=/dir_to_petsc/petsc</span></div>
<div><span style="font-family:"Courier New",monospace">export PETSC_ARCH=arch-linux-c-dbg</span></div>
<div><span style="font-family:"Courier New",monospace">export MYSYSTEM=nisaba</span></div>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
and configured the library with:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ ./configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda</span><br>
</div>
<br>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Then I built PETSc and checked the build.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Then for FDS:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<ol>
<li style="list-style-type:"1. ""><span>Create a directory ~/fds_dir, clone my fds repo into it, and check out the FireX branch:<br>
</span></li></ol>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ cd ~/fds_dir</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ git clone </span><a href="https://github.com/marcosvanella/fds.git" id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176LPNoLPOWALinkPreview" target="_blank"><span style="font-family:"Courier New",monospace">https://github.com/marcosvanella/fds.git</span></a><br>
</div>
<div></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ cd fds</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ git checkout FireX</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<ol start="2">
<li style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;list-style-type:"2. ";color:rgb(0,0,0)">
With PETSC_DIR, PETSC_ARCH, and MYSYSTEM=nisaba defined, compile a debug target for FDS (this uses a CUDA-enabled Open MPI built with gcc, in my case gcc 11.2, plus PETSc):</li></ol>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ cd Build/ompi_gnu_linux_db</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ ./make_fds.sh</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
You should see compilation lines like these, with the WITH_PETSC preprocessor variable defined:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">Building ompi_gnu_linux_db</span>
<div><span style="font-family:"Courier New",monospace">mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics
-fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\"
<b>-DWITH_PETSC</b> -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90</span></div>
<div><span style="font-family:"Courier New",monospace">mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics
-fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\"
<b>-DWITH_PETSC</b> -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90</span></div>
<div><span style="font-family:"Courier New",monospace">mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics
-fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\"
<b>-DWITH_PETSC</b> -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90</span></div>
<span style="font-family:"Courier New",monospace">mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics
-fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\"
<b>-DWITH_PETSC</b> -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">...</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">...</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
If you are compiling on a Power9 node, you might hit this error right away:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">../../Source/prec.f90:34:8:</span>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A very small number 16 byte accuracy</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<span style="font-family:"Courier New",monospace">Error: Kind -3 not supported for type REAL at (1)</span><br style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
This means that, for some reason, gfortran on Power9 does not accept the quad-precision kind defined this way. A workaround is to import the intrinsic Fortran 2008 module iso_fortran_env:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">use, intrinsic :: iso_fortran_env</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
in fds/Source/prec.f90 and change the quad-precision kind parameter to:<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">INTEGER, PARAMETER :: QB = REAL128 </span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
We are investigating why this happens. It is unrelated to PETSc: all values passed to PETSc calls are integers and double-precision reals.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
After the code compiles, the executable is at <span style="font-family:"Courier New",monospace">
~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Helvetica,sans-serif">You can then run the attached two-mesh case with:</span><span style="font-family:"Courier New",monospace"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">$ mpirun -n 2 </span><span style="font-family:"Courier New",monospace">~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db</span><span style="font-family:"Courier New",monospace"> test.fds -log_view</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
and change the PETSc <span style="font-family:"Courier New",monospace">ksp, pc</span> runtime options, etc. The default is PCG + hypre, which is what I was testing on the CPU. This is the result I get from the command above in an interactive job on Enki (the result is similar with batch submissions and with <span style="font-family:"Courier New",monospace">gmres ksp</span> /
<span style="font-family:"Courier New",monospace">gamg pc</span>):</div>
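<div>A minimal sketch, assuming FDS forwards its command-line options to PETSc (as the -log_view and -vec_type runs in this thread suggest): the hypothetical helper below (not part of FDS) prints the standard PETSc -ksp_type/-pc_type options for the two solver combinations mentioned above.</div>

```shell
#!/bin/sh
# Hypothetical helper: print PETSc KSP/PC options for a named solver combination.
#   cg-hypre   -> preconditioned CG with hypre (the FDS default described above)
#   gmres-gamg -> GMRES with PETSc's algebraic multigrid preconditioner
# Unknown names print an error to stderr and return a non-zero status.
fds_solver_opts() {
  case "$1" in
    cg-hypre)   printf '%s\n' "-ksp_type cg -pc_type hypre" ;;
    gmres-gamg) printf '%s\n' "-ksp_type gmres -pc_type gamg" ;;
    *) printf 'unknown solver combination: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

<div>For example, <font face="monospace">mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view $(fds_solver_opts gmres-gamg)</font>.</div>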
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">Starting FDS ...</span>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> MPI Process 1 started on enki11.adlp</span></div>
<div><span style="font-family:"Courier New",monospace"> MPI Process 0 started on enki11.adlp</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> Reading FDS input file ...</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen.</span></div>
<div><span style="font-family:"Courier New",monospace">At line 3014 of file ../../Source/read.f90</span></div>
<div><span style="font-family:"Courier New",monospace">Fortran runtime warning: An array temporary was created</span></div>
<div><span style="font-family:"Courier New",monospace">At line 3014 of file ../../Source/read.f90</span></div>
<div><span style="font-family:"Courier New",monospace">Fortran runtime warning: An array temporary was created</span></div>
<div><span style="font-family:"Courier New",monospace">At line 3461 of file ../../Source/read.f90</span></div>
<div><span style="font-family:"Courier New",monospace">Fortran runtime warning: An array temporary was created</span></div>
<div><span style="font-family:"Courier New",monospace">At line 3461 of file ../../Source/read.f90</span></div>
<div><span style="font-family:"Courier New",monospace">Fortran runtime warning: An array temporary was created</span></div>
<div><span style="font-family:"Courier New",monospace">WARNING: DEVC Device is not within any mesh.</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> Fire Dynamics Simulator</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> Current Date : August 14, 2023 17:26:22</span></div>
<div><span style="font-family:"Courier New",monospace"> Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX</span></div>
<div><span style="font-family:"Courier New",monospace"> Revision Date : Mon Aug 14 17:07:20 2023 -0400</span></div>
<div><span style="font-family:"Courier New",monospace"> Compiler : Gnu gfortran 11.2.1</span></div>
<div><span style="font-family:"Courier New",monospace"> Compilation Date : Aug 14, 2023 17:11:05</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> MPI Enabled; Number of MPI Processes: 2</span></div>
<div><span style="font-family:"Courier New",monospace"> OpenMP Enabled; Number of OpenMP Threads: 1</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> MPI version: 3.1</span></div>
<div><span style="font-family:"Courier New",monospace"> MPI library version: Open MPI v4.1.4, package: Open MPI xng4@enki01.adlp Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> Job TITLE : </span>
</div>
<div><span style="font-family:"Courier New",monospace"> Job ID string : test</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">terminate called after throwing an instance of 'thrust::system::system_error'</span></div>
<div><span style="font-family:"Courier New",monospace">terminate called after throwing an instance of 'thrust::system::system_error'</span></div>
<div><span style="font-family:"Courier New",monospace"> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument</span></div>
<div><span style="font-family:"Courier New",monospace"> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Program received signal SIGABRT: Process abort signal.</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Backtrace for this error:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Program received signal SIGABRT: Process abort signal.</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Backtrace for this error:</span></div>
<div><span style="font-family:"Courier New",monospace">#0 0x2000397fcd8f in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#1 0x2000397fb657 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#2 0x2000000604d7 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#3 0x200039cb9628 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#0 0x2000397fcd8f in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#1 0x2000397fb657 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#2 0x2000000604d7 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#3 0x200039cb9628 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#4 0x200039c93eb3 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#5 0x200039364a97 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#4 0x200039c93eb3 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#5 0x200039364a97 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#6 0x20003935f6d3 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#7 0x20003935f78f in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#8 0x20003935fc6b in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#6 0x20003935f6d3 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#7 0x20003935f78f in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#8 0x20003935fc6b in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225</span></div>
<div><span style="font-family:"Courier New",monospace">#10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88</span></div>
<div><span style="font-family:"Courier New",monospace">#11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55</span></div>
<div><span style="font-family:"Courier New",monospace">#12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93</span></div>
<div><span style="font-family:"Courier New",monospace">#13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104</span></div>
<div><span style="font-family:"Courier New",monospace">#14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254</span></div>
<div><span style="font-family:"Courier New",monospace">#15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220</span></div>
<div><span style="font-family:"Courier New",monospace">#16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213</span></div>
<div><span style="font-family:"Courier New",monospace">#17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65</span></div>
<div><span style="font-family:"Courier New",monospace">#18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88</span></div>
<div><span style="font-family:"Courier New",monospace"><b>#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU</b></span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/<a href="http://aijcusparse.cu:2488/" target="_blank">aijcusparse.cu:2488</a></span></div>
<div><span style="font-family:"Courier New",monospace"><b>#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE</b></span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/<a href="http://aijcusparse.cu:4300/" target="_blank">aijcusparse.cu:4300</a></span></div>
<div><span style="font-family:"Courier New",monospace">#21 0x11e91bc7 in MatSetPreallocationCOO</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650</span></div>
<div><span style="font-family:"Courier New",monospace">#22 0x1316d5ab in MatConvert_AIJ_HYPRE</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648</span></div>
<div><span style="font-family:"Courier New",monospace">#23 0x11e3b463 in MatConvert</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428</span></div>
<div><span style="font-family:"Courier New",monospace">#24 0x14072213 in PCSetUp_HYPRE</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254</span></div>
<div><span style="font-family:"Courier New",monospace">#25 0x1276a9db in PCSetUp</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069</span></div>
<div><span style="font-family:"Courier New",monospace">#26 0x127d923b in KSPSetUp</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415</span></div>
<div><span style="font-family:"Courier New",monospace">#27 0x127e033f in KSPSolve_Private</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836</span></div>
<div><span style="font-family:"Courier New",monospace">#28 0x127e6f07 in KSPSolve</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082</span></div>
<div><span style="font-family:"Courier New",monospace">#29 0x1280d70b in kspsolve_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335</span></div>
<div><span style="font-family:"Courier New",monospace">#30 0x1140858f in __globmat_solver_MOD_glmat_solver</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/pres.f90:3130</span></div>
<div><span style="font-family:"Courier New",monospace">#31 0x119faddf in pressure_iteration_scheme</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/main.f90:1449</span></div>
<div><span style="font-family:"Courier New",monospace">#32 0x1196c15f in fds</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/main.f90:688</span></div>
<div><span style="font-family:"Courier New",monospace">#33 0x11a126f3 in main</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/main.f90:6</span></div>
<div><span style="font-family:"Courier New",monospace">--------------------------------------------------------------------------</span></div>
<div><span style="font-family:"Courier New",monospace">Primary job terminated normally, but 1 process returned</span></div>
<div><span style="font-family:"Courier New",monospace">a non-zero exit code. Per user-direction, the job has been aborted.</span></div>
<div><span style="font-family:"Courier New",monospace">--------------------------------------------------------------------------</span></div>
<div><span style="font-family:"Courier New",monospace">--------------------------------------------------------------------------</span></div>
<div><span style="font-family:"Courier New",monospace">mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited on signal 6 (Aborted).</span></div>
<div><span style="font-family:"Courier New",monospace">--------------------------------------------------------------------------</span></div>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
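The error seems to originate in the call to KSPSOLVE at line 3130 of fds/Source/pres.f90; according to the backtrace, it is raised during hypre setup (PCSetUp_HYPRE, MatConvert_AIJ_HYPRE), when MatSetPreallocationCOO copies the matrix to the GPU in MatSeqAIJCUSPARSECopyToGPU.</div>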
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thank you for taking the time to look at this. Please also let me know if this thread should be moved to the issue tracker or another venue.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Best,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Marcos<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Monday, August 14, 2023 4:37 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>>; PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>I don't see a problem in the matrix assembly. </div>
<div>If you point me to your repo and show me how to build it, I can try to reproduce. <br>
</div>
<div><br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 14, 2023 at 2:53 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, I've tried my case with -ksp_type gmres -pc_type asm -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse, which (as I understand it) is what ex60 does. The error is always the same, so it does not seem to be related to the KSP or PC choice. Indeed, it seems to happen when the matrix is offloaded to the GPU:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
terminate called after throwing an instance of 'thrust::system::system_error'
<div> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument</div>
<div><br>
</div>
<div>Program received signal SIGABRT: Process abort signal.</div>
<div><br>
</div>
<div>Backtrace for this error:</div>
<div>#0 0x2000397fcd8f in ???</div>
...</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
#8 0x20003935fc6b in ???
<div>#9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc</div>
<div> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225</div>
<div>#10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_</div>
<div> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88</div>
<div>#11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_</div>
<div> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55</div>
<div>#12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_</div>
<div> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93</div>
<div>#13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_</div>
<div> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104</div>
<div>#14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm</div>
<div> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254</div>
<div>#15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm</div>
<div> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220</div>
<div>#16 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm</div>
<div> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213</div>
<div>#17 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em</div>
<div> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65</div>
<div>#18 0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em</div>
<div> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88</div>
<div>#19 0x11edb287 in <b>MatSeqAIJCUSPARSECopyToGPU</b></div>
<div> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/<a href="http://aijcusparse.cu:2488/" target="_blank">aijcusparse.cu:2488</a></div>
<div>#20 0x11edfd1b in <b>MatSeqAIJCUSPARSEGetIJ</b></div>
...</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
...</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
This is the piece of Fortran code that does this in my Poisson solver:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div style="font-family:Menlo,Monaco,"Courier New",monospace;font-weight:normal;font-size:14px;line-height:21px;color:rgb(204,204,204);background-color:rgb(31,31,31)">
<span><span style="color:rgb(106,153,85)">! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag blocks nonzeros per row to 7.</span></span>
<div><span style="color:rgb(197,134,192)">CALL</span><span> </span><span style="color:rgb(220,220,170)">MATCREATEAIJ</span><span>(MPI_COMM_WORLD,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>NUNKH_LOCAL,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>NUNKH_LOCAL,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>NUNKH_TOTAL,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>NUNKH_TOTAL,</span><span style="color:rgb(212,212,212)">&</span></div>
<div><span> </span><span style="color:rgb(181,206,168)">7</span><span>,PETSC_NULL_INTEGER,</span><span style="color:rgb(181,206,168)">7</span><span>,PETSC_NULL_INTEGER,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>PETSC_ZS</span><span style="color:rgb(86,156,214)">%</span><span>A_H,PETSC_IERR)</span></div>
<span style="color:rgb(197,134,192)">CALL</span><span> </span><span style="color:rgb(220,220,170)">MATSETFROMOPTIONS</span><span>(ZSL</span><span style="color:rgb(86,156,214)">%</span><span>PETSC_ZS</span><span style="color:rgb(86,156,214)">%</span><span>A_H,PETSC_IERR)</span>
<div><span style="color:rgb(197,134,192)">DO</span><span> IROW</span><span style="color:rgb(212,212,212)">=</span><span style="color:rgb(181,206,168)">1</span><span>,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>NUNKH_LOCAL</span></div>
<div><span> </span><span style="color:rgb(197,134,192)">DO</span><span> JCOL</span><span style="color:rgb(212,212,212)">=</span><span style="color:rgb(181,206,168)">1</span><span>,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>NNZ_D_MAT_H(IROW)</span></div>
<div><span> </span><span style="color:rgb(106,153,85)">! PETSc expects zero-based indices: 1 row at global I position (zero-based), 1 column at global J position (zero-based).</span></div>
<div><span> </span><span style="color:rgb(197,134,192)">CALL</span><span> </span><span style="color:rgb(220,220,170)">MATSETVALUES</span><span>(ZSL</span><span style="color:rgb(86,156,214)">%</span><span>PETSC_ZS</span><span style="color:rgb(86,156,214)">%</span><span>A_H,</span><span style="color:rgb(181,206,168)">1</span><span>,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>UNKH_IND(NM_START)</span><span style="color:rgb(212,212,212)">+</span><span>IROW</span><span style="color:rgb(181,206,168)">-1</span><span>,</span><span style="color:rgb(181,206,168)">1</span><span>,ZSL</span><span style="color:rgb(86,156,214)">%</span><span>JD_MAT_H(JCOL,IROW)</span><span style="color:rgb(181,206,168)">-1</span><span>,</span><span style="color:rgb(212,212,212)">&</span></div>
<div><span> ZSL</span><span style="color:rgb(86,156,214)">%</span><span>D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR)</span></div>
<div><span> </span><span style="color:rgb(197,134,192)">ENDDO</span></div>
<div><span style="color:rgb(197,134,192)">ENDDO</span></div>
<div><span style="color:rgb(197,134,192)">CALL</span><span> </span><span style="color:rgb(220,220,170)">MATASSEMBLYBEGIN</span><span>(ZSL</span><span style="color:rgb(86,156,214)">%</span><span>PETSC_ZS</span><span style="color:rgb(86,156,214)">%</span><span>A_H,
MAT_FINAL_ASSEMBLY, PETSC_IERR)</span></div>
<span><span style="color:rgb(197,134,192)">CALL</span><span> </span><span style="color:rgb(220,220,170)">MATASSEMBLYEND</span><span>(ZSL</span><span style="color:rgb(86,156,214)">%</span><span>PETSC_ZS</span><span style="color:rgb(86,156,214)">%</span><span>A_H,
MAT_FINAL_ASSEMBLY, PETSC_IERR)</span></span></div>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Note that I preallocate d_nz=7 and o_nz=7 nonzeros per row (more than enough), and insert the nonzero values one at a time. I wonder whether something about this is what the copy to the GPU does not like.</div>
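<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
As an alternative to over-allocating with a fixed d_nz=7/o_nz=7, exact per-row counts could be computed and passed as the d_nnz and o_nnz arrays of MatCreateAIJ (MATCREATEAIJ in Fortran) in place of PETSC_NULL_INTEGER. Below is a minimal C sketch, assuming the column indices of each local row are already held in CSR form as zero-based global indices (the same values passed to MATSETVALUES), and that the column ownership range equals the row ownership range [rstart, rend), as when the same local size is given for rows and columns. The helper name and data layout are hypothetical, not FDS or PETSc code, and plain int stands in for PetscInt so it compiles without PETSc. It is offered as a preallocation alternative, not as a fix for the CUDA error.</div>

```c
#include <assert.h>

/* Hypothetical helper (not part of PETSc or FDS): for each locally owned row,
 * count nonzeros in the "diagonal" block (columns in [rstart, rend), owned by
 * this rank) and in the "off-diagonal" block (columns owned by other ranks).
 * Assumes the column ownership range equals the row ownership range, and that
 * cols[] holds zero-based global column indices in CSR order, with row i
 * occupying cols[row_ptr[i]] .. cols[row_ptr[i + 1] - 1]. */
static void count_diag_offdiag_nnz(int nrows_local, int rstart, int rend,
                                   const int *row_ptr, const int *cols,
                                   int *d_nnz, int *o_nnz)
{
    assert(rend - rstart == nrows_local);
    for (int i = 0; i < nrows_local; ++i) {
        d_nnz[i] = 0;
        o_nnz[i] = 0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            if (cols[k] >= rstart && cols[k] < rend)
                d_nnz[i]++; /* column owned by this rank: diagonal block */
            else
                o_nnz[i]++; /* column owned by another rank: off-diagonal block */
        }
    }
}
```

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
For example, for a rank that owns global rows 3 to 5 of a 1-D Laplacian on 9 unknowns (row columns {2,3,4}, {3,4,5}, {4,5,6}), the counts come out as d_nnz = {2, 3, 2} and o_nnz = {1, 0, 1}. When these arrays are supplied, MatCreateAIJ ignores the scalar d_nz and o_nz arguments.</div>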
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Marcos <br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043appendonsend">
</div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Monday, August 14, 2023 3:24 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Yeah, it looks like ex60 was run correctly.</div>
<div>Double-check your code, and if you still run into errors, we can try to reproduce them on our end.</div>
<div><br>
</div>
Thanks.<br clear="all">
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
<br>
<div>
<div dir="ltr">On Mon, Aug 14, 2023 at 1:05 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, I compiled and ran ex60 through Slurm on our Enki system. The Slurm batch script, ex60.log, and the GPU stats files are attached.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Nothing stands out as wrong to me, but please have a look.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I'll go back to running the original Poisson problem with 2 MPI processes and 1 GPU.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks!</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Marcos<br>
</div>
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028appendonsend">
</div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Friday, August 11, 2023 5:52 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">Before digging into the details, could you try running src/ksp/ksp/tests/ex60.c to make sure the environment is OK?
<div><br>
</div>
<div>The test block at the end of the file shows how to run it:</div>
<div> test:<br>
requires: cuda<br>
suffix: 1_cuda<br>
nsize: 4<br>
args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse<br>
</div>
<div>
<div><br>
</div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
</div>
</div>
<br>
<div>
<div dir="ltr">On Fri, Aug 11, 2023 at 4:36 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, thank you for the info. I compiled the main branch of PETSc on another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the Fortran compilation error there. It might have been related to gcc-9.3.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I tried the case again with 2 CPUs and 1 GPU, and now get this error:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">terminate called after throwing an instance of 'thrust::system::system_error'</span>
<div><span style="font-family:"Courier New",monospace"> what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Program received signal SIGABRT: Process abort signal.</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Backtrace for this error:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">#0 0x2000397fcd8f in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#1 0x2000397fb657 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#2 0x2000000604d7 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#3 0x200039cb9628 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#4 0x200039c93eb3 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#5 0x200039364a97 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#6 0x20003935f6d3 in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#7 0x20003935f78f in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#8 0x20003935fc6b in ???</span></div>
<div><span style="font-family:"Courier New",monospace">#9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225</span></div>
<div><span style="font-family:"Courier New",monospace">#10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88</span></div>
<div><span style="font-family:"Courier New",monospace">#11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55</span></div>
<div><span style="font-family:"Courier New",monospace">#12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93</span></div>
<div><span style="font-family:"Courier New",monospace">#13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104</span></div>
<div><span style="font-family:"Courier New",monospace">#14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254</span></div>
<div><span style="font-family:"Courier New",monospace">#15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220</span></div>
<div><span style="font-family:"Courier New",monospace">#16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213</span></div>
<div><span style="font-family:"Courier New",monospace">#17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65</span></div>
<div><span style="font-family:"Courier New",monospace">#18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em</span></div>
<div><span style="font-family:"Courier New",monospace"> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88</span></div>
<div><span style="font-family:"Courier New",monospace">#19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/<a href="http://aijcusparse.cu:2488/" target="_blank">aijcusparse.cu:2488</a></span></div>
<div><span style="font-family:"Courier New",monospace">#20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/<a href="http://aijcusparse.cu:4696/" target="_blank">aijcusparse.cu:4696</a></span></div>
<div><span style="font-family:"Courier New",monospace">#21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/<a href="http://mpiaijcusparse.cu:251/" target="_blank">mpiaijcusparse.cu:251</a></span></div>
<div><span style="font-family:"Courier New",monospace">#22 0x133f141f in MatMPIAIJGetLocalMatMerge</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342</span></div>
<div><span style="font-family:"Courier New",monospace">#23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368</span></div>
<div><span style="font-family:"Courier New",monospace">#24 0x1377e1df in MatProductSymbolic</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795</span></div>
<div><span style="font-family:"Courier New",monospace">#25 0x11e4dd1f in MatPtAP</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934</span></div>
<div><span style="font-family:"Courier New",monospace">#26 0x130d792f in MatCoarsenApply_MISK_private</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283</span></div>
<div><span style="font-family:"Courier New",monospace">#27 0x130db89b in MatCoarsenApply_MISK</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368</span></div>
<div><span style="font-family:"Courier New",monospace">#28 0x130bf5a3 in MatCoarsenApply</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97</span></div>
<div><span style="font-family:"Courier New",monospace">#29 0x141518ff in PCGAMGCoarsen_AGG</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524</span></div>
<div><span style="font-family:"Courier New",monospace">#30 0x13b3a43f in PCSetUp_GAMG</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631</span></div>
<div><span style="font-family:"Courier New",monospace">#31 0x1276845b in PCSetUp</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069</span></div>
<div><span style="font-family:"Courier New",monospace">#32 0x127d6cbb in KSPSetUp</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415</span></div>
<div><span style="font-family:"Courier New",monospace">#33 0x127dddbf in KSPSolve_Private</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836</span></div>
<div><span style="font-family:"Courier New",monospace">#34 0x127e4987 in KSPSolve</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082</span></div>
<div><span style="font-family:"Courier New",monospace">#35 0x1280b18b in kspsolve_</span></div>
<div><span style="font-family:"Courier New",monospace"> at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335</span></div>
<div><span style="font-family:"Courier New",monospace">#36 0x1140945f in __globmat_solver_MOD_glmat_solver</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/pres.f90:3128</span></div>
<div><span style="font-family:"Courier New",monospace">#37 0x119f8853 in pressure_iteration_scheme</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/main.f90:1449</span></div>
<div><span style="font-family:"Courier New",monospace">#38 0x11969bd3 in fds</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/main.f90:688</span></div>
<div><span style="font-family:"Courier New",monospace">#39 0x11a10167 in main</span></div>
<div><span style="font-family:"Courier New",monospace"> at ../../Source/main.f90:6</span></div>
<span style="font-family:"Courier New",monospace">srun: error: enki12: tasks 0-1: Aborted (core dumped)</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
This was the Slurm submission script for this case:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">#!/bin/bash</span>
<div><span style="font-family:"Courier New",monospace"># ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -J test </span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --partition=debug</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks=2</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --nodes=1</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --cpus-per-task=1</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --ntasks-per-node=2</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --time=01:00:00</span></div>
<div><span style="font-family:"Courier New",monospace">#SBATCH --gres=gpu:1</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">export OMP_NUM_THREADS=1</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"># PETSc dir and arch:</span></div>
<div><span style="font-family:"Courier New",monospace">export PETSC_DIR=/home/mnv/Software/petsc</span></div>
<div><span style="font-family:"Courier New",monospace">export PETSC_ARCH=arch-linux-c-dbg</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"># SYSTEM name:</span></div>
<div><span style="font-family:"Courier New",monospace">export MYSYSTEM=enki</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"># modules</span></div>
<div><span style="font-family:"Courier New",monospace">module load cuda/11.7</span></div>
<div><span style="font-family:"Courier New",monospace">module load gcc/11.2.1/toolset</span></div>
<div><span style="font-family:"Courier New",monospace">module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">cd /home/mnv/Firemodels_fork/fds/Issues/PETSc</span></div>
<span style="font-family:"Courier New",monospace">srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Helvetica,sans-serif">The configure.log for the PETSc build is attached.
</span><span style="font-family:Calibri,Helvetica,sans-serif">Another clue to what is happening: even when I set the matrices/vectors to the plain MPI types (-vec_type mpi -mat_type mpiaij) and do not request a GPU, I get a GPU error:</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
<div>[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------</div>
<div>[1]PETSC ERROR: GPU error</div>
<div>[1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected</div>
<div>[1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!</div>
<div>[0]PETSC ERROR: GPU error</div>
<div>[0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected</div>
<div>[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!</div>
<div>[0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line</div>
<div>[0]PETSC ERROR: See <a href="https://petsc.org/release/faq/" target="_blank">
https://petsc.org/release/faq/</a> for trouble shooting.</div>
<div>[1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line</div>
<div>[1]PETSC ERROR: See <a href="https://petsc.org/release/faq/" target="_blank">
https://petsc.org/release/faq/</a> for trouble shooting.</div>
<div>[1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000</div>
<div>[0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000</div>
<div>[0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023</div>
<div>[0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda</div>
...</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Helvetica,sans-serif">I would not have expected any GPU errors to be printed, given that I did not request CUDA matrices/vectors. The case ran anyway; I assume it fell back to the CPU solver.</span></div>
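A cheap first check for a cudaErrorNoDevice message is to print, per task, which GPUs that task can see. This is a minimal sketch, assuming the job runs under Slurm (so srun sets SLURM_PROCID for each task) and that nvidia-smi may or may not be installed on the compute node; gpu_report is a hypothetical helper, not part of PETSc or FDS.

```shell
#!/bin/sh
# Hypothetical helper: print which GPU(s) the current task can see.
# SLURM_PROCID is set by srun for each task; CUDA_VISIBLE_DEVICES is the
# variable the CUDA runtime uses to restrict which devices a process sees.
gpu_report() {
  rank="${SLURM_PROCID:-unknown}"
  # "-" rather than ":-" so an empty value (all GPUs hidden) is shown as
  # empty and only a truly unset variable prints <unset>.
  echo "rank $rank on $(uname -n): CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES-<unset>}"
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L   # one line per GPU the driver reports on this node
  else
    echo "rank $rank: nvidia-smi not found"
  fi
}
```

For example, saved as gpu_report.sh (a hypothetical file name) in the job directory, it could be run once per task with srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 sh -c '. ./gpu_report.sh; gpu_report' before launching FDS. An empty CUDA_VISIBLE_DEVICES (as opposed to an unset one) hides every GPU from the CUDA runtime, which would be one possible explanation for the "no CUDA-capable device is detected" message.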
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Helvetica,sans-serif">Let me know if you have any ideas as to what is happening. Thanks,</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Helvetica,sans-serif">Marcos<br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028x_m_-5900986552187666187appendonsend">
</div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028x_m_-5900986552187666187divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Friday, August 11, 2023 3:35 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>>; PETSc users list <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Marcos,</div>
<div> We do not have good PETSc/GPU documentation, but see <a href="https://petsc.org/main/faq/#doc-faq-gpuhowto" target="_blank">
https://petsc.org/main/faq/#doc-faq-gpuhowto</a>. Also, search for "<span style="color:rgb(0,128,0);font-family:Menlo,Monaco,"Courier New",monospace;font-size:14px;white-space:pre-wrap">requires: cuda</span>" in the PETSc tests and you will find examples that use the GPU.</div>
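The search suggested above can be scripted. This is a minimal sketch, assuming a PETSc source tree at $PETSC_DIR; find_cuda_tests is a hypothetical helper name. It lists the source files whose embedded test blocks contain the line requires: cuda.

```shell
#!/bin/sh
# Hypothetical helper: list PETSc example/test sources whose test blocks
# contain "requires: cuda" (tests that only run in CUDA-enabled builds).
# Usage: find_cuda_tests [petsc_source_dir]   (defaults to $PETSC_DIR)
find_cuda_tests() {
  root="${1:-$PETSC_DIR}"
  if [ -z "$root" ] || [ ! -d "$root/src" ]; then
    echo "find_cuda_tests: no PETSc source tree at '$root'" >&2
    return 1
  fi
  # -r: recurse into src/, -l: print matching file names only;
  # sort so the output order is stable.
  grep -rl 'requires: cuda' "$root/src" | sort
}
```

For example, find_cuda_tests | grep /tests/ narrows the list to the tests directories, which is where ex60.c lives.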
<div> For the Fortran compile errors, please attach your configure.log; Satish (Cc'ed) or others should know how to fix them.</div>
<div><br>
</div>
<div> Thanks.<br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
<br>
<div>
<div dir="ltr">On Fri, Aug 11, 2023 at 2:22 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested in learning about it.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I checked out the main branch and configured PETSc. When compiling with gcc/gfortran, I run into this error:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">....</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"> CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o</span>
<div><span style="font-family:"Courier New",monospace"> CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o</span></div>
<div><span style="font-family:"Courier New",monospace"> FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o</span></div>
<div><span style="font-family:"Courier New",monospace"> FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z)</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace"><b>Error: Symbol ‘pcasmcreatesubdomains2d’ at (1) already has an explicit interface</b></span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 38 | import tIS</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: IMPORT statement at (1) only permitted in an INTERFACE body</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 39 | PetscInt a ! PetscInt</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 40 | PetscInt b ! PetscInt</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 41 | PetscInt c ! PetscInt</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 42 | PetscInt d ! PetscInt</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 43 | PetscInt e ! PetscInt</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 44 | PetscInt f ! PetscInt</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 45 | PetscInt g ! PetscInt</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 46 | IS h ! IS</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 47 | IS i ! IS</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 48 | PetscErrorCode z</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Unexpected data declaration statement in INTERFACE block at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">/home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10:</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace"> 49 | end subroutine PCASMCreateSubdomains2D</span></div>
<div><span style="font-family:"Courier New",monospace"> | 1</span></div>
<div><span style="font-family:"Courier New",monospace">Error: Expecting END INTERFACE statement at (1)</span></div>
<div><span style="font-family:"Courier New",monospace">make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1</span></div>
<div><span style="font-family:"Courier New",monospace">make[3]: *** Waiting for unfinished jobs....</span></div>
<div><span style="font-family:"Courier New",monospace"> CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o</span></div>
<div><span style="font-family:"Courier New",monospace"> CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o</span></div>
<div><span style="font-family:"Courier New",monospace"> CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o</span></div>
<div><span style="font-family:"Courier New",monospace"> CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o</span></div>
<div><span style="font-family:"Courier New",monospace">make[3]: Leaving directory '/home/mnv/Software/petsc'</span></div>
<div><span style="font-family:"Courier New",monospace">make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2</span></div>
<div><span style="font-family:"Courier New",monospace">make[2]: Leaving directory '/home/mnv/Software/petsc'</span></div>
<div><span style="font-family:"Courier New",monospace">**************************ERROR*************************************</span></div>
<div><span style="font-family:"Courier New",monospace"> Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log</span></div>
<div><span style="font-family:"Courier New",monospace"> Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to
<a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a></span></div>
<div><span style="font-family:"Courier New",monospace">********************************************************************</span></div>
<div><span style="font-family:"Courier New",monospace">make[1]: *** [makefile:45: all] Error 1</span></div>
<span style="font-family:"Courier New",monospace">make: *** [GNUmakefile:9: all] Error 2 </span><br>
</div>
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028x_m_-5900986552187666187x_m_-6428481159717170951appendonsend">
</div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028x_m_-5900986552187666187x_m_-6428481159717170951divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Friday, August 11, 2023 3:04 PM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi, Marcos,<br>
I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and removed that function, so could you try petsc/main?<br>
We map MPI processes to GPUs in a round-robin fashion: we query the number of visible CUDA devices (g) and assign device (rank % g) to the MPI process with that rank. In that sense, the work distribution is entirely determined by your MPI work partition (i.e., by you).<br>
On clusters, this MPI-process-to-GPU binding is usually done by a job scheduler such as Slurm. Check your cluster's user guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of CUDA devices visible to each process may be just 1, which makes PETSc's own mapping irrelevant.<br>
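A minimal sketch of the (rank % g) rule described above, applied at launch time instead of inside PETSc: a wrapper script (the name gpu_wrap.sh is hypothetical) that pins each MPI rank on a node to one GPU by setting CUDA_VISIBLE_DEVICES before the program starts, roughly what a scheduler-side binding achieves. It assumes Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK (or that Slurm's srun exports SLURM_LOCALID), that `nvidia-smi -L` lists the GPUs available to the job, and that the scheduler has not already restricted visibility. It uses the node-local rank rather than the global rank, and it is not PETSc code.

```shell
#!/bin/bash
# gpu_wrap.sh (hypothetical): bind each MPI rank on a node to a single GPU
# using the round-robin rule device = rank % g.
# Assumption: OMPI_COMM_WORLD_LOCAL_RANK (Open MPI) or SLURM_LOCALID (srun)
# holds the node-local rank of this process.

gpu_for_rank() {
  # $1 = local rank, $2 = number of visible GPUs (g); prints rank % g
  local rank=$1 ngpus=$2
  if [ "$ngpus" -le 0 ]; then
    echo "no GPUs visible" >&2
    return 1
  fi
  echo $(( rank % ngpus ))
}

# Only act as a launcher when given a command to run.
if [ "$#" -gt 0 ]; then
  local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}
  # nvidia-smi -L prints one line per GPU it can see
  ngpus=$(nvidia-smi -L | wc -l)
  CUDA_VISIBLE_DEVICES=$(gpu_for_rank "$local_rank" "$ngpus") || exit 1
  export CUDA_VISIBLE_DEVICES
  exec "$@"
fi
```

Usage would look like `mpirun -n 2 ./gpu_wrap.sh ./my_solver <solver options>` (hypothetical program name). With 2 ranks and 1 GPU both ranks get device 0, which matches the shared-GPU case in this thread; once each rank sees exactly one device, PETSc's own (rank % g) mapping always resolves to that device.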
<br>
Thanks.<br>
--Junchao Zhang<br>
<div><br>
</div>
</div>
<br>
<div>
<div dir="ltr">On Fri, Aug 11, 2023 at 12:43 PM Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao, thank you for replying. I compiled PETSc in debug mode, and this is what I get for the case:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
terminate called after throwing an instance of 'thrust::system::system_error'
<div> what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered</div>
<div><br>
</div>
<div>Program received signal SIGABRT: Process abort signal.</div>
<div><br>
</div>
<div>Backtrace for this error:</div>
<div>#0 0x15264731ead0 in ???</div>
<div>#1 0x15264731dc35 in ???</div>
<div>#2 0x15264711551f in ???</div>
<div>#3 0x152647169a7c in ???</div>
<div>#4 0x152647115475 in ???</div>
<div>#5 0x1526470fb7f2 in ???</div>
<div>#6 0x152647678bbd in ???</div>
<div>#7 0x15264768424b in ???</div>
<div>#8 0x1526476842b6 in ???</div>
<div>#9 0x152647684517 in ???</div>
<div>#10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc</div>
<div> at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224</div>
<div>#11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_</div>
<div> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316</div>
<div>#12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_</div>
<div> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544</div>
<div>#13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_</div>
<div> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669</div>
<div>#14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_</div>
<div> at /usr/local/cuda/include/thrust/detail/sort.inl:115</div>
<div>#15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_</div>
<div> at /usr/local/cuda/include/thrust/detail/sort.inl:305</div>
<div>#16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic</div>
<div> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452</div>
<div>#17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic</div>
<div> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173</div>
<div>#18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE</div>
<div> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222</div>
<div>#19 0x55bb468e01cf in MatSetPreallocationCOO</div>
<div> at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606</div>
<div>#20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND</div>
<div> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547</div>
<div>#21 0x55bb469015e5 in MatProductSymbolic</div>
<div> at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803</div>
<div>#22 0x55bb4694ade2 in MatPtAP</div>
<div> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897</div>
<div>#23 0x55bb4696d3ec in MatCoarsenApply_MISK_private</div>
<div> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283</div>
<div>#24 0x55bb4696eb67 in MatCoarsenApply_MISK</div>
<div> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368</div>
<div>#25 0x55bb4695bd91 in MatCoarsenApply</div>
<div> at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97</div>
<div>#26 0x55bb478294d8 in PCGAMGCoarsen_AGG</div>
<div> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524</div>
<div>#27 0x55bb471d1cb4 in PCSetUp_GAMG</div>
<div> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631</div>
<div>#28 0x55bb464022cf in PCSetUp</div>
<div> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994</div>
<div>#29 0x55bb4718b8a7 in KSPSetUp</div>
<div> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406</div>
<div>#30 0x55bb4718f22e in KSPSolve_Private</div>
<div> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824</div>
<div>#31 0x55bb47192c0c in KSPSolve</div>
<div> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070</div>
<div>#32 0x55bb463efd35 in kspsolve_</div>
<div> at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320</div>
<div>#33 0x55bb45e94b32 in ???</div>
<div>#34 0x55bb46048044 in ???</div>
<div>#35 0x55bb46052ea1 in ???</div>
<div>#36 0x55bb45ac5f8e in ???</div>
<div>#37 0x1526470fcd8f in ???</div>
<div>#38 0x1526470fce3f in ???</div>
<div>#39 0x55bb45aef55d in ???</div>
<div>#40 0xffffffffffffffff in ???</div>
<div>--------------------------------------------------------------------------</div>
<div>Primary job terminated normally, but 1 process returned</div>
<div>a non-zero exit code. Per user-direction, the job has been aborted.</div>
<div>--------------------------------------------------------------------------</div>
<div>--------------------------------------------------------------------------</div>
<div>mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted).</div>
<div>--------------------------------------------------------------------------</div>
<div><br>
</div>
<div>BTW, I'm curious: if I launch n MPI processes, each building a part of the linear system, and use g GPUs, how does PETSc distribute those n pieces of the system matrix and RHS among the g GPUs? Does it use some load-balancing algorithm? Where can I read about this?</div>
<div>Thank you. I can also point you to my code repository on GitHub if you want to take a closer look.</div>
<div><br>
</div>
<div>Best Regards,</div>
<div>Marcos<br>
</div>
<br>
</div>
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028x_m_-5900986552187666187x_m_-6428481159717170951x_m_1722876366411198553appendonsend">
</div>
<hr style="display:inline-block;width:98%">
<div id="m_9111913385059578167x_m_4840539619422332629x_m_-6621474889523427176x_x_m_1367430803819874043x_m_-203324221208141664m_4555633652834596028x_m_-5900986552187666187x_m_-6428481159717170951x_m_1722876366411198553divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Friday, August 11, 2023 10:52 AM<br>
<b>To:</b> Vanella, Marcos (Fed) <<a href="mailto:marcos.vanella@nist.gov" target="_blank">marcos.vanella@nist.gov</a>><br>
<b>Cc:</b> <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Hi, Marcos,</div>
<div> Could you build petsc in debug mode and then copy and paste the whole error stack message?</div>
<div><br>
</div>
Thanks<br clear="all">
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
<br>
<div>
<div dir="ltr">On Thu, Aug 10, 2023 at 5:51 PM Vanella, Marcos (Fed) via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi, I'm trying to run a parallel matrix/vector build and linear solve with PETSc on 2 MPI processes + one V100 GPU. I verified that the matrix build and solve succeed on CPUs only. I'm using CUDA 11.5, CUDA-enabled Open MPI, and gcc 9.3. When I run
the job with the GPU enabled, I get the following error:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace">terminate called after throwing an instance of 'thrust::system::system_error'</span>
<div><span style="font-family:"Courier New",monospace"> <b>what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered</b></span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Program received signal SIGABRT: Process abort signal.</span></div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">Backtrace for this error:</span></div>
<div><span style="font-family:"Courier New",monospace">terminate called after throwing an instance of 'thrust::system::system_error'</span></div>
<div><span style="font-family:"Courier New",monospace"> what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered</span></div>
<div><br>
</div>
<span style="font-family:"Courier New",monospace">Program received signal SIGABRT: Process abort signal.</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:"Courier New",monospace"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">I'm new to submitting Slurm jobs that use GPU resources, so I might be doing something wrong in my submission script. Here it is:</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">#!/bin/bash
<div>#SBATCH -J test</div>
<div>#SBATCH -e /home/Issues/PETSc/test.err</div>
<div>#SBATCH -o /home/Issues/PETSc/test.log</div>
<div>#SBATCH --partition=batch</div>
<div>#SBATCH --ntasks=2</div>
<div>#SBATCH --nodes=1</div>
<div>#SBATCH --cpus-per-task=1</div>
<div>#SBATCH --ntasks-per-node=2</div>
<div>#SBATCH --time=01:00:00</div>
<div>#SBATCH --gres=gpu:1</div>
<div><br>
</div>
<div>export OMP_NUM_THREADS=1</div>
<div>module load cuda/11.5</div>
<div>module load openmpi/4.1.1</div>
<div><br>
</div>
<div>cd /home/Issues/PETSc</div>
<div><b>mpirun -n 2 </b>/home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds
<b>-vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg</b></div>
<br>
</span></div>
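One way to check how the script above maps ranks to GPUs is to have every rank report what it can see before the solver runs. The helper below is a minimal, hypothetical sketch (not part of PETSc or Slurm); it assumes Open MPI's mpirun exports OMPI_COMM_WORLD_RANK and that Slurm's srun exports SLURM_PROCID, and it prints CUDA_VISIBLE_DEVICES, which is left unset when nothing has restricted device visibility.

```shell
#!/bin/bash
# report_gpu_binding.sh (hypothetical): print which GPUs this MPI rank can see.
# Assumption: the launcher exports OMPI_COMM_WORLD_RANK (Open MPI mpirun)
# or SLURM_PROCID (Slurm srun); falls back to rank 0 otherwise.
report_gpu_binding() {
  local rank=${OMPI_COMM_WORLD_RANK:-${SLURM_PROCID:-0}}
  # CUDA_VISIBLE_DEVICES is unset when no scheduler or wrapper restricted it
  echo "rank ${rank}: CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
}

report_gpu_binding
```

In the job script it could be run as `mpirun -n 2 bash report_gpu_binding.sh` just before the solver line. With `--gres=gpu:1` and two tasks, both ranks would be expected to report the same single device, consistent with the two processes sharing one V100.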
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">If anyone has any suggestions on how to troubleshoot this, please let me know.</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Thanks!</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Marcos<br>
</span></div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div></blockquote></div></div>