<div dir="ltr">Hi Anna,<br>  Since you said "The code works with pc-type hypre on a single GPU.", I was wondering if this is a CUDA devices to MPI ranks binding problem.<br>  You can search TACC documentation to find how its job scheduler binds GPUs to MPI ranks (usually via manipulating the CUDA_VISIBLE_DEVICES environment variable)<br><div><span style="color:rgb(0,0,0);font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:16px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:16px">  Please follow up if you could not solve it.</span></div><div><br></div><div>  Thanks.<br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 31, 2024 at 4:07 PM Yesypenko, Anna <<a href="mailto:anna@oden.utexas.edu">anna@oden.utexas.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg22909944316812276">




<div dir="ltr">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Dear Petsc devs,</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I'm encountering an error running hypre on a single node with multiple GPUs.</div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">The issue is in the setup phase. I'm trying to troubleshoot, but don't know where to
 start.</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Are the system routines PetScCUDAInitialize and
</span><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:16px;color:rgb(0,0,0);background-color:rgb(255,255,255)">PetScCUDAInitialize</span><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">Check
 available in python?</span></div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
How do I verify that GPUs are assigned properly to each MPI process? In this case, I have 3 tasks and 3 GPUs.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The code works with pc-type hypre on a single GPU.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Any suggestions are appreciated! </div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Below is the error trace:</div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">``</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">TACC:  Starting up job 1490124</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">TACC:  Setting up parallel environment for MVAPICH2+mpispawn.</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">TACC:  Starting parallel tasks...</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: ------------------------------------------------------------------------</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
 probably memory access out of range</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: or see <a href="https://petsc.org/release/faq/#valgrind" target="_blank">https://petsc.org/release/faq/#valgrind</a> and <a href="https://petsc.org/release/faq/" target="_blank">https://petsc.org/release/faq/</a></span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: or try <a href="https://docs.nvidia.com/cuda/cuda-memcheck/index.html" target="_blank">https://docs.nvidia.com/cuda/cuda-memcheck/index.html</a>
 on NVIDIA CUDA systems to find memory corruption errors</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: The line numbers in the error traceback are not always
 exact.</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #1 hypre_ParCSRMatrixMigrate()</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #2 MatBindToCPU_HYPRE() at /work/06368/annayesy/ls6/petsc/src/mat/impls/hypre/mhypre.c:1394</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #3 MatAssemblyEnd_HYPRE() at /work/06368/annayesy/ls6/petsc/src/mat/impls/hypre/mhypre.c:1471</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #4 MatAssemblyEnd() at /work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:5773</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #5 MatConvert_AIJ_HYPRE() at /work/06368/annayesy/ls6/petsc/src/mat/impls/hypre/mhypre.c:660</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #6 MatConvert() at /work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:4421</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #7 PCSetUp_HYPRE() at /work/06368/annayesy/ls6/petsc/src/ksp/pc/impls/hypre/hypre.c:245</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #8 PCSetUp() at /work/06368/annayesy/ls6/petsc/src/ksp/pc/interface/precon.c:1080</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #9 KSPSetUp() at /work/06368/annayesy/ls6/petsc/src/ksp/ksp/interface/itfunc.c:415</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #10 KSPSolve_Private() at /work/06368/annayesy/ls6/petsc/src/ksp/ksp/interface/itfunc.c:833</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">[0]PETSC ERROR: #11 KSPSolve() at /work/06368/annayesy/ls6/petsc/src/ksp/ksp/interface/itfunc.c:1080</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0</span></div>
<div style="text-align:left;margin:0px"><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">``</span></div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Below is a minimum working example:</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
``</div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">import numpy,petsc4py,sys,time</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">petsc4py.init(sys.argv)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">from petsc4py import PETSc</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">from time import time</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">n     = int(5e5);</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">comm  = PETSc.COMM_WORLD</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pA = PETSc.Mat(comm=comm)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pA.create(comm=comm)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pA.setSizes((n,n))</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pA.setType(PETSc.Mat.Type.AIJ)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pA.setPreallocationNNZ(3)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">rstart,rend=pA.getOwnershipRange()</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">print("\t Processor %d of %d gets indices %d:%d"%(comm.Get_rank(),comm.Get_size(),rstart,rend))</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">if (rstart == 0):</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">    pA.setValue(0,0,2); pA.setValue(0,1,-1)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">if (rend == n):</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">    pA.setValue(n-1,n-2,-1); pA.setValue(n-1,n-1,2)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">for index in range(rstart,rend):</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">    if (rstart > 0):</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">        pA.setValue(index,index-1,-1)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">    pA.setValue(index,index,2)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">    if (rend < n):</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">        pA.setValue(index,index+1,-1)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pA.assemble()</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pA = pA.convert(mat_type='aijcusparse')</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">px,pb = pA.createVecs()</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">pb.set(1.0); px.set(1.0)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">ksp = PETSc.KSP().create()</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">ksp.setOperators(pA)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">ksp.setConvergenceHistory()</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">ksp.setType('cg')</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">ksp.getPC().setType('hypre')    </span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">ksp.setTolerances(rtol=1e-10)</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)"><br>
</span></div>
<div><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">ksp.solve(pb, px)                           </span><span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:16px;color:rgb(0,0,0);background-color:rgb(255,255,255)">#
 error is generated here</span></div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
``</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Best,</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Anna</div>
</div>

</div></blockquote></div>