[petsc-users] Help with input construction hang on 2-GPU CG Solve

Rohan Yadav rohany at alumni.cmu.edu
Fri Dec 16 02:03:58 CST 2022


Hi,

I'm developing a microbenchmark that runs a CG solve using PETSc on a mesh
using a 5-point stencil matrix. My code (linked here:
https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp, only 120
lines) works on 1 GPU and has great performance. When I move to 2 GPUs, the
program appears to get stuck in the input generation. I've littered the
code with print statements and have found the following clues:

* The first rank progresses through this loop:
https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp#L44, but
then does not exit (it seems to get stuck right before rowStart == rowEnd)
* The second rank makes very few iterations through the loop for its
allotted rows.

Therefore, neither rank makes it to the call to MatAssemblyBegin.
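
To make the row split concrete, here is a minimal sketch of how a contiguous
block-row layout would divide the rows between ranks. It assumes the default
PETSc-style split, where each rank gets N / size rows and the remainder goes
to the lowest ranks. The helper name `ownershipRange` is hypothetical, and the
code is plain C++ that does not call PETSc.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper (not a PETSc call): returns the half-open row range
// [rowStart, rowEnd) owned by `rank` when `n` rows are split as evenly as
// possible across `size` ranks, with remainder rows given to the lowest ranks.
std::pair<std::int64_t, std::int64_t>
ownershipRange(std::int64_t n, int rank, int size) {
    const std::int64_t base = n / size;  // rows every rank gets
    const std::int64_t rem = n % size;   // leftover rows, one each to ranks < rem
    const std::int64_t start =
        rank * base + std::min<std::int64_t>(rank, rem);
    const std::int64_t len = base + (rank < rem ? 1 : 0);
    return {start, start + len};
}
```

With -nx 8485 -ny 8485 there are 8485 * 8485 = 71995225 rows, so under this
assumption rank 0 would own rows [0, 35997613) and rank 1 would own rows
[35997613, 71995225).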

I'm running the code using the following command line on the Summit
supercomputer:
```
jsrun -n 2 -g 1 -c 1 -b rs -r 2
/gpfs/alpine/scratch/rohany/csc335/petsc-pde-benchmark/main -ksp_max_it 200
-ksp_type cg -pc_type none -ksp_atol 1e-10 -ksp_rtol 1e-10 -vec_type cuda
-mat_type aijcusparse -use_gpu_aware_mpi 0 -nx 8485 -ny 8485
```

Any suggestions will be appreciated! I feel that I have applied the common
PETSc optimizations, such as preallocating my matrix row counts, so I'm
not sure what's going on with this input generation.
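
As a point of reference for the preallocation mentioned above, here is a
minimal sketch of how per-row nonzero counts for a 5-point stencil split into
a diagonal block (columns owned by the same rank) and an off-diagonal block
(columns owned by other ranks). These are the two kinds of counts that
MatMPIAIJSetPreallocation accepts as its d_nnz and o_nnz arrays. The sketch
assumes row index r = i * ny + j and that column ownership matches row
ownership. The names `NnzCounts` and `countStencilNnz` are hypothetical, the
code does not call PETSc, and it is not taken from main.cpp.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for per-row counts on one rank.
struct NnzCounts {
    std::vector<int> diag;     // entries whose column lies in [rowStart, rowEnd)
    std::vector<int> offdiag;  // entries whose column is owned by another rank
};

// Counts 5-point-stencil nonzeros for the rows [rowStart, rowEnd) of an
// nx-by-ny grid, assuming grid point (i, j) maps to row i * ny + j.
NnzCounts countStencilNnz(std::int64_t nx, std::int64_t ny,
                          std::int64_t rowStart, std::int64_t rowEnd) {
    NnzCounts c;
    c.diag.assign(rowEnd - rowStart, 0);
    c.offdiag.assign(rowEnd - rowStart, 0);
    for (std::int64_t r = rowStart; r < rowEnd; ++r) {
        const std::int64_t i = r / ny;
        const std::int64_t j = r % ny;
        // Classify one column index as diagonal-block or off-diagonal-block.
        auto add = [&](std::int64_t col) {
            if (col >= rowStart && col < rowEnd) {
                ++c.diag[r - rowStart];
            } else {
                ++c.offdiag[r - rowStart];
            }
        };
        add(r);                        // center point
        if (j > 0) add(r - 1);         // neighbor at (i, j - 1)
        if (j + 1 < ny) add(r + 1);    // neighbor at (i, j + 1)
        if (i > 0) add(r - ny);        // neighbor at (i - 1, j)
        if (i + 1 < nx) add(r + ny);   // neighbor at (i + 1, j)
    }
    return c;
}
```

On a single rank every column is local, so the off-diagonal counts are all
zero. With 2 ranks, the rows next to the partition boundary pick up
off-diagonal entries, which is one difference between the 1-GPU and 2-GPU
runs that may be worth checking against the preallocation.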

Thanks,

Rohan Yadav