[petsc-users] Code (possibly) not running on GPU with CUDA

GIBB Gordon g.gibb at epcc.ed.ac.uk
Wed Aug 5 11:47:31 CDT 2020

Hi Matt,

It runs; however, it doesn’t produce any output, and I have no way of checking whether it actually ran on the GPU. It was run with:

srun -n 1 ./ex28 -vec_type cuda -use_gpu_aware_mpi 0
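
(Perhaps -log_view would help here: as I understand it, in a CUDA-enabled build the -log_view summary reports per-event GPU flop rates and counts of CPU-to-GPU copies, so rerunning with

srun -n 1 ./ex28 -vec_type cuda -use_gpu_aware_mpi 0 -log_view

should show whether any work actually landed on the GPU.)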



Dr Gordon P S Gibb
EPCC, The University of Edinburgh
Tel: +44 131 651 3459

On 5 Aug 2020, at 17:10, Matthew Knepley <knepley at gmail.com> wrote:

On Wed, Aug 5, 2020 at 11:24 AM GIBB Gordon <g.gibb at epcc.ed.ac.uk> wrote:

I’ve built PETSc with NVIDIA support for our GPU machine (https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then compiled our executable against this PETSc (version 3.13.3). I should add that the MPI on our system is not GPU-aware, so I have to use -use_gpu_aware_mpi 0
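
For reference, a CUDA-enabled PETSc build is configured with something along these lines (a sketch only; the exact flags we used on Cirrus may differ):

./configure --with-cuda=1 --with-cudac=nvcc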

When running this, I put the following in the .petscrc:

-dm_vec_type cuda
-dm_mat_type aijcusparse

as is suggested on the PETSc GPU page (https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for DMs (all our PETSc data structures are created through DMs). I have also ensured I'm using the jacobi preconditioner so that the solve definitely runs on the GPU (again, according to the PETSc GPU page).
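
As a sanity check, something like the following minimal sketch (untested; standard DMDA/Vec calls, error checking omitted for brevity) should print a CUDA vector type such as seqcuda or mpicuda if -dm_vec_type cuda is actually being honoured:

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM      da;
  Vec     x;
  VecType vtype;

  PetscInitialize(&argc, &argv, NULL, NULL);
  /* 1D DMDA: 128 points, 1 dof, stencil width 1 */
  DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 128, 1, 1, NULL, &da);
  DMSetFromOptions(da);  /* consumes -dm_vec_type / -dm_mat_type */
  DMSetUp(da);
  DMCreateGlobalVector(da, &x);
  VecGetType(x, &vtype); /* expect "seqcuda"/"mpicuda" if the option took effect */
  PetscPrintf(PETSC_COMM_WORLD, "DM vector type: %s\n", vtype);
  VecDestroy(&x);
  DMDestroy(&da);
  PetscFinalize();
  return 0;
}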

When I run this, I note that the GPU has memory allocated on it by my executable, but seems to be doing no computation:

Wed Aug  5 13:10:23 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:1A:00.0 Off |                  Off |
| N/A   43C    P0    64W / 300W |    490MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     33712      C   .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x   479MiB |
+-----------------------------------------------------------------------------+

I then ran the same example but without the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments, and found the same behaviour (479MiB allocated on the GPU, 0% GPU utilisation).

In both cases the runtimes of the example are nearly identical, suggesting that both are essentially the same run.

As a further test I compiled PETSc without CUDA support and ran the same example again. I found the same runtime as with the GPU build and, as expected, no GPU memory allocated. I then ran the example with the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments, and it ran without complaint. I would have expected it to throw an error, or at least a warning, if invalid arguments were passed to it.
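
(One way I gather this can be settled is PETSc's -options_left flag, which prints any options that were set but never queried, e.g.

srun -n 1 ./twophase.x -options_left

so an option reported as unused there was never consumed by the library.)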

All this suggests to me that PETSc is ignoring my requests to use the GPUs. With the CUDA-enabled PETSc it seems to allocate memory on the GPUs but performs no calculations on them, regardless of whether I requested the GPUs or not. With the non-CUDA PETSc it accepts the GPU options without complaint, even though it cannot honour them.

What am I doing wrong?

Let's step back to something simpler so we can make sure your configuration is correct. Can you run the 2_cuda test from
src/vec/vec/tests/ex28.c? Does it execute on your GPU?



Thanks in advance,

Dr Gordon P S Gibb
EPCC, The University of Edinburgh
Tel: +44 131 651 3459


What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

