[petsc-users] A series of GPU questions

Tue Jun 9 19:51:06 CDT 2020

On Tue, Jun 9, 2020 at 7:11 PM GIBB Gordon <g.gibb at epcc.ed.ac.uk> wrote:

> Hi,
>
> First of all, my apologies if this is not the appropriate list to send
> these questions to.
>
> I’m one of the developers of TPLS (https://sourceforge.net/projects/tpls/),
> a Fortran code that uses PETSc, parallelised using DM vectors. It uses a
> mix of our own solvers, and PETSc’s Krylov solvers. At present it has been
> run on up to 25,000 MPI processes, although larger problem sizes should be
> able to scale beyond that.
>
> With the awareness that more and more HPC machines now have one or more
> GPUs per node, and that upcoming machines that approach/achieve Exascale
> will be heterogeneous in nature, we are investigating whether it is worth
> using GPUs with TPLS, and if so, how to best do this.
>
> I see that in principle all we’d need to do is set some flags as
> described at https://www.mcs.anl.gov/petsc/features/gpus.html to offload
> work onto the GPU, however I have some questions about doing this in
> practice:
>
> The GPU machine I have access to has nodes with two 20 core CPUs and 4
> NVIDIA GPUs (so 10 cores per GPU). We could use CUDA or OpenCL, and may
> well explore both of them. With TPLS being an MPI application, we would
> wish to use many processes (and nodes), not just a single process. How
> would we best split this problem up?
>
> Would we have 1 MPI process per GPU (so 4 per node), and then implement
> our own solvers either to also work on the GPU, or use OpenMP to make use
> of the 10 cores per GPU? If so, how would we specify to PETSc which GPU
> each process would use?
>
> Would we instead just have 40 (or perhaps slightly fewer) MPI processes
> all sharing the GPUs? Surely this would be inefficient, and would PETSc
> distribute the work across all 4 GPUs, or would every process end up using
> a single GPU?
>
See the Volta Multi-Process Service section of the Summit user guide:
https://docs.olcf.ornl.gov/systems/summit_user_guide.html#volta-multi-process-service
In some cases we saw better performance with multiple MPI ranks per GPU
than with 1 rank per GPU. The optimal configuration depends on the code.
Consider two extremes: one code that does all of its work on the GPU, and
another that does all of it on the CPU. The former probably needs only
1 MPI rank/node, while the latter wants a full set of ranks (one per core).

>
> Would the Krylov solvers be blocking whilst the GPUs are in use running
> the solvers, or would the host code be able to continue and carry out other
> calculations whilst waiting for the GPU code to finish? We may need to
> modify our algorithm to allow for this, but it would make sense to
> introduce some concurrency so that the CPUs aren’t idling whilst waiting
> for the GPUs to complete their work.
>
PETSc uses asynchronous kernel launches and split-phase communication
(VecScatterBegin/End). As long as there is no data dependency between them,
you can overlap computation on the CPU with computation on the GPU, or
computation with communication.

>
> Finally, I’m trying to get the OpenCL PETSc to work on my laptop (MacBook
> Pro with discrete AMD Radeon R9 M370X GPU). This is mostly because our GPU
> cluster is out of action until at least late June and I want to get a head
> start on experimenting with GPUs and TPLS. When I try to run TPLS with the
> ViennaCL PETSc it reports that my GPU is unable to support double
> precision. I confirmed that my discrete GPU does support this, however my
> integrated GPU (Intel Iris) does not. I suspect that ViennaCL is using my
> integrated GPU instead of my discrete one (it is listed as GPU 0 by OpenCL,
> with the AMD card as GPU 1). Is there any way of getting PETSc to report
> which OpenCL device is in use, or to select which device to use? I saw
> there was some discussion about this on the mailing list archives but I
> couldn’t find any conclusion.
>
I have no experience with this. Karl Rupp (cc'ed) might know.

>
> Thanks in advance for your help,
>
> Regards,
>
> Gordon
>
> -----------------------------------------------
> Dr Gordon P S Gibb
> EPCC, The University of Edinburgh
> Tel: +44 131 651 3459
>
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>