[petsc-users] overlap cpu and gpu?

Mark Adams mfadams at lbl.gov
Sun Aug 2 07:09:34 CDT 2020


I suspect that the Poisson and Ampere's law solves are not coupled. You
might be able to duplicate the communicator and use two threads. You would
want to configure PETSc with thread safety and threads, and I think it
could/should work, but this mode is never used by anyone.
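
A minimal sketch of the two-thread idea, in plain C with POSIX threads. This
is not PETSc code: SolveCtx, solve_poisson_stub, solve_ampere_stub, and
run_overlapped are hypothetical names, and the stubs only stand in for two
uncoupled solves. In a real code I would expect each thread to own its own
duplicate of the communicator (MPI_Comm_dup) and MPI to be initialized with
MPI_THREAD_MULTIPLE; both points are assumptions about how this would be set
up, not something I have tested with PETSc.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-solve context. In a real code this would hold the
   duplicated communicator and the solver objects owned by one thread. */
typedef struct {
    int    n;       /* size of the stand-in problem */
    double result;  /* filled in by the thread that runs the solve */
} SolveCtx;

/* Stand-in for the Poisson solve: sums 1..n. */
void *solve_poisson_stub(void *arg) {
    SolveCtx *ctx = (SolveCtx *)arg;
    double s = 0.0;
    for (int i = 1; i <= ctx->n; i++) s += (double)i;
    ctx->result = s;
    return NULL;
}

/* Stand-in for the Ampere solve: sums the squares 1..n^2. */
void *solve_ampere_stub(void *arg) {
    SolveCtx *ctx = (SolveCtx *)arg;
    double s = 0.0;
    for (int i = 1; i <= ctx->n; i++) s += (double)i * (double)i;
    ctx->result = s;
    return NULL;
}

/* Run both stand-in solves concurrently. Returns 0 on success. */
int run_overlapped(SolveCtx *poisson, SolveCtx *ampere) {
    pthread_t t1, t2;
    assert(poisson != ampere); /* the two threads must not share a context */
    if (pthread_create(&t1, NULL, solve_poisson_stub, poisson) != 0) return 1;
    if (pthread_create(&t2, NULL, solve_ampere_stub, ampere) != 0) {
        pthread_join(t1, NULL);
        return 1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

The point of the separate contexts is that nothing is shared between the two
threads, which is what duplicating the communicator would buy you on the MPI
side.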

That said, I would not recommend doing this unless you feel like playing in
computer science, as opposed to doing application science. In the best-case
scenario you get a speedup of 2x. That is a strict upper bound, and you
will never come close to it. Your hardware has some balance of CPU to GPU
processing rate. Your application has a balance of volume of work between
your two solves. Those two ratios have to match, and both have to be 1:1,
to get close to a 2x speedup. To be concrete, from what little I can guess
about your application, let's assume that the cost of each of these two
solves is about the same (e.g., Laplacians on your domain, which is the
best-case scenario).
But GPU machines these days are configured to have roughly 1-10% of their
capacity in the CPUs, which gives you an upper bound of about 10% speedup.
That is noise. Upshot: unless you configure your hardware to match this
problem, and the two solves have the same cost, you will not see close to
2x speedup. Your time is better spent elsewhere.
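
The arithmetic behind that estimate can be written down as a toy model. This
is only a sketch under stated assumptions: r_cpu and r_gpu are processing
rates in the same arbitrary units, w1 and w2 are the work volumes of the two
solves, the baseline is running everything on the GPU, and the function names
are made up for illustration. Real solves will not scale this cleanly.

```c
#include <assert.h>

/* Ideal speedup from adding the CPU to a GPU-only run, assuming the total
   work could be divided freely between the devices in proportion to their
   rates. This is the upper bound: 1 + r_cpu / r_gpu. */
double ideal_speedup(double r_cpu, double r_gpu) {
    assert(r_cpu >= 0.0 && r_gpu > 0.0);
    return (r_cpu + r_gpu) / r_gpu;
}

/* Speedup relative to running both solves back to back on the GPU when
   solve 1 (work w1) is moved to the CPU and solve 2 (work w2) stays on the
   GPU, with the two running at the same time. */
double overlap_speedup(double w1, double w2, double r_cpu, double r_gpu) {
    assert(w1 > 0.0 && w2 > 0.0 && r_cpu > 0.0 && r_gpu > 0.0);
    double t_gpu_only = (w1 + w2) / r_gpu;  /* baseline: all work on GPU */
    double t_cpu = w1 / r_cpu;              /* solve 1 on the CPU */
    double t_gpu = w2 / r_gpu;              /* solve 2 on the GPU */
    double t_overlap = (t_cpu > t_gpu) ? t_cpu : t_gpu;
    return t_gpu_only / t_overlap;
}
```

With equal rates and equal work, both functions give 2, which is the 1:1
case. With the CPU at one tenth of the GPU rate, ideal_speedup gives about
1.1, and moving one of two equal solves onto that CPU makes overlap_speedup
drop below 1, that is, a slowdown.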

Mark

On Sat, Aug 1, 2020 at 3:24 PM Jed Brown <jed at jedbrown.org> wrote:

> You can use MPI and split the communicator so n-1 ranks create a DMDA for
> one part of your system and the other rank drives the GPU in the other
> part.  They can all be part of the same coupled system on the full
> communicator, but PETSc doesn't currently support some ranks having their
> Vec arrays on GPU and others on host, so you'd be paying host-device
> transfer costs on each iteration (and that might swamp any performance
> benefit you would have gotten).
>
> In any case, be sure to think about the execution time of each part.  Load
> balancing with matching time-to-solution for each part can be really hard.
>
>
> Barry Smith <bsmith at petsc.dev> writes:
>
> >   Nicola,
> >
> >     This is not really viable or practical at this time with PETSc. It
> > is not impossible but requires careful coding with threads. Another
> > possibility is to use one half of the virtual GPUs for each solve; this
> > is also not trivial. I would recommend first seeing what kind of
> > performance you can get on the GPU for each type of solve and then
> > revisiting this idea in the future.
> >
> >    Barry
> >
> >
> >
> >
> >> On Jul 31, 2020, at 9:23 AM, nicola varini <nicola.varini at gmail.com> wrote:
> >>
> >> Hello, I would like to know if it is possible to overlap CPU and GPU
> >> with DMDA.
> >> I have a machine where each node has one P100 and one Haswell.
> >> I have to solve the Poisson and Ampere equations at each time step.
> >> I'm using a 2D DMDA for each of them. Would it be possible to compute
> >> the Poisson and Ampere equations at the same time, one on the CPU and
> >> the other on the GPU?
> >>
> >> Thanks
>