[petsc-users] Using PETSc with GPU

Fri Mar 15 21:06:29 CDT 2019

Yuyun Yang via petsc-users <petsc-users at mcs.anl.gov> writes:

> Currently we are forming the sparse matrices explicitly, but I think the goal is to move towards matrix-free methods and use a stencil, which I suppose is good to use GPUs for and more efficient. On the other hand, I've also read about matrix-free operations in the manual just on the CPUs. Would there be any benefit then to switching to GPU (looks like matrix-free in PETSc is rather straightforward to use, whereas writing the kernel function for GPU stencil would require quite a lot of work)?

It all depends on what kind of computation happens in there and how well
you can implement it for the GPU.  It's important to have a clear idea
of what you expect to achieve.  For example, if you write an excellent
GPU implementation of your SNES residual/matrix-free Jacobian, it might
be 2-3x faster than a good CPU implementation on hardware of similar
cost ($ or Watt).  But you still need preconditioning, which is usually
at least half the work, and perhaps the preconditioner runs at the same
speed on GPU and CPU (the CPU version often converges a bit faster, and
preconditioning operations are often less amenable to GPUs).  So after
all that effort, and now with code that is likely harder to maintain,
you go from 4 seconds per solve to 3 seconds per solve on hardware of
the same cost.  Is that worth it?

Maybe, but you probably want that to be in the critical path for your
research and/or customers.
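
The 4-second to 3-second estimate above is Amdahl-style arithmetic, and
a rough sketch of it follows.  It assumes the preconditioner is exactly
half of a 4-second solve and does not speed up at all, and that the
residual/matrix-free Jacobian half gets the full 2-3x.  The function
name and the exact fractions are illustrative assumptions, not
measurements.

```python
def solve_time(total, accel_fraction, speedup):
    """Time per solve when only part of the work is accelerated.

    total          -- baseline time per solve, in seconds
    accel_fraction -- share of the baseline that moves to the GPU
    speedup        -- speedup of that share (the rest is unchanged)
    """
    accelerated = total * accel_fraction / speedup
    untouched = total * (1.0 - accel_fraction)
    return accelerated + untouched


baseline = 4.0  # seconds per solve, the figure used above
for s in (2.0, 3.0):
    # Assumed: preconditioning is half the work and stays at CPU speed.
    t = solve_time(baseline, 0.5, s)
    print(f"{s:.0f}x residual speedup: {t:.2f} s per solve, "
          f"{baseline / t:.2f}x overall")
```

With these assumptions, a 2x faster residual gives 3.00 s per solve and
a 3x faster one gives about 2.67 s.  However fast the residual becomes,
the solve cannot drop below the 2 seconds spent in the preconditioner.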