[petsc-users] CPU vs GPU for PETSc applications

Jed Brown jed at jedbrown.org
Fri Mar 11 07:10:50 CST 2016


Justin Chang <jychang48 at gmail.com> writes:

> Matt,
>
> So what's an example of "doing a bunch of iterations to make sending the
> initial data down worth it"?

CG/Jacobi for a high resolution problem.  You pretty much have to have
thrown in the towel on finding a good preconditioner; otherwise you'd be
at risk of solving the problem too quickly to amortize the transfer.
Some groups have shown acceptable multigrid performance, though it's a
tough sell if you're paying for the coprocessor.

One problem with the 3x bandwidth difference is that GPU algorithms
often require temporaries or multiple passes over the data where a CPU
would be able to do a single pass with little or no temporaries.  In
finite element computations, and also some sparse matrix operations,
those intermediate quantities can more than squander the apparent
bandwidth advantage.
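As an illustration of how temporaries cost bandwidth, the sketch below computes the residual norm ||b - A x|| two ways: kernel by kernel with full-length temporary vectors, and in a single fused sweep with only a scalar accumulator. The function names and the simple traffic model are hypothetical and assume vectors too large to stay in cache; this is not how PETSc or any particular GPU library implements the operation, and Python says nothing about actual GPU speed. It only shows the extra data movement.

```python
import math

def residual_norm_two_pass(indptr, indices, data, x, b):
    """||b - A x||_2 computed as separate kernels with temporaries.

    Pass 1 (SpMV) writes t = A x, pass 2 reads t and b and writes r = b - t,
    pass 3 reads r again for the reduction.
    """
    n = len(b)
    t = [0.0] * n
    for i in range(n):                              # pass 1: SpMV into a temporary
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        t[i] = s
    r = [b[i] - t[i] for i in range(n)]             # pass 2: second temporary
    return math.sqrt(sum(ri * ri for ri in r))      # pass 3: re-read r

def residual_norm_fused(indptr, indices, data, x, b):
    """Same quantity in one sweep; each residual entry is consumed immediately."""
    acc = 0.0
    for i in range(len(b)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        ri = b[i] - s
        acc += ri * ri
    return math.sqrt(acc)

def extra_vector_bytes_two_pass(n, scalar_bytes=8):
    """Extra traffic of the two-pass version under the no-cache-reuse assumption:
    write t, read t, write r, read r = four sweeps over length-n vectors."""
    return 4 * n * scalar_bytes
```

Under this model, for a matrix with about 5 nonzeros per row stored as 8-byte values plus 4-byte column indices (roughly 60n bytes), the two-pass version adds 32n bytes, which is a large fraction of the matrix traffic itself.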

> Is there a correlation between that and arithmetic intensity, where an
> application is likely to be more compute-bound than memory-bandwidth
> bound?

Not really, because each iteration accesses the entire sparse matrix, so
running more iterations does not raise the flops performed per byte moved.
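A rough worked estimate, assuming a CSR (AIJ-style) matrix with 8-byte values and 4-byte indices, where x is read once with perfect cache reuse and y is written once. The helper names are hypothetical; the model ignores the vector updates in CG and only counts the matrix-vector product.

```python
def csr_spmv_cost(n, nnz, value_bytes=8, index_bytes=4):
    """Return (flops, bytes) for one y = A x with A in CSR form (idealized model)."""
    flops = 2 * nnz                                      # one multiply + one add per nonzero
    bytes_moved = (nnz * (value_bytes + index_bytes)     # matrix values + column indices
                   + (n + 1) * index_bytes               # row pointers
                   + n * value_bytes                     # read x once
                   + n * value_bytes)                    # write y once
    return flops, bytes_moved

def intensity_over_iterations(n, nnz, iterations):
    """Each iteration re-streams A, so flops and bytes both scale with the count."""
    flops, bytes_moved = csr_spmv_cost(n, nnz)
    return (iterations * flops) / (iterations * bytes_moved)
```

For a 5-point stencil with n = 10**6 rows (nnz = 5 * 10**6) this gives about 0.125 flop/byte, and the value is the same for 1 iteration or 1000, since the iteration count cancels. As nonzeros per row grow, it approaches 2/12 = 1/6 flop/byte, still far below what either a CPU or a GPU needs to become compute-bound.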