PETSc on GPUs

Ahmed El Zein ahmed at azein.com
Wed Sep 17 22:51:52 CDT 2008


On Wed, 2008-09-17 at 21:33 -0500, Barry Smith wrote:
> Ahmed,
> 
>     This is very cool.
> On Sep 17, 2008, at 8:05 PM, Ahmed El Zein wrote:
> 
> > Harald,
> > I am working on implementing SpMV on an NVIDIA GPU with CUDA for SML
> > applications (just about finished, actually).
> >
> > As the SML application I am planning to base my work on uses PETSc,
> > I have written some functions that convert AIJ matrices and Vecs to
> > single precision (SP), copy them to the GPU, multiply them, and copy
> > the results back to the host. I would be happy to share them with
> > you if you want.
> 
>     I think to really make the GPU a truly large step forward in
> performance, the Vecs need to be kept on the GPU and only transported
> back to the main CPU when absolutely needed. For example, consider a
> KSP solver like CG: it would run on the main CPU, but each Vec would
> actually be just a handle for the true vector entries that live in
> GPU memory. A call to VecAXPY(), for example, would pass the scalar
> and the two Vec handles down to the GPU, where the actual axpy is
> performed. With this paradigm the only values passed back to the main
> CPU are scalars. This is why I think this work has to be done only on
> the latest GPU systems with lots of memory.
> 
You are right! That is what I do for the SML application: I convert and
copy the matrix to the GPU once, then iteratively send a new vector to
the GPU for each multiplication. In fact, unless the matrix is reused at
least 4 times, there is no performance gain!
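
For concreteness, here is roughly how I picture the handle idea; this is
only a minimal CUDA sketch, where every name (GPUVec, GPUVecAXPY,
axpy_kernel) is made up for illustration, not actual or proposed PETSc
API:

/* A Vec as a mere handle: the entries never leave the GPU. */
#include <cuda_runtime.h>

typedef struct {
    float *d_data;   /* vector entries, in device memory */
    int    n;
} GPUVec;

__global__ void axpy_kernel(int n, float alpha,
                            const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += alpha * x[i];
}

/* y += alpha*x entirely on the device: only the scalar alpha
   crosses the bus, never the vector entries. */
void GPUVecAXPY(GPUVec *y, float alpha, const GPUVec *x)
{
    int threads = 256;
    int blocks  = (y->n + threads - 1) / threads;
    axpy_kernel<<<blocks, threads>>>(y->n, alpha,
                                     x->d_data, y->d_data);
}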

The 8800 GTX has 768 MB of memory, but you can run multiple GPUs in one
machine and split your data amongst them to effectively get more
memory.
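
(A rough sketch of the kind of row splitting I mean, using the CUDA
runtime's device-selection calls; the even partitioning here is just one
obvious choice, and error checking is omitted:)

#include <cuda_runtime.h>

/* Split a matrix's rows evenly across the GPUs in one box. */
void split_rows_across_gpus(int nrows)
{
    int ndev, d;
    cudaGetDeviceCount(&ndev);
    for (d = 0; d < ndev; d++) {
        int first = (int)((long long)nrows * d / ndev);
        int last  = (int)((long long)nrows * (d + 1) / ndev);
        cudaSetDevice(d);
        /* ...allocate and copy rows [first, last) to device d;
           each device then multiplies its own row block... */
        (void)first; (void)last; /* placeholders in this sketch */
    }
}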

What I was thinking of was to add GPU pointers to a PETSc Mat or Vec
object, with an optional shadow parameter. If shadow is enabled, the
host keeps a copy of the GPU data in main memory. That way, if the
original matrix changes:
1. It might be less expensive to make the modification on the host.
2. It might be possible to update the matrix on the GPU by sending only
diffs, since copying data back and forth is the most expensive
operation.

It would also allow the matrix to be used on both the host and the GPU,
for maximum flexibility. Matrix assembly would be done on the host
anyway.
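
Something like the following, say (a sketch only: all names are
hypothetical, and the CSR layout is just for concreteness):

#include <cuda_runtime.h>

/* A Mat carrying a device copy plus an optional host mirror. */
typedef struct {
    float *d_vals;            /* CSR values on the GPU       */
    int   *d_cols, *d_rowptr; /* CSR structure on the GPU    */
    float *h_vals;            /* host "shadow"; NULL if off  */
    int    shadow;            /* keep the host copy in sync? */
    int    nnz, nrows;
} GPUMat;

/* With a shadow, a small change is applied on the host and only
   the touched value is re-sent, not the whole matrix. */
void GPUMatSetValue(GPUMat *A, int idx, float v)
{
    if (A->shadow) A->h_vals[idx] = v;
    cudaMemcpy(A->d_vals + idx, &v, sizeof(float),
               cudaMemcpyHostToDevice);
}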

(I am not sure whether the above points are loads of rubbish or not,
but I think there are many options to be considered in a PETSc GPU
implementation.)


A question I had about the PETSc code while thinking about this:
you have the SeqAIJ matrix type, and the MPIAIJ type is built around it
(or that is what I understand from the code). So basically, if you
implement the SeqAIJ type for the GPU, you get the MPI type for free?
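
(My mental model, as a simplified sketch; MatMult() and MatMultAdd()
are the real PETSc calls, while scatter_ghost_values() is just a
hypothetical stand-in for the VecScatter communication:)

#include <petscmat.h>

/* Hypothetical stand-in for the ghost-value communication. */
extern void scatter_ghost_values(Vec x, Vec x_ghost);

/* Each process's MPIAIJ holds two sequential (SeqAIJ) blocks, so
   the parallel mult is two SeqAIJ mults plus communication. */
void MatMult_MPIAIJ_sketch(Mat A_diag, Mat A_off,
                           Vec x, Vec x_ghost, Vec y)
{
    scatter_ghost_values(x, x_ghost); /* gather off-process entries */
    MatMult(A_diag, x, y);            /* local diagonal block       */
    MatMultAdd(A_off, x_ghost, y, y); /* off-diagonal block         */
}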

Ahmed

>     Barry 
> 
> >
> >
> > While this would be outside the scope of my MPhil, I would be very
> > interested in helping to add GPU support for PETSc. I have not yet
> > had any experience with CTM for programming ATI GPUs, but I believe
> > there would not be a huge difference.
> >
> > I have access to a GeForce 8800 GTX GPU (single precision only) at
> > the ANU. I have been talking with my supervisor about getting a GTX
> > 280 or GTX 260 (which support double precision), but I don't know
> > if we will be getting one.
> >
> > Anyway, I would like to help. So if anyone would like to start
> > thinking about how this would best be implemented, I am available. :)
> >
> > Ahmed
> >
> > On Wed, 2008-09-17 at 08:24 -0700, Harald Pfeiffer wrote:
> >> Hi,
> >>
> >> do you know whether there are any efforts to run PETSc on GPUs
> >> (graphics processing units)?
> >>
> >> Thanks,
> >> Harald
> >>
> >>
> >>
> >



