PETSc on GPUs
Harald Pfeiffer
harald at tapir.caltech.edu
Thu Sep 25 22:44:27 CDT 2008
Hi,
I ran across another cool way to speed up PETSc on GPUs:
GPUs have much higher single-precision FLOP rates than double-precision
ones. For iterative solvers, the update computation can easily be done in
single precision, and that is where almost all of the work is. For example,
one could run Newton-Raphson such that:
- the non-linear residual is computed in double precision;
- the inversion of the Jacobian is done in single precision.
One gets the single-precision FLOP rate while keeping double-precision
accuracy in the final result.
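A minimal sketch of the idea (plain C, not PETSc code; the toy problem
f(x) = x*x - 2 and the tolerances are just illustrative assumptions):

#include <stdio.h>
#include <math.h>

int main(void)
{
  double x = 1.0;                 /* iterate kept in double           */
  for (int it = 0; it < 50; it++) {
    double r = x*x - 2.0;         /* non-linear residual in double    */
    if (fabs(r) < 1e-14) break;   /* converged to double accuracy     */
    float J  = (float)(2.0*x);    /* Jacobian formed in single        */
    float dx = (float)r / J;      /* Jacobian "inversion" in single   */
    x -= (double)dx;              /* update accumulated in double     */
  }
  printf("x = %.16g, sqrt(2) = %.16g\n", x, sqrt(2.0));
  return 0;
}

The cheap single-precision correction limits each step, but the
double-precision residual still drives the iterate to full double accuracy.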
Harald
On 09/18/2008 02:09 PM, Barry Smith wrote:
>
> In implementing Vec on the GPU, a VecScatter would take the entries
> directly from the GPU copy of one vector and put them appropriately into
> the GPU copy of the other vector; thus it would be completely transparent
> to the MatMult_MPIAIJ() code. In other words, if you have to modify
> MatMult_MPIAIJ IN ANY WAY for the GPU then something is wrong.
>
> Barry
>
> Note the VecScatter would need to marshal the values up from the GPU into
> regular memory, do the appropriate message passing to the destination
> processes, and then marshal the values back down to the destination GPU.
> Since the current VecScatter handles all the message passing, I think the
> GPU VecScatter() could be written to 1) move the needed GPU values up to
> regular memory (inside work MPI Vecs), then 2) use a regular VecScatter on
> those work vectors, then 3) move the results down from regular memory to
> the GPU.
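A rough sketch of those three steps (the routine name and the work-Vec
handles xwork/ywork are hypothetical, and a real double-precision PETSc
build is assumed; only the VecScatter/VecGetArray/cudaMemcpy calls are
actual API):

#include <petscvec.h>
#include <cuda_runtime.h>

/* x_gpu/y_gpu: device copies of the source and target Vec entries;
   xwork/ywork: host work Vecs with the same local sizes;
   scat: an ordinary (host) VecScatter built for xwork -> ywork.     */
PetscErrorCode VecScatterGPU_Sketch(const PetscScalar *x_gpu,PetscScalar *y_gpu,
                                    Vec xwork,Vec ywork,VecScatter scat)
{
  PetscErrorCode ierr;
  PetscScalar    *a;
  PetscInt       n;

  PetscFunctionBegin;
  /* 1) move the needed GPU values up to regular memory */
  ierr = VecGetLocalSize(xwork,&n);CHKERRQ(ierr);
  ierr = VecGetArray(xwork,&a);CHKERRQ(ierr);
  cudaMemcpy(a,x_gpu,n*sizeof(PetscScalar),cudaMemcpyDeviceToHost);
  ierr = VecRestoreArray(xwork,&a);CHKERRQ(ierr);

  /* 2) use a regular VecScatter on the host work vectors */
  ierr = VecScatterBegin(scat,xwork,ywork,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scat,xwork,ywork,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);

  /* 3) move the results back down from regular memory to the GPU */
  ierr = VecGetLocalSize(ywork,&n);CHKERRQ(ierr);
  ierr = VecGetArray(ywork,&a);CHKERRQ(ierr);
  cudaMemcpy(y_gpu,a,n*sizeof(PetscScalar),cudaMemcpyHostToDevice);
  ierr = VecRestoreArray(ywork,&a);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}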
>
> In other words, PETSc has been designed since 1991 to be able to reuse
> almost all of its infrastructure; only new low-level kernels need to be
> written to run on GPUs. :-)
>
> Fun stuff, sure beats working :-)
>
>
>
> On Sep 18, 2008, at 3:18 PM, Richard Tran Mills wrote:
>
>> Ahmed,
>>
>> If you take a look in
>>
>> src/mat/impls/aij/mpi/
>>
>> at mpiaij.h and mpiaij.c, you can see that an MPIAIJ matrix is stored
>> locally as two "sequential" matrices, A and B. A contains the
>> "diagonal" portion of the matrix, and B contains the "off-diagonal"
>> portion. See page 59 of the PETSc Users Manual for a quick
>> illustration of how this partitioning works. Because vectors are
>> partitioned row-wise among processors in the same manner as the
>> matrix, each process already has the vector entries required to form
>> the portion of the matrix-vector product ("mat-vec") involving the
>> diagonal portion 'A', so that part can be done with data that reside
>> locally. The off-diagonal portion 'B', however, requires gathering
>> vector entries that reside off-processor. If you look in mpiaij.c at
>> MatMult_MPIAIJ, you will see the following lines:
>>
>> ierr = VecScatterBegin(a->Mvctx,xx,a->lvec,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
>> ierr = (*a->A->ops->mult)(a->A,xx,yy);CHKERRQ(ierr);
>> ierr = VecScatterEnd(a->Mvctx,xx,a->lvec,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
>> ierr = (*a->B->ops->multadd)(a->B,a->lvec,yy,yy);CHKERRQ(ierr);
>>
>> The mat-vec involving a->A can be completed without any VecScatter,
>> but the one involving a->B needs the VecScatter to complete before it
>> can be done. (Note that the VecScatterBegin() occurs before the
>> mat-vec for a->A so that there might be some overlap of communication
>> and computation).
>>
>> Hopefully this helps elucidate what Matt meant, and didn't just
>> confuse you.
>>
>> --Richard
>>
>> Ahmed El Zein wrote:
>>> On Thu, 2008-09-18 at 10:20 -0500, Matthew Knepley wrote:
>>>>> A question that I had regarding the PETSc code when I was thinking
>>>>> about this was:
>>>>> You have the SeqAIJ matrix type and the MPIAIJ type built around it
>>>>> (or that is what I understand from the code). So basically you
>>>>> implement the SeqAIJ type for the GPU and you get the MPI type for
>>>>> free?
>>>> Yes, that is true. However, note that in the MPI step, you will need
>>>> a gather operation to get the second matrix multiply to work.
>>> Matt,
>>> Could you explain a bit more?
>>> Ahmed
>>
>> --
>> Richard Tran Mills, Ph.D. | E-mail: rmills at climate.ornl.gov
>> Computational Scientist | Phone: (865) 241-3198
>> Computational Earth Sciences Group | Fax: (865) 574-0405
>> Oak Ridge National Laboratory | http://climate.ornl.gov/~rmills
>>
--
Harald P. Pfeiffer harald at tapir.caltech.edu
Theoretical Astrophysics Phone (626) 395-8413
Caltech 130-33 Fax (626) 796-5675
Pasadena, CA 91125, USA