[petsc-dev] Using cuBlas as the vendor blas for PETSc

Dave Nystrom Dave.Nystrom at tachyonlogic.com
Fri Feb 24 21:25:52 CST 2012


Hi Jack,

Thanks for your comments.  I had not thought of using a wrapper.  The
overhead with small blocks is certainly worrisome.  I have not worked with
BLAS much in a long time, so I do not have a feel for where the break-even
size might be.  I might experiment with this at some point just to satisfy
my curiosity.

Thanks again,

Dave
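
A rough way to reason about the break-even size mentioned above is a simple
cost model. The sketch below is only an illustration, not a measurement: it
assumes square n x n double-precision matrices, 2*n^3 flops per multiply,
three matrices (A, B, C) crossing the bus, and a fixed launch latency. All
function names and hardware parameters here are hypothetical and would have
to be replaced by measured values.

```python
def gemm_time_cpu(n, cpu_flops):
    """Estimated seconds for an n x n dgemm on the host (assumed 2*n^3 flops)."""
    return 2.0 * n**3 / cpu_flops


def gemm_time_gpu(n, gpu_flops, bandwidth, latency):
    """Estimated seconds on the GPU, including host<->device transfers.

    Assumes A and B are copied in and C is copied out: 3 matrices of
    8*n*n bytes each. `bandwidth` is bytes/second, `latency` is seconds.
    """
    transfer_bytes = 3 * 8 * n * n
    return latency + transfer_bytes / bandwidth + 2.0 * n**3 / gpu_flops


def break_even_size(cpu_flops, gpu_flops, bandwidth, latency, n_max=100000):
    """Smallest n for which the modeled GPU time beats the modeled CPU time.

    Returns None if the GPU never wins for n <= n_max under this model.
    """
    for n in range(1, n_max + 1):
        if gemm_time_gpu(n, gpu_flops, bandwidth, latency) < gemm_time_cpu(n, cpu_flops):
            return n
    return None
```

Under this model the transfer term grows as n^2 while the compute term grows
as n^3, so the GPU can only win once n is large enough for the flop savings to
outweigh the transfer and latency cost. Real break-even sizes should be timed
on the actual machine.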

Jack Poulson writes:
 > Dave,
 > 
 > That is probably not a good idea: for small problems, transferring the
 > data to and from the GPU costs more than the computation itself. This
 > issue can be partly avoided by writing trivial wrappers for routines like
 > dgemm that run the multiply on the GPU only when the dimensions of the
 > problem are above some threshold, but this would require slightly more
 > work than simply replacing BLAS with CUBLAS.
 > 
 > Jack
 > 
 > On Fri, Feb 24, 2012 at 8:28 PM, Dave Nystrom <Dave.Nystrom at tachyonlogic.com> wrote:
 > 
 > > I was wondering if anyone had ever tried using cuBlas as a substitute for
 > > something like MKL with PETSc.  I've been wondering if it would give better
 > > performance than MKL for my direct solves with cholmod, even though the
 > > block sizes are small for cholmod (32x32 is the default, I believe).  If
 > > so, were there any tricky aspects to using cuBlas in this way?
 > >
 > > Thanks,
 > >
 > > Dave
 > >
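
The threshold wrapper Jack describes could look roughly like the sketch
below. It is written in Python only to show the dispatch logic; a real
wrapper would be a C or Fortran routine with the dgemm interface. The names
`GPU_MIN_DIM`, `choose_backend`, `dgemm_cpu`, and `dgemm_gpu` are hypothetical,
the threshold value is a placeholder rather than a measured number, and the
GPU path is a stand-in that simply reuses the CPU code so the example runs
without a GPU.

```python
# Placeholder threshold; the real value must be found by benchmarking.
GPU_MIN_DIM = 512


def dgemm_cpu(a, b):
    """Stand-in for the host BLAS dgemm: plain triple loop computing A * B."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def dgemm_gpu(a, b):
    """Stand-in for the GPU path (copy to device, multiply, copy back).

    This sketch just calls the CPU routine so it stays runnable anywhere.
    """
    return dgemm_cpu(a, b)


def choose_backend(m, n, k, threshold=GPU_MIN_DIM):
    """Return "gpu" only when every dimension reaches the threshold."""
    return "gpu" if min(m, n, k) >= threshold else "cpu"


def dgemm(a, b, threshold=GPU_MIN_DIM):
    """Multiply A (m x k) by B (k x n), dispatching on problem size."""
    m, k, n = len(a), len(b), len(b[0])
    if choose_backend(m, n, k, threshold) == "gpu":
        return dgemm_gpu(a, b)
    return dgemm_cpu(a, b)
```

With a threshold like this, the 32x32 blocks mentioned in the original
question would stay on the CPU, and only large multiplies would pay the
transfer cost.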


