[petsc-dev] Using cuBlas as the vendor blas for PETSc
Dave Nystrom
Dave.Nystrom at tachyonlogic.com
Fri Feb 24 21:25:52 CST 2012
Hi Jack,
Thanks for your comments. I had not thought of the idea of a wrapper. The
overhead with small blocks is certainly worrisome. I have not worked with
BLAS much in a long time, so I don't have a good idea of where the
breakeven size might be. I might experiment with this at some point just
to satisfy my curiosity.
Thanks again,
Dave
Jack Poulson writes:
> Dave,
>
> That is probably not a good idea: for small problems, transferring data to
> and from the GPU costs more than the computation itself. This issue can be
> partly avoided by writing trivial wrappers for routines like dgemm that run
> the multiply on the GPU only when the dimensions of the problem are above
> some threshold, but this would require slightly more work than simply
> replacing BLAS with cuBLAS.
>
> Jack
>
> On Fri, Feb 24, 2012 at 8:28 PM, Dave Nystrom <Dave.Nystrom at tachyonlogic.com> wrote:
>
> > I was wondering if anyone has ever tried using cuBLAS as a substitute for
> > something like MKL with PETSc. I've been wondering whether it would give
> > better performance than MKL for my direct solves with CHOLMOD, even though
> > the block sizes are small for CHOLMOD (32x32 is the default, I believe).
> > If so, were there any tricky aspects to using cuBLAS in this way?
> >
> > Thanks,
> >
> > Dave
> >
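
The threshold wrapper Jack describes could look roughly like the C sketch
below. Everything named here is an assumption made for illustration:
gemm_dispatch, gemm_host, gemm_device, the call counters, and the default
cutoff of 256*256*256 are all hypothetical, and the real breakeven has to
be measured on the target hardware. Both back ends are stand-ins that call
the same naive loop; in a real build gemm_host would call the vendor dgemm
and gemm_device would copy the operands to the GPU, call cublasDgemm, and
copy the result back.

```c
#include <assert.h>

/* Hypothetical cutoff on m*n*k; the real breakeven is hardware-dependent
   and has to be measured. */
static long long gemm_gpu_threshold = 256LL * 256LL * 256LL;

/* Call counters so the dispatch decision can be observed. */
static int cpu_calls = 0;
static int gpu_calls = 0;

/* Naive column-major C = alpha*A*B + beta*C, no transposes. */
static void gemm_ref(int m, int n, int k, double alpha,
                     const double *A, int lda,
                     const double *B, int ldb,
                     double beta, double *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i + p * lda] * B[p + j * ldb];
            /* As in BLAS, C is not read when beta is zero. */
            C[i + j * ldc] = (beta == 0.0)
                                 ? alpha * sum
                                 : alpha * sum + beta * C[i + j * ldc];
        }
    }
}

/* Stand-in for the host BLAS dgemm (e.g. MKL). */
static void gemm_host(int m, int n, int k, double alpha,
                      const double *A, int lda, const double *B, int ldb,
                      double beta, double *C, int ldc)
{
    ++cpu_calls;
    gemm_ref(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}

/* Stand-in for host-to-device copies, cublasDgemm, and the copy back. */
static void gemm_device(int m, int n, int k, double alpha,
                        const double *A, int lda, const double *B, int ldb,
                        double beta, double *C, int ldc)
{
    ++gpu_calls;
    gemm_ref(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}

/* Offload only when the multiply is large enough to amortize transfers. */
static void gemm_dispatch(int m, int n, int k, double alpha,
                          const double *A, int lda, const double *B, int ldb,
                          double beta, double *C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    long long work = (long long)m * n * k;
    if (work >= gemm_gpu_threshold)
        gemm_device(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
    else
        gemm_host(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}
```

With this particular (made-up) cutoff, a 32x32 by 32x32 multiply has
m*n*k = 32768 and stays on the CPU, so small blocks never pay the transfer
cost. Keeping the cutoff in a variable rather than a constant makes it easy
to tune after timing both paths.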