Dave,<br><br>That is probably not a good idea: for small problems, transferring the data to and from the GPU costs more than the computation itself. You can largely avoid this by writing thin wrappers for routines like dgemm that only run the multiply on the GPU when the problem dimensions exceed some threshold, but that takes slightly more work than simply replacing BLAS with CUBLAS.<br>
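<br>A rough sketch of such a wrapper in C might look like the following. The names dispatch_dgemm, choose_backend, cpu_dgemm and gpu_dgemm, and the GPU_WORK_THRESHOLD value, are all hypothetical. The GPU path is only a placeholder that falls back to the host loop (a real version would copy the operands to the device, call cublasDgemm, and copy the result back), and the cutoff would have to be found by benchmarking on the actual hardware.<br>

```c
#include <assert.h>

/* Hypothetical cutoff on m*n*k; an assumption, to be tuned by benchmarking. */
#define GPU_WORK_THRESHOLD (256L * 256L * 256L)

typedef enum { BACKEND_CPU, BACKEND_GPU } backend_t;

/* Pick a backend from the problem size alone. */
backend_t choose_backend(int m, int n, int k)
{
    double work = (double)m * (double)n * (double)k;
    return work >= (double)GPU_WORK_THRESHOLD ? BACKEND_GPU : BACKEND_CPU;
}

/* Host path: C = alpha*A*B + beta*C, column-major, no transposes.
 * Stands in for a call to the host BLAS dgemm. */
void cpu_dgemm(int m, int n, int k, double alpha,
               const double *A, int lda, const double *B, int ldb,
               double beta, double *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i + p * lda] * B[p + j * ldb];
            /* As in BLAS, C is not read when beta is zero. */
            if (beta == 0.0)
                C[i + j * ldc] = alpha * sum;
            else
                C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}

/* Placeholder for the device path. A real version would transfer A, B, C
 * to the GPU, call cublasDgemm, and transfer C back; here it simply reuses
 * the host loop so the sketch compiles without CUDA. */
void gpu_dgemm(int m, int n, int k, double alpha,
               const double *A, int lda, const double *B, int ldb,
               double beta, double *C, int ldc)
{
    cpu_dgemm(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}

/* Wrapper: route large multiplies to the GPU path, small ones to the CPU.
 * Returns the backend that was chosen. */
backend_t dispatch_dgemm(int m, int n, int k, double alpha,
                         const double *A, int lda, const double *B, int ldb,
                         double beta, double *C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    backend_t backend = choose_backend(m, n, k);
    if (backend == BACKEND_GPU)
        gpu_dgemm(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
    else
        cpu_dgemm(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
    return backend;
}
```

With blocks as small as 32x32, m*n*k is 32768, far below this illustrative cutoff, so those calls would stay on the CPU and never pay the transfer cost.<br>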

<br>Jack<br><br><div class="gmail_quote">On Fri, Feb 24, 2012 at 8:28 PM, Dave Nystrom <span dir="ltr">&lt;<a href="mailto:Dave.Nystrom@tachyonlogic.com">Dave.Nystrom@tachyonlogic.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

I was wondering if anyone had ever tried using cuBlas as a substitute for<br>

something like MKL with PETSc.  I've been wondering if it would give better<br>

performance than MKL for my direct solves with cholmod even though the block<br>

sizes are small for cholmod (32x32 is the default, I believe).  If so, were<br>

there any tricky aspects to using cuBlas in this way?<br>

<br>

Thanks,<br>

<br>

Dave<br>

</blockquote></div><br>