[petsc-dev] VecMDot_SeqCUSP improved
Paul Mullowney
paulm at txcorp.com
Tue Mar 26 09:35:16 CDT 2013
Karli,
Thanks for doing this. Once I get my current PETSc fork into next, I
will test this asap. I have a couple examples using GMRES and I hope
this will make a big difference. I'll let you know what I find.
-Paul
> Hi Jose, Paul, and others,
>
> I worked today on VecMDot and came up with an implementation which is
> faster than an iterated application of the standard cusp::blas::dot()
> (which, if I'm not mistaken, just forwards to CUBLAS) if enough
> vectors (>~6) are involved. For complex arithmetic, an iterated
> application of cusp::blas::dotc() is used, since passing complex types
> to CUDA kernels is fairly tricky within PETSc. Jose, any performance
> feedback from within SLEPc is appreciated :-)
>
> The new implementation is based on custom kernels, only allocates a
> little scratchpad memory and is thus more memory efficient than the
> old version. Also, any unnecessary copying of data is avoided. This
> should speed up GMRES quite a bit, although I haven't run any dedicated
> GMRES benchmarks yet. Paul, I guess you have some samples at hand, don't you?
>
> Best regards,
> Karli
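
For readers unfamiliar with the operation: VecMDot computes the dot products of one vector x with several vectors y[0..nv-1] in a single call. The following is only a minimal CPU sketch in C++ of the difference between an iterated dot product and a fused one. It is not the CUSP/CUDA kernel described above, it handles real scalars only, and the function names mdot_iterated and mdot_fused are hypothetical. That the fused variant benefits from loading each entry of x only once is an assumption about where the savings might come from, not something stated in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: one independent dot product per y[j], so x is traversed
// y.size() times in total.
std::vector<double> mdot_iterated(const std::vector<double>& x,
                                  const std::vector<std::vector<double>>& y) {
  std::vector<double> val(y.size(), 0.0);
  for (std::size_t j = 0; j < y.size(); ++j) {
    assert(y[j].size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      val[j] += x[i] * y[j][i];
    }
  }
  return val;
}

// Fused variant (hypothetical sketch): a single sweep over x, where each
// x[i] is loaded once and reused for every y[j].
std::vector<double> mdot_fused(const std::vector<double>& x,
                               const std::vector<std::vector<double>>& y) {
  std::vector<double> val(y.size(), 0.0);
  for (const auto& yj : y) {
    assert(yj.size() == x.size());
  }
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double xi = x[i];  // read x[i] once
    for (std::size_t j = 0; j < y.size(); ++j) {
      val[j] += xi * y[j][i];
    }
  }
  return val;
}
```

Both functions return the same values; only the loop order, and therefore the number of passes over x, differs.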