[petsc-dev] Performance of VecMDot_SeqCUSP

Jed Brown jedbrown at mcs.anl.gov
Tue Apr 24 14:42:28 CDT 2012


On Tue, Apr 24, 2012 at 14:29, Daniel Lowell <redratio1 at gmail.com> wrote:

> Launching smaller, overlapping asynchronous kernels can give a speedup if
> your vectors are large and you are doing reductions. This way warp stalls
> can be compensated for and latencies can be hidden. Not sure what you mean
> by "the way it currently is", though...


The reduction is only needed at the end. Any sequential launch adds
artificial synchronization. I'd be interested to see the performance
comparison, but I'd be surprised if independent kernel launches were faster
than a decent implementation with one kernel launch.
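
To make the comparison concrete, here is a minimal CPU-side sketch in C. It
is not the VecMDot_SeqCUSP code; the names mdot_fused and mdot_separate are
hypothetical and exist only for illustration. The fused version reads x once
and updates every dot product as it goes, which is roughly what a single
kernel launch with the reduction at the end would do. The separate version
makes one full pass over x per vector, which is roughly what one launch per
dot product amounts to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out[k] = sum_i x[i] * ys[k][i] for k = 0..nv-1.
 * x is traversed once; all nv accumulators are updated per element. */
void mdot_fused(size_t n, size_t nv, const double *x,
                const double *const *ys, double *out)
{
    assert(x != NULL && ys != NULL && out != NULL);
    for (size_t k = 0; k < nv; k++) out[k] = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double xi = x[i];            /* load x[i] once ... */
        for (size_t k = 0; k < nv; k++)    /* ... reuse it for every vector */
            out[k] += xi * ys[k][i];
    }
}

/* Hypothetical helper: same result, but one independent pass over x per
 * vector, so x is read nv times. */
void mdot_separate(size_t n, size_t nv, const double *x,
                   const double *const *ys, double *out)
{
    assert(x != NULL && ys != NULL && out != NULL);
    for (size_t k = 0; k < nv; k++) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) sum += x[i] * ys[k][i];
        out[k] = sum;
    }
}
```

Both functions sum the terms of each dot product in the same order, so they
return identical values; the difference is only in how often x is read. How
that translates to GPU timings would still have to be measured.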