[petsc-dev] Performance of VecMDot_SeqCUSP

Aron Ahmadia aron.ahmadia at kaust.edu.sa
Tue Apr 24 14:44:20 CDT 2012


I'm interested in seeing this too, especially if somebody can explain the
results after they've been demonstrated :)

A

On Tue, Apr 24, 2012 at 10:42 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Tue, Apr 24, 2012 at 14:29, Daniel Lowell <redratio1 at gmail.com> wrote:
>
>> Launching smaller, overlapping asynchronous kernels can give a speedup if
>> your vectors are large and you are doing reductions. This way warp stalls
>> can be compensated for and latencies can be hidden. Not sure what you mean
>> by "the way it currently is," though...
>
>
> The reduction is only needed at the end. Any sequential launch adds
> artificial synchronization. I'd be interested to see the performance
> comparison, but I'd be surprised if independent kernel launches were faster
> than a decent implementation with one kernel launch.
>
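To make the memory-traffic argument concrete, here is a minimal CPU-side sketch in C++. It is not PETSc's VecMDot_SeqCUSP and not CUDA, and the names `mdot_separate` and `mdot_fused` are hypothetical. The first function mimics one launch per dot product, streaming x through memory once for each y_j. The second mimics a single launch: each block of entries (standing in for a CUDA thread block) reads x once and keeps partial sums for all m products, and the cross-block reduction happens only at the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, analogue of m independent kernel launches:
// x is read from memory once per y_j, and each dot product finishes
// before the next one starts.
std::vector<double> mdot_separate(const std::vector<double>& x,
                                  const std::vector<std::vector<double>>& ys) {
    std::vector<double> result(ys.size(), 0.0);
    for (std::size_t j = 0; j < ys.size(); ++j) {  // one "launch" per y_j
        double sum = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * ys[j][i];
        result[j] = sum;
    }
    return result;
}

// Hypothetical helper, analogue of a single fused launch: each block of
// entries loads x[i] once and updates partial sums for all m dot products.
// Partials are combined only in the final reduction step.
std::vector<double> mdot_fused(const std::vector<double>& x,
                               const std::vector<std::vector<double>>& ys,
                               std::size_t block_size = 256) {
    const std::size_t m = ys.size();
    const std::size_t n = x.size();
    const std::size_t nblocks = (n + block_size - 1) / block_size;
    // partial[b * m + j] holds block b's contribution to dot(x, y_j).
    std::vector<double> partial(nblocks * m, 0.0);
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t end = std::min(n, (b + 1) * block_size);
        for (std::size_t i = b * block_size; i < end; ++i) {
            const double xi = x[i];  // x[i] is loaded once for all m products
            for (std::size_t j = 0; j < m; ++j) partial[b * m + j] += xi * ys[j][i];
        }
    }
    // Final reduction over blocks: the only place results are combined.
    std::vector<double> result(m, 0.0);
    for (std::size_t b = 0; b < nblocks; ++b)
        for (std::size_t j = 0; j < m; ++j) result[j] += partial[b * m + j];
    return result;
}
```

Both functions return the same m dot products. Because the summation order differs, results can differ in the last bits for general floating-point data. On a GPU, the fused form trades m launches and m passes over x for one launch and one pass. Whether that wins in practice is exactly the measurement being asked for in this thread.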