[petsc-users] Multi-DOF DMDA Vec
Jed Brown
jed at jedbrown.org
Fri Jun 13 21:09:34 CDT 2014
Anush Krishnan <anush at bu.edu> writes:
> With regard to the interlaced memory performing better: If I used three
> vectors created from the same DMDA for each degree of freedom, how
> different would that be in performance compared to a fully interlaced
> vector? Wouldn't cache reuse be about the same for both cases?
No. When you traverse the grid accessing all three components, separate
vectors give you three times as many prefetch streams (typically reducing
prefetch capability, thus generating more cold cache misses), and the
accesses spill irregularly over cache lines more frequently, thus reducing
the effective cache size. This can result in an integer-factor slowdown
compared to interlaced storage. By all means, run the experiment, but
the expected result for memory-bandwidth- or cache-limited operations is
that interlaced storage delivers significantly better performance.
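
To make the two access patterns concrete, here is a minimal sketch in plain
C (deliberately not using the PETSc API). The function names
sum_sq_interlaced, sum_sq_separate, and interlace are hypothetical and
chosen only for illustration. The interlaced loop walks one contiguous
stream, while the separate-array loop walks three streams at once. Timing
the two loops on large arrays would be one way to run the experiment.

```c
#include <assert.h>
#include <stddef.h>

enum { NDOF = 3 }; /* three components per grid point (assumed for this sketch) */

/* Interlaced layout: u0 v0 w0 u1 v1 w1 ...
 * All components of a point are adjacent, so the loop reads one stream. */
double sum_sq_interlaced(const double *x, size_t n)
{
    double s = 0.0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const double *p = &x[NDOF * i];
        s += p[0] * p[0] + p[1] * p[1] + p[2] * p[2];
    }
    return s;
}

/* Separate layout: one array per component.
 * The same traversal now reads three independent streams. */
double sum_sq_separate(const double *u, const double *v, const double *w,
                       size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += u[i] * u[i] + v[i] * v[i] + w[i] * w[i];
    }
    return s;
}

/* Pack three separate component arrays into one interlaced array
 * of length NDOF * n. */
void interlace(const double *u, const double *v, const double *w,
               double *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        x[NDOF * i + 0] = u[i];
        x[NDOF * i + 1] = v[i];
        x[NDOF * i + 2] = w[i];
    }
}
```

Both functions compute the same result; only the memory layout differs,
which is what a timing comparison would isolate.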