[petsc-users] Optimal memory layout for finite differences

Thu Dec 12 21:24:08 CST 2013

Hi Everyone,

Would it be a good idea to arrange the data in the fastest-varying direction
in the following manner, to make aligned loads and vector operations easier?

Total grid points = 4n
0, n, 2n, 3n, 1, n+1, 2n+1, 3n+1 and so on
Ref: "Tuning a Finite Difference Computation for Parallel Vector Processors"
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6341495
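
To make the ordering above concrete, here is a minimal sketch of the index mapping it implies. The function names and the `lanes` parameter are made up for illustration (they are not from the paper or from PETSc), and a single 1D direction with 4n points is assumed.

```python
# Interleaved layout: memory slot p holds logical grid point
# (p % lanes) * n + p // lanes, giving 0, n, 2n, 3n, 1, n+1, ...

def memory_to_grid(p, n, lanes=4):
    """Logical grid index stored at memory slot p."""
    return (p % lanes) * n + p // lanes

def grid_to_memory(i, n, lanes=4):
    """Memory slot that holds logical grid index i (inverse of the above)."""
    return (i % n) * lanes + i // n

n = 3  # small example: 4n = 12 grid points
order = [memory_to_grid(p, n) for p in range(4 * n)]
print(order)  # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```

Each group of 4 consecutive memory slots holds the points i, n+i, 2n+i, 3n+i, so one aligned vector load picks up the same relative position in all four quarters of the line.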

This change in the global memory layout would mix up the ghost zones in
PETSc's DMDAs and, I guess, change the matrix structure by separating adjacent
points by a distance of 4. One could even make the distance 8 and load one
full cache line in one go. I was wondering whether this memory layout can be
used for computations with PETSc's DMDAs, and whether the preconditioners
would be OK with this kind of arrangement.
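
As a rough illustration of the matrix-structure point, the sketch below lists the column-minus-row offsets that a 1D three-point stencil would have once the unknowns are renumbered into the interleaved order. The three-point stencil, the function names, and the `lanes` parameter are assumptions chosen for illustration; nothing here uses or reflects PETSc's own API.

```python
def grid_to_memory(i, n, lanes=4):
    """Memory slot of logical grid index i in the interleaved layout."""
    return (i % n) * lanes + i // n

def stencil_offsets(n, lanes=4):
    """Distinct off-diagonal offsets (column - row) of a 1D 3-point stencil
    after renumbering the lanes*n unknowns into the interleaved order."""
    total = lanes * n
    offsets = set()
    for i in range(total):
        for j in (i - 1, i + 1):  # left and right neighbours
            if 0 <= j < total:
                offsets.add(grid_to_memory(j, n, lanes) - grid_to_memory(i, n, lanes))
    return offsets

print(sorted(stencil_offsets(4)))  # [-11, -4, 4, 11]
```

Neighbours inside one quarter end up exactly `lanes` slots apart, but the pairs that straddle two quarters (n-1 and n, 2n-1 and 2n, 3n-1 and 3n) land `lanes*(n-1) - 1` slots apart, so the renumbered matrix has a few far off-diagonal entries in addition to the bands at distance 4.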

Thanks,
Mani