[Nek5000-users] Important nek5000 optimizations
Paul Fischer
fischer at mcs.anl.gov
Wed Jul 8 05:46:26 CDT 2009
There are two optimizations of which you should all be aware:
(1) I've encountered some SIZEu files that have the following
definition of the dealias array sizes:
parameter (lxd=1+3*lx1/2,lyd=lxd,lzd=lxd*(ldim-2)+(3-ldim))
In this case we get, for example:
    lx1 =  6    lxd = 10
    lx1 =  8    lxd = 13
    lx1 = 10    lxd = 16
    lx1 = 12    lxd = 19
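As a quick check, the rule above can be evaluated the way Fortran does it (integer arithmetic, 3*lx1 first, then the division by 2). The short Python sketch below is only an illustration and is not part of Nek5000; the function name is hypothetical. It reproduces the table and flags the odd lxd values:

```python
def lxd_from_lx1(lx1: int) -> int:
    """Dealiasing size from the SIZEu rule lxd = 1 + 3*lx1/2.

    Fortran evaluates 3*lx1/2 left to right with integer division,
    so Python's // reproduces it for positive lx1.
    """
    return 1 + (3 * lx1) // 2


for lx1 in (6, 8, 10, 12):
    lxd = lxd_from_lx1(lx1)
    # An odd lxd is the case that can break quad-alignment on BG/P.
    flag = "odd: possible misalignment" if lxd % 2 else "even"
    print(f"lx1 = {lx1:2d}  lxd = {lxd:2d}  ({flag})")
```

Running it over your own lx1 values is a cheap way to spot an odd lxd before a long BG run.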
Note that for lx1 = 8 and 12, lxd is odd. On BG/P (and some,
but not all, other architectures past and future), having odd
dimensions for any of the SEM array sizes can be very detrimental
to performance, because it means that many array accesses will not
be quad-aligned (16-byte aligned for double precision; double-aligned,
i.e. 8-byte aligned, for single precision cases).
Effective use of the BG/P (and BG/L and BG/Q) "Double Hummer" FPU requires
quad-aligned data. If lx1, lx2, and lxd are all even numbers, then data
alignment in Nek5000 is fairly assured.
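To see why an odd leading dimension hurts, here is a minimal sketch, assuming a column-major (Fortran-order) array a(n, ncols) of real*8 whose first element a(1,1) already sits on a 16-byte boundary. It lists the columns whose first entry is not quad-aligned; the function name and defaults are hypothetical, chosen only for illustration:

```python
REAL8_BYTES = 8    # size of a double-precision word
QUAD_BYTES = 16    # "quad-aligned": a 16-byte boundary (two doubles)


def misaligned_columns(n: int, ncols: int,
                       word: int = REAL8_BYTES,
                       align: int = QUAD_BYTES) -> list[int]:
    """Return 1-based indices j of columns of a column-major array
    a(n, ncols) whose first element a(1, j) is not on an `align`-byte
    boundary, assuming a(1, 1) itself is aligned."""
    bad = []
    for j in range(1, ncols + 1):
        offset = (j - 1) * n * word   # byte offset of a(1, j) from a(1, 1)
        if offset % align != 0:
            bad.append(j)
    return bad


print(misaligned_columns(16, 4))   # lxd = 16: every column aligned
print(misaligned_columns(13, 4))   # lxd = 13: every other column is off by 8 bytes
```

With an even leading dimension every column starts on a 16-byte boundary; with an odd one, columns 2, 4, ... start 8 bytes off. Passing word=4, align=8 gives the same pattern for the single-precision (double-aligned) case.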
Please check your SIZEu files if you are using the BG platforms.
(2) I have uploaded a new genmap code to the repo for generation
of element-to-processor mappings. It appears to greatly improve
the partitions, particularly for large element/processor counts.
The old code would occasionally produce isolated subsets of elements
that would in turn lead to high communication overhead. If you are
using the XXt (default) coarse-grid solver, these new maps should also
improve XXt performance. If you are using the AMG-based coarse-grid
solver, you must regenerate all amg*dat files because the new genmap
produces a different vertex ordering than the old.
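For a rough picture of the "isolated subsets" problem, the sketch below counts how many disconnected pieces of elements each processor owns in a given partition. It is a hypothetical diagnostic, not what genmap does internally; the inputs (a face-adjacency list and an element-to-rank map) are assumed, and reading them from Nek5000 files is not shown:

```python
from collections import defaultdict, deque


def components_per_rank(adj: dict[int, list[int]],
                        part: dict[int, int]) -> dict[int, int]:
    """Count the connected pieces of each processor's element subset.

    adj:  element -> elements sharing a face with it (hypothetical input)
    part: element -> processor rank (hypothetical input)
    A rank reporting more than one piece owns an isolated subset.
    """
    seen = set()
    pieces = defaultdict(int)
    for start in adj:
        if start in seen:
            continue
        rank = part[start]
        pieces[rank] += 1              # found a new piece for this rank
        seen.add(start)
        queue = deque([start])
        while queue:                   # breadth-first search within one rank
            e = queue.popleft()
            for nb in adj[e]:
                if nb not in seen and part[nb] == rank:
                    seen.add(nb)
                    queue.append(nb)
    return dict(pieces)


# Six elements in a chain, split 0,0,1,1,0,0 across two ranks.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
print(components_per_rank(adj, part))
```

Here rank 0 owns two separate pieces ({0, 1} and {4, 5}), the kind of split that forces extra neighbor exchanges.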
In my largest cases to date (3 million elements, 65000 processors),
I'm seeing a factor of two reduction in CPU time. (I formerly had
some very poor partition sets in this particular case...)