[Nek5000-users] Important nek5000 optimizations

Wed Jul 8 05:46:26 CDT 2009

There are two optimizations of which you should all be aware:

(1) I've encountered some SIZEu files that have the following
definition of the dealias array sizes:

       parameter (lxd=1+3*lx1/2,lyd=lxd,lzd=lxd*(ldim-2)+(3-ldim))

With Fortran integer division, this gives, for example:

       lx1 = 6         lxd = 10
       lx1 = 8         lxd = 13
       lx1 = 10        lxd = 16
       lx1 = 12        lxd = 19

Note that for lx1=8 and 12 we have lxd odd.   On BG/P (and some,
but not all, other architectures past and future), having odd 
dimensions for any of the SEM array sizes can be very detrimental
to performance because it means that the array accesses will not
be quad-aligned (or double-aligned for single precision cases).
Effective use of the BG/P (and BG/L and BG/Q) double-hummer requires
quad-aligned data.  If lx1, lx2, and lxd are all even numbers, then
data alignment in Nek5000 is fairly assured.

Please check your SIZEu files if you are using the BG platforms.
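
As a quick way to run that check, here is a minimal sketch in Python
(not part of Nek5000) that evaluates the lxd formula above for a list
of lx1 values and flags the odd results.  The helper names are
hypothetical, and rounding lxd up to the next even number is only one
possible remedy assumed here for illustration; whether that or a
different lx1 is appropriate depends on your case.

```python
def dealias_size(lx1):
    """lxd from the SIZEu formula lxd = 1 + 3*lx1/2.

    Fortran evaluates 3*lx1/2 left to right with integer division,
    which for positive lx1 matches Python's floor division.
    """
    return 1 + (3 * lx1) // 2


def next_even(n):
    """Round n up to the nearest even integer (assumed remedy, not from the post)."""
    return n + (n % 2)


def report(lx1_values):
    """Return (lx1, lxd, is_odd, even_alternative) for each lx1."""
    rows = []
    for lx1 in lx1_values:
        lxd = dealias_size(lx1)
        rows.append((lx1, lxd, lxd % 2 == 1, next_even(lxd)))
    return rows


if __name__ == "__main__":
    for lx1, lxd, odd, even_lxd in report([6, 8, 10, 12]):
        flag = "ODD" if odd else "ok"
        print(f"lx1 = {lx1:2d}   lxd = {lxd:2d}   {flag:3s}   even alternative: {even_lxd}")
```

For lx1 = 8 and 12 the script flags lxd = 13 and 19 as odd and lists
14 and 20 as the next even sizes.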

(2)  I have uploaded a new genmap code to the repo for generation
of element-to-processor mappings.   It appears to greatly improve
the partitions, particularly for large element/processor counts. 
The old code would occasionally produce isolated subsets of elements 
that would in turn lead to high communication overhead.   If you are 
using the XXt (default) coarse-grid solver, these new maps should also 
improve XXt performance.   If you are using the AMG-based coarse-grid 
solver, you must regenerate all amg*dat files because the new genmap
produces a different vertex ordering than the old.
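
To illustrate what "isolated subsets of elements" means, here is a
minimal sketch in Python that counts how many disconnected pieces each
processor's element set falls into.  It does not read genmap output;
the inputs (an element adjacency table and an element-to-processor
table) and all names are hypothetical and would have to be built from
your own mesh data.

```python
from collections import defaultdict


def pieces_per_processor(neighbors, owner):
    """Count connected pieces of each processor's element set.

    neighbors: dict element -> iterable of adjacent elements (hypothetical input)
    owner:     dict element -> processor id (hypothetical input)
    """
    seen = set()
    pieces = defaultdict(int)
    for start in owner:
        if start in seen:
            continue
        proc = owner[start]
        pieces[proc] += 1          # found a new piece on this processor
        stack = [start]
        seen.add(start)
        while stack:               # flood-fill within the same processor
            e = stack.pop()
            for nb in neighbors.get(e, ()):
                if nb not in seen and owner.get(nb) == proc:
                    seen.add(nb)
                    stack.append(nb)
    return dict(pieces)


if __name__ == "__main__":
    # Toy mesh: a 1D chain of six elements, 0-1-2-3-4-5.
    chain = {e: [n for n in (e - 1, e + 1) if 0 <= n <= 5] for e in range(6)}
    contiguous = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    scattered = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
    print(pieces_per_processor(chain, contiguous))
    print(pieces_per_processor(chain, scattered))
```

A processor with more than one piece owns elements that are not
connected to each other, which is the situation described above.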

In my largest cases to date (3 million elements, 65000 processors),
I'm seeing a factor of two reduction in CPU time.  (I formerly had 
some very poor partition sets in this particular case...)