[Nek5000-users] Does Nek5000 use any kind of parallelism inside one element?

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Mon Feb 1 08:37:02 CST 2016


Hi Lars,

Nek5000 doesn't implement any parallelism within an element.
Typical elements are 8x8x8 or 12x12x12 points, so that hard
coarse-granularity limit (one element per MPI rank) is generally less
restrictive than the usual communication-versus-computation cost
limitations.  Running with many elements per MPI rank is even better,
though this is not the common case.  The other main advantage of OpenMP
is the ability to share data, but Nek5000 has [almost] no shared state,
so it's not particularly relevant.

The matrix multiplies are handled by compiled code that unrolls the "k"
index.  See:
https://github.com/Nek5000/Nek5000/blob/master/mxm_std.f
The matrices are generally too small for MKL: the overhead (argument
checking, for example) wipes out any gain from better instructions.
There is ongoing work on integrating LIBXSMM, which provides better
performance in flop-bound cases.

Hope that helps,
Max

On Mon, Feb 1, 2016 at 8:20 AM <nek5000-users at lists.mcs.anl.gov> wrote:

> Dear Nek5000 developers and users,
>
> my name is Lars Haupt and I am currently writing my PhD thesis, which
> deals with scalable p-multigrid based approaches. I therefore need to
> understand the state-of-the-art multigrid solver Nek5000, and that is
> why I am trying to get some insight into Nek5000's parallelism.
>
> I read the article about the Nek5000 scalability results achieved on
> the Juelich supercomputer in 2010.
> I thought that the impressive strong scaling speedup was a result of an
> optimised hybrid parallelization strategy (MPI and OpenMP).
>
> But the general info on the Nek5000 website says that only MPI handles
> the parallelization. Does this mean that domain decomposition is the
> only kind of parallelism?
> What happens with the huge number of nodes inside one element? The FAQ
> mentions highly efficient matrix-matrix multiplications, which are
> implemented using MKL BLAS 3 dgemm routines, I guess?
> If that is the case, does Nek5000 use any OpenMP-based parallelism
> inside MKL, and if not, why not?
>
> Many thanks in advance
>
> Lars Haupt
>
> Dipl.-Math. Lars Haupt
> Research Associate
>
> Technische Universität Dresden
> Fakultät Maschinenwesen
> Institut für Energietechnik
> Professur für Gebäudeenergietechnik und Wärmeversorgung
> 01062 Dresden
> Tel.: +49(351) 463-37619
> Fax.: +49(351) 463-37076
> E-Mail: lars.haupt at tu-dresden.de
>
>
>
>
>
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>

