[Nek5000-users] Parallel speedup on supercomputer Tianhe-2

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Tue Dec 6 02:25:22 CST 2016


Dear Stefan,

Thank you for your reply. The mesh has 139056 elements and the polynomial
order is 7 (lx1=8). I compute the speedup from the measured solver time. For
example, the serial job takes 2564.59s and the same job with 24 tasks takes
302.18s, so the speedup is about 8.5. This is on a single Tianhe-2 node. The
speedup across nodes is quite good.

I also tested the code on my 36-core computer (dual 18-core Xeon E5). I
only get a speedup of about 12 with 36 tasks, which is again a parallel
efficiency of about 1/3.
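
For reference, the speedup and parallel efficiency quoted above can be
reproduced with a few lines of Python. This is only a sketch: the function
names are illustrative and not part of Nek5000, and the timings are the two
solver times given in this message.

```python
def speedup(t_serial, t_parallel):
    """Speedup = serial solver time / parallel solver time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_tasks):
    """Parallel efficiency = speedup / number of MPI tasks."""
    return speedup(t_serial, t_parallel) / n_tasks


# Solver times reported above for a single Tianhe-2 node (seconds).
T_SERIAL = 2564.59
T_24_TASKS = 302.18

s = speedup(T_SERIAL, T_24_TASKS)
e = efficiency(T_SERIAL, T_24_TASKS, 24)
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")

# 36-core workstation: only the approximate speedup (12) was reported.
e_workstation = 12 / 36
print(f"workstation efficiency = {e_workstation:.2f}")
```

Both machines come out at a parallel efficiency of roughly one third.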

Best regards,
Wei XU


From: nek5000-users at lists.mcs.anl.gov
To: <nek5000-users at lists.mcs.anl.gov>
Subject: Re: [Nek5000-users] Parallel speedup on supercomputer
        Tianhe-2
Message-ID:
        <mailman.7164.1480933423.3602.nek5000-users at lists.mcs.anl.gov>
Content-Type: text/plain; charset="utf-8"

What's your problem size (number of elements and polynomial order)?

Let's assume t_MPI << t, i.e. the MPI communication time is small compared
to the total time (this holds if your problem size is reasonably large).
Even in this limit you don't get a linear intra-node speedup, simply because
Nek5000 is not purely compute bound: the aggregate memory bandwidth of a
node is already saturated with N cores, where N < total number of cores.
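
The saturation argument can be illustrated with a toy model. The sketch
below is an assumption made for illustration, not something taken from
Nek5000: a fraction `mem_fraction` of the serial run time is taken to be
limited by memory bandwidth and stops scaling once `n_sat` cores are busy,
while the remainder scales linearly with the core count. Both parameter
values are hypothetical and were not measured on Tianhe-2.

```python
def intra_node_speedup(n_cores, mem_fraction, n_sat):
    """Toy speedup model for a single node with t_MPI neglected.

    n_cores      -- number of cores (MPI tasks) in use
    mem_fraction -- hypothetical share of serial time bound by memory bandwidth
    n_sat        -- hypothetical core count at which node bandwidth saturates
    """
    # Compute-bound part keeps scaling with every added core.
    compute_time = (1.0 - mem_fraction) / n_cores
    # Bandwidth-bound part stops improving beyond n_sat cores.
    memory_time = mem_fraction / min(n_cores, n_sat)
    return 1.0 / (compute_time + memory_time)


if __name__ == "__main__":
    # Illustrative values only; they are not fitted to any measurement.
    for p in (1, 2, 4, 8, 12, 24):
        s = intra_node_speedup(p, mem_fraction=0.9, n_sat=8)
        print(f"{p:2d} cores: speedup {s:.2f}")
```

In this model the curve is linear up to `n_sat` cores and then flattens,
which is the qualitative behaviour described above. Measuring the actual
saturation point would require a bandwidth benchmark on the node itself.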

Cheers,

Stefan


From: <nek5000-users-bounces at lists.mcs.anl.gov> on behalf of <
nek5000-users at lists.mcs.anl.gov>
Reply-To: <nek5000-users at lists.mcs.anl.gov>
Date: Monday, December 5, 2016 at 6:12 AM
To: <nek5000-users at lists.mcs.anl.gov>
Subject: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2

Dear Neks,

I'm using Nek5000 to simulate turbulent Rayleigh-Benard convection, which
is governed by the coupled Navier-Stokes equations and convective heat
equation. I'm running the code on a supercomputer, Tianhe-2, located in
Guangzhou, China. Each compute node in Tianhe-2 has 24 cores (2 Xeon E5
12-core CPUs) and 64GB memory. I find the speedup curve is not linear on a
single node. For example, a 24-task job is only 8 times faster than the
serial one. However, the performance with an increasing number of nodes is
quite good. I don't know whether there is any parameter in Nek5000 that I
can change in order to improve the speedup performance of the individual
nodes.

Thanks in advance!

Best regards,

Wei XU