[Nek5000-users] Parallel speedup on supercomputer Tianhe-2

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Tue Dec 6 09:34:04 CST 2016


That's reasonable. The cores compete for shared resources (L3 cache and DDR memory bandwidth).
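
A minimal Python sketch of the arithmetic, using the timings quoted below (2564.59 s serial, 302.18 s with 24 tasks). The function names, the toy saturation model, and the value n_sat = 8 are assumptions chosen for illustration; they are not Nek5000 measurements or part of Nek5000.

```python
def speedup(t_serial, t_parallel):
    """Speedup relative to the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_tasks):
    """Parallel efficiency: speedup divided by the number of tasks."""
    return speedup(t_serial, t_parallel) / n_tasks


def saturated_speedup(n_tasks, n_sat):
    """Toy model (assumption): speedup grows linearly until n_sat cores
    saturate the shared memory bandwidth, then stays flat."""
    return min(n_tasks, n_sat)


# Solver times reported in the quoted message (single Tianhe-2 node).
t1 = 2564.59   # serial run, seconds
t24 = 302.18   # 24 tasks, seconds

s = speedup(t1, t24)
e = efficiency(t1, t24, 24)
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")

# Hypothetical saturation point of 8 cores, for illustration only.
print(saturated_speedup(24, n_sat=8))
```

With these numbers the efficiency on one node is roughly 1/3, the same ratio as the 12x speedup on 36 cores. In the toy model, adding cores beyond n_sat does not help a bandwidth-bound code, which is consistent with the flat intra-node curve.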

> On 6 Dec 2016, at 09:25, nek5000-users at lists.mcs.anl.gov wrote:
> 
> Dear Stefan,
> 
> Thank you for your reply. There are 139056 elements and the polynomial order is 7 (lx1=8). I measured the solver time to compute the speedup. For example, the serial job takes 2564.59 s and the same job with 24 tasks takes 302.18 s, so the speedup is about 8.5. This is on a single Tianhe-2 node. The speedup across nodes is quite good.
> 
> I also tested the code on my 36-core computer (dual 18-core Xeon E5). With 36 tasks I get only about a 12-fold speedup, which is again about 1/3 of the ideal.
> 
> Best regards,
> Wei XU
> 
> 
> From: nek5000-users at lists.mcs.anl.gov
> To: <nek5000-users at lists.mcs.anl.gov>
> Subject: Re: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2
> Message-ID:
>         <mailman.7164.1480933423.3602.nek5000-users at lists.mcs.anl.gov>
> Content-Type: text/plain; charset="utf-8"
> 
> What's your problem size (number of elements and polynomial order)?
> 
> Let's assume t_MPI << t (this holds if your problem size is reasonably large). Even in this limit you don't get a linear intra-node speedup, simply because Nek5000 is not purely compute bound: the cumulative memory bandwidth is already saturated with N cores (N < total number of cores).
> 
> Cheers,
> 
> Stefan
> 
> 
> From: <nek5000-users-bounces at lists.mcs.anl.gov> on behalf of <nek5000-users at lists.mcs.anl.gov>
> Reply-To: <nek5000-users at lists.mcs.anl.gov>
> Date: Monday, December 5, 2016 at 6:12 AM
> To: <nek5000-users at lists.mcs.anl.gov>
> Subject: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2
> 
> Dear Neks,
> 
> I'm using Nek5000 to simulate turbulent Rayleigh-Benard convection, which is governed by the coupled Navier-Stokes equations and the convective heat equation. I'm running the code on a supercomputer, Tianhe-2, located in Guangzhou, China. Each compute node in Tianhe-2 has 24 cores (two 12-core Xeon E5 CPUs) and 64 GB of memory. I find that the speedup curve is not linear on a single node. For example, a 24-task job is only 8 times faster than the serial one. However, the performance with an increasing number of nodes is quite good. Is there any parameter in Nek5000 that I can change to improve the speedup within a single node?
> 
> Thanks in advance!
> 
> Best regards,
> 
> Wei XU
> 
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users