[Nek5000-users] Parallel speedup on supercomputer Tianhe-2

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Tue Dec 13 05:30:50 CST 2016


Typically people look at inter-node scaling when they talk about parallel performance. Intra-node scaling (the case you are looking at) is a different story. Let’s consider two extremes:

1. The STREAM triad benchmark is purely memory bound. On E5-2570v3 (Intel Haswell) you’ll get ~40 GB/sec using 2 cores (1 out of 18 cores per socket). If this scaled linearly, we would see ~720 GB/sec using all 36 cores. In reality, however, you'll get an aggregate peak bandwidth of ~90 GB/sec. So it does _not_ scale linearly. In fact it levels off after 12 cores (6 cores per socket).

2. The DGEMM benchmark is purely compute bound. On E5-2570v3 (Intel Haswell) you’ll get ~80% of the total peak floating-point performance using all 36 cores. It scales more or less linearly with the number of cores.
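The STREAM numbers in item 1 can be checked with a little arithmetic. The sketch below only restates the figures quoted above (~40 GB/sec on 2 cores, ~90 GB/sec aggregate on 36 cores); the function and variable names are made up for illustration.

```python
def linear_bandwidth(bw_measured, cores_measured, cores_total):
    """Bandwidth we would see if it scaled linearly with the core count."""
    return bw_measured / cores_measured * cores_total


# Figures quoted in item 1 (GB/sec); nothing here is newly measured.
ideal = linear_bandwidth(40.0, 2, 36)   # linear extrapolation to 36 cores
observed = 90.0                         # aggregate peak bandwidth
fraction = observed / ideal             # share of the linear extrapolation

print(f"ideal {ideal:.0f} GB/sec, observed {observed:.0f} GB/sec, "
      f"ratio {fraction:.3f}")
```

So a purely memory-bound code on this node reaches only about one eighth of what linear scaling from the 2-core run would suggest.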

Nek5000 (like _all_ other PDE solvers) is somewhere in between.
On most systems we'll get 5-20% of peak floating-point performance, depending on the hardware architecture and polynomial order. Compare this with other PDE-based solvers.
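One common way to picture "somewhere in between" is a roofline-style estimate, where attainable performance is the smaller of the compute roof and the memory roof. This is only a rough sketch, not how the 5-20% figure was obtained: the node peak and the flop-per-byte values below are hypothetical placeholders, and only the ~90 GB/sec bandwidth is taken from the STREAM figure above.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable GFLOP/s = min(compute roof, memory roof)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


PEAK_GFLOPS = 1000.0   # hypothetical node peak, NOT a measured value
BANDWIDTH_GBS = 90.0   # aggregate STREAM bandwidth quoted above

# Hypothetical arithmetic intensities (flops per byte moved from memory).
for intensity in (0.1, 1.0, 2.0, 20.0):
    attainable = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    print(f"{intensity:5.1f} flop/byte -> "
          f"{attainable / PEAK_GFLOPS:6.1%} of peak")
```

Under these assumed numbers, a kernel doing one or two flops per byte ends up at roughly 10-20% of peak, while only a DGEMM-like kernel with very high intensity hits the compute roof.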

Cheers,
Stefan
 
-----Original message-----
> From:nek5000-users at lists.mcs.anl.gov <nek5000-users at lists.mcs.anl.gov>
> Sent: Tuesday 13th December 2016 11:48
> To: nek5000-users at lists.mcs.anl.gov
> Subject: Re: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2
> 
> Dear Stefan,
> 
> Sorry, I didn't make myself clear. What I meant is whether we can further improve the parallel computing performance, as I am only using about 1/3 of the computing power of each node. Thank you very much!
> 
> Best regards,
> Wei XU
> 
> Date: Wed, 07 Dec 2016 12:01:42 +0100
> From: nek5000-users at lists.mcs.anl.gov
> To: <nek5000-users at lists.mcs.anl.gov>
> Subject: Re: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2
> Message-ID: <mailman.7291.1481108507.3602.nek5000-users at lists.mcs.anl.gov>
> Content-Type: text/plain; charset="utf-8"
> 
> Why do you think something is wrong, i.e. that you have a problem?
> 
> From: <nek5000-users-bounces at lists.mcs.anl.gov> on behalf of <nek5000-users at lists.mcs.anl.gov>
> Reply-To: <nek5000-users at lists.mcs.anl.gov>
> Date: Wednesday, December 7, 2016 at 10:39 AM
> To: <nek5000-users at lists.mcs.anl.gov>
> Subject: Re: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2
> 
> Dear Stefan,
> 
> Thank you very much! Can I do anything on the Nek5000 side to reduce the problem?
> 
> Best regards
> Wei XU
> 
> Date: Tue, 6 Dec 2016 16:34:04 +0100
> From: nek5000-users at lists.mcs.anl.gov
> To: nek5000-users at lists.mcs.anl.gov
> Subject: Re: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2
> Message-ID: <mailman.7242.1481038500.3602.nek5000-users at lists.mcs.anl.gov>
> Content-Type: text/plain; charset="us-ascii"
> 
> That's reasonable. The cores compete for shared resources (L3 and DDR).
> 
> Date: Tue, 6 Dec 2016 16:25:22 +0800
> From: nek5000-users at lists.mcs.anl.gov
> To: nek5000-users at lists.mcs.anl.gov
> Subject: Re: [Nek5000-users] Parallel speedup on supercomputer Tianhe-2
> Message-ID: <mailman.7221.1481012750.3602.nek5000-users at lists.mcs.anl.gov>
> Content-Type: text/plain; charset="utf-8"
> 
> Dear Stefan,
> 
> Thank you for your reply. There are 139056 elements and the polynomial
> order is 7 (lx1=8). I measure the solver time to compute the speedup. For
> example, the serial job takes 2564.59s and the same job with 24 tasks takes
> 302.18s. The speedup is about 8.5. This is on a single Tianhe-2 node. The
> speedup between nodes is quite good.
> 
> I also tested the code on my 36-core computer (Dual Xeon E5 18-Core). I can
> only get about 12 times speedup when I use 36 tasks. It is also about 1/3.
> 
> Best regards,
> Wei XU
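For reference, the speedup and the "about 1/3" figure in the message above follow directly from the reported solver times. A minimal sketch using only the numbers quoted there (the helper names are hypothetical):

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_tasks):
    """Parallel efficiency: speedup divided by the number of tasks."""
    return speedup(t_serial, t_parallel) / n_tasks


# Solver times reported for a single Tianhe-2 node (seconds).
s = speedup(2564.59, 302.18)
e = efficiency(2564.59, 302.18, 24)
print(f"24 tasks: speedup {s:.2f}, efficiency {e:.1%}")

# The 36-core workstation: a reported speedup of about 12 on 36 tasks.
e_36 = 12 / 36
print(f"36 tasks: efficiency {e_36:.1%}")
```

Both cases give a parallel efficiency of roughly one third within a single node.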
> 
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users

