[petsc-users] Understanding the memory bandwidth
Jed Brown
jed at jedbrown.org
Thu Aug 13 13:04:27 CDT 2015
Justin Chang <jychang48 at gmail.com> writes:
> Hi all,
>
> According to our University's HPC cluster (Intel Xeon E5-2680v2
> <http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%20E5-2680%20v2.html>), the
> online specifications say I should have a maximum BW of 59.7 GB/s. I am
> guessing this number is computed by 1866 MHz * 8 Bytes * 4 memory channels.
Yup, per socket.
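Spelled out, and noting that "1866 MHz" for DDR3 is really 1866 MT/s
(both clock edges are already counted, so there is no further doubling):

  1866e6 transfers/s * 8 bytes/transfer * 4 channels = 59.7 GB/s per socket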
> Now, when I run the STREAM Triad benchmark on a single compute node (two
> sockets, 10 cores each, 64 GB total memory) with up to 20 MPICH
> processes, I get the following:
>
> $ mpiexec -n 1 ./MPIVersion
> Triad: 13448.6701 Rate (MB/s)
>
> $ mpiexec -n 2 ./MPIVersion
> Triad: 24409.1406 Rate (MB/s)
>
> $ mpiexec -n 4 ./MPIVersion
> Triad: 31914.8087 Rate (MB/s)
>
> $ mpiexec -n 6 ./MPIVersion
> Triad: 33290.2676 Rate (MB/s)
>
> $ mpiexec -n 8 ./MPIVersion
> Triad: 33618.2542 Rate (MB/s)
>
> $ mpiexec -n 10 ./MPIVersion
> Triad: 33730.1662 Rate (MB/s)
>
> $ mpiexec -n 12 ./MPIVersion
> Triad: 40835.9440 Rate (MB/s)
>
> $ mpiexec -n 14 ./MPIVersion
> Triad: 44396.0042 Rate (MB/s)
>
> $ mpiexec -n 16 ./MPIVersion
> Triad: 54647.5214 Rate (MB/s) *
>
> $ mpiexec -n 18 ./MPIVersion
> Triad: 57530.8125 Rate (MB/s) *
>
> $ mpiexec -n 20 ./MPIVersion
> Triad: 42388.0739 Rate (MB/s) *
>
> The * numbers fluctuate greatly each time I run this. However, if I use
> hydra's processor binding options:
>
> $ mpiexec.hydra -n 2 -bind-to socket ./MPIVersion
> Triad: 26879.3853 Rate (MB/s)
>
> $ mpiexec.hydra -n 4 -bind-to socket ./MPIVersion
> Triad: 48363.8441 Rate (MB/s)
>
> $ mpiexec.hydra -n 8 -bind-to socket ./MPIVersion
> Triad: 63479.9284 Rate (MB/s)
It looks like with one core per socket, all your memory traffic goes
over one channel. You can play page-placement tricks to avoid that, or
use at least 4 cores per socket so that all the memory channels are
kept busy.
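One such trick, assuming numactl is installed on the node, is to
interleave the pages of a single process across both sockets' memory:

  $ numactl --interleave=all ./MPIVersion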
> $ mpiexec.hydra -n 10 -bind-to socket ./MPIVersion
> Triad: 66160.5627 Rate (MB/s)
So this is a pretty low fraction (55%) of 59.7*2 = 119.4 GB/s. I
suspect your memory or motherboard is limited to 1600 MHz, in which
case the peak is 1600e6 T/s * 8 bytes * 4 channels * 2 sockets =
102.4 GB/s, and you are getting about 65% of that.
You can check this as root using "dmidecode --type 17", which should
give one entry per DIMM slot, looking something like this:
Handle 0x002B, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: 0x002F
Total Width: Unknown
Data Width: Unknown
Size: 4096 MB
Form Factor: DIMM
Set: None
Locator: DIMM0
Bank Locator: BANK 0
Type: <OUT OF SPEC>
Type Detail: None
Speed: Unknown
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Unknown
Part Number: Not Specified
Rank: Unknown
Configured Clock Speed: 1600 MHz
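If the node has a lot of DIMMs, something like

  $ dmidecode --type 17 | grep -E 'Locator|Clock Speed'

(again as root) trims the output down to the slot names and configured
speeds.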
> Now my question is, is 13.5 GB/s on one processor "good"?
One memory channel is 1.866 * 8 = 14.9 GB/s, so your 13.5 GB/s
single-process number is already about 90% of a channel. You can get
some bonus overlap when adjacent pages land on different channels, but
the prefetcher only looks so far ahead, so with one thread you are
usually pulling from only one channel at a time.
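For concreteness, the Triad kernel is just the following (a sketch, not
the exact source of the MPIVersion benchmark you are running):

  #include <stddef.h>

  /* STREAM Triad: per iteration, 2 loads + 1 store of doubles,
     i.e. 24 bytes of array traffic (more if you count the
     write-allocate on the store) against only 2 flops, so the
     reported rate measures memory bandwidth, not compute. */
  void triad(double *a, const double *b, const double *c,
             double scalar, size_t n)
  {
    for (size_t i = 0; i < n; i++)
      a[i] = b[i] + scalar * c[i];
  }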
> Because when I compare this to the 59.7 GB/s, it seems really
> inefficient. Is there a way to browse through my system files to
> confirm this?
>
> Also, when I use multiple cores with proper binding, the STREAM BW
> exceeds the reported max BW. Is this expected?
You're using two sockets.
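That is, the number to compare against is the two-socket aggregate:

  2 sockets * 4 channels * 8 bytes * 1866e6 T/s = 119.4 GB/s
  (102.4 GB/s if the DIMMs are running at 1600 MT/s)

so 66 GB/s is above the single-socket figure but well under the
hardware peak.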