<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">Thanks for the reference timing. I can use this to talk to the vendor (or switch vendor…).</div>
<div class=""><br class="">
</div>
I am on a 2-socket system. It looks like the node the vendor built for me has 4 DIMMs, possibly all connected to the same socket?
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">[amduser@gigi ~]$ numactl -H</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">available: 2 nodes (0-1)</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93
94 95</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node 0 size: 257877 MB</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node 0 free: 225820 MB</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node 1 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116
117 118 119 120 121 122 123 124 125 126 127</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node 1 size: 0 MB</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node 1 free: 0 MB</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node distances:</span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class="">node 0 1 </span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class=""> 0: 10 32 </span></div>
<div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: "Fira Code";" class="">
<span style="font-variant-ligatures: no-common-ligatures;" class=""> 1: 32 10 </span></div>
<div class=""><span style="font-variant-ligatures: no-common-ligatures;" class=""><br class="">
</span></div>
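<div class="">One quick way to check the suspicion is to look for NUMA nodes that own CPUs but no memory. Below is a minimal sketch that parses the "size:" lines of the numactl -H output above with a regular expression. The helper names node_sizes_mb and memoryless_nodes are hypothetical, and the parsing assumes exactly the output format shown here (CPU lists are shortened because they are not used).</div>

```python
import re

# Output copied from `numactl -H` above (CPU lists shortened; only the
# "size:" lines matter for this check).
NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 257877 MB
node 0 free: 225820 MB
node 1 cpus: 32 33 34 35
node 1 size: 0 MB
node 1 free: 0 MB
"""

# One match per "node N size: X MB" line; MULTILINE makes ^ and $ match per line.
SIZE_RE = re.compile(r"^node (\d+) size: (\d+) MB$", re.MULTILINE)


def node_sizes_mb(text):
    """Map each NUMA node id to its memory size in MB."""
    return {int(node): int(size) for node, size in SIZE_RE.findall(text)}


def memoryless_nodes(text):
    """Return the sorted ids of NUMA nodes that report 0 MB of memory."""
    return sorted(n for n, size in node_sizes_mb(text).items() if size == 0)


if __name__ == "__main__":
    print(node_sizes_mb(NUMACTL_OUTPUT))
    print(memoryless_nodes(NUMACTL_OUTPUT))
```

<div class="">On this output the check flags node 1: its CPUs (32-63 and 96-127) have no local memory, so every access they make goes to node 0 at distance 32 rather than 10. The two-node layout also suggests the BIOS is not in NPS4 mode.</div>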
<div class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">Regards,</span></div>
<div class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">Blaise</span></div>
<div class=""><br class="">
</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Apr 16, 2021, at 5:50 PM, Jed Brown <<a href="mailto:jed@jedbrown.org" class="">jed@jedbrown.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" class="">junchao.zhang@gmail.com</a>> writes:<br class="">
<br class="">
<blockquote type="cite" class="">Why do I see the max bandwidth of EPYC-7502 is 200GB/s,<br class="">
<a href="https://www.cpu-world.com/CPUs/Zen/AMD-EPYC%207502.html?" class="">https://www.cpu-world.com/CPUs/Zen/AMD-EPYC%207502.html?</a><br class="">
</blockquote>
<br class="">
That's the theoretical peak per socket, but he's probably on a 2-socket system.<br class="">
<br class="">
<blockquote type="cite" class="">Your bandwidth is around 1/8 of the max. Is it because your machine only<br class="">
has one DIMM, thus only uses one memory channel?<br class="">
</blockquote>
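<div class="">The 200 GB/s figure and the 1/8 ratio both follow from channel arithmetic. A minimal sketch, assuming DDR4-3200 memory (3200 MT/s), an 8-byte data path per channel, and the 8 memory channels per socket of EPYC Rome parts such as the 7502; the function name peak_bandwidth_gbs is illustrative.</div>

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s=3200, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes DDR4-3200 (3200 MT/s) and a 64-bit (8-byte) channel by default.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


per_channel = peak_bandwidth_gbs(1)    # one populated channel
per_socket = peak_bandwidth_gbs(8)     # 8 channels per socket: the "200 GB/s" figure
two_sockets = peak_bandwidth_gbs(16)   # fully populated 2-socket node

print(f"one channel: {per_channel:.1f} GB/s = 1/{per_socket / per_channel:.0f} of a socket")
print(f"socket: {per_socket:.1f} GB/s, 2-socket node: {two_sockets:.1f} GB/s")
```

<div class="">The same formula with 4 channels (for example, 4 DIMMs on one socket, one per channel) gives a ceiling of about 102.4 GB/s for the whole node, regardless of how many cores are used.</div>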
<br class="">
Here's the lstopo output for my 2x EPYC 7452 system with the NPS4 BIOS setting.<br class="">
<br class="">
<span id="cid:60303526-640C-4C26-8600-7BED296656BC"><noether-nps4-lstopo.png></span><br class="">
And here are the results of the standard benchmark. Note that the highest performance is achieved with 16 ranks, which is one process per memory channel.<br class="">
<br class="">
$ make stream MPI_BINDING='--bind-to core --map-by numa'<br class="">
mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3 -funsafe-math-optimizations -march=native -I/projects/petsc/include -I/projects/petsc/mpich-O/include -I/opt/rocm/include
`pwd`/MPIVersion.c<br class="">
Running streams with 'mpiexec --bind-to core --map-by numa' using 'NPMAX=64'<br class="">
1 22833.4179 Rate (MB/s)<br class="">
2 45445.0356 Rate (MB/s) 1.99029<br class="">
3 67893.5186 Rate (MB/s) 2.97343<br class="">
4 90691.7604 Rate (MB/s) 3.97189<br class="">
5 113578.2249 Rate (MB/s) 4.97421<br class="">
6 134311.8951 Rate (MB/s) 5.88226<br class="">
7 157968.6688 Rate (MB/s) 6.91832<br class="">
8 179669.5008 Rate (MB/s) 7.86871<br class="">
9 116836.9905 Rate (MB/s) 5.11693<br class="">
10 129766.7959 Rate (MB/s) 5.6832<br class="">
11 142340.1754 Rate (MB/s) 6.23386<br class="">
12 155720.9245 Rate (MB/s) 6.81987<br class="">
13 167771.0425 Rate (MB/s) 7.34762<br class="">
14 179843.1413 Rate (MB/s) 7.87632<br class="">
15 193897.7955 Rate (MB/s) 8.49185<br class="">
16 206523.2769 Rate (MB/s) 9.04479<br class="">
17 142789.6490 Rate (MB/s) 6.25354<br class="">
18 151205.7782 Rate (MB/s) 6.62213<br class="">
19 158845.6213 Rate (MB/s) 6.95672<br class="">
20 167435.0701 Rate (MB/s) 7.3329<br class="">
21 175731.9719 Rate (MB/s) 7.69627<br class="">
22 183984.5220 Rate (MB/s) 8.05769<br class="">
23 192058.1005 Rate (MB/s) 8.41128<br class="">
24 200761.0267 Rate (MB/s) 8.79243<br class="">
25 155965.3011 Rate (MB/s) 6.83058<br class="">
26 161673.4841 Rate (MB/s) 7.08057<br class="">
27 167871.3408 Rate (MB/s) 7.35201<br class="">
28 173951.9592 Rate (MB/s) 7.61831<br class="">
29 179456.7696 Rate (MB/s) 7.8594<br class="">
30 186474.7412 Rate (MB/s) 8.16675<br class="">
31 191749.9724 Rate (MB/s) 8.39778<br class="">
32 198041.2958 Rate (MB/s) 8.67332<br class="">
33 164697.4378 Rate (MB/s) 7.21301<br class="">
34 168645.1579 Rate (MB/s) 7.3859<br class="">
35 173776.6503 Rate (MB/s) 7.61063<br class="">
36 179109.2764 Rate (MB/s) 7.84418<br class="">
37 183488.6248 Rate (MB/s) 8.03597<br class="">
38 188954.7149 Rate (MB/s) 8.27536<br class="">
39 193140.8746 Rate (MB/s) 8.4587<br class="">
40 198804.6800 Rate (MB/s) 8.70675<br class="">
41 169620.7845 Rate (MB/s) 7.42863<br class="">
42 173306.4149 Rate (MB/s) 7.59004<br class="">
43 177089.0440 Rate (MB/s) 7.7557<br class="">
44 181301.7744 Rate (MB/s) 7.9402<br class="">
45 184888.0697 Rate (MB/s) 8.09726<br class="">
46 189267.4148 Rate (MB/s) 8.28906<br class="">
47 193386.6666 Rate (MB/s) 8.46946<br class="">
48 197338.0962 Rate (MB/s) 8.64252<br class="">
49 171754.3212 Rate (MB/s) 7.52207<br class="">
50 174416.7410 Rate (MB/s) 7.63867<br class="">
51 177590.9451 Rate (MB/s) 7.77768<br class="">
52 181843.2566 Rate (MB/s) 7.96391<br class="">
53 184956.9725 Rate (MB/s) 8.10028<br class="">
54 188459.9627 Rate (MB/s) 8.2537<br class="">
55 191294.8101 Rate (MB/s) 8.37785<br class="">
56 195167.9061 Rate (MB/s) 8.54747<br class="">
57 173077.5973 Rate (MB/s) 7.58002<br class="">
58 175707.2648 Rate (MB/s) 7.69519<br class="">
59 178524.3544 Rate (MB/s) 7.81856<br class="">
60 181446.2196 Rate (MB/s) 7.94653<br class="">
61 184176.0972 Rate (MB/s) 8.06608<br class="">
62 187132.5388 Rate (MB/s) 8.19556<br class="">
63 190094.3249 Rate (MB/s) 8.32527<br class="">
64 192888.0887 Rate (MB/s) 8.44763<br class="">
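<div class="">The third column is the aggregate rate divided by the single-rank rate. A minimal sketch that recomputes it from a subset of the rows above and picks the rank count with the highest aggregate rate; the parsing assumes the "ranks rate Rate (MB/s) speedup" layout shown, and the helper names are illustrative.</div>

```python
# A subset of rows from the run above: ranks, rate (MB/s), label, speedup.
OUTPUT = """\
1 22833.4179 Rate (MB/s)
2 45445.0356 Rate (MB/s) 1.99029
8 179669.5008 Rate (MB/s) 7.86871
9 116836.9905 Rate (MB/s) 5.11693
16 206523.2769 Rate (MB/s) 9.04479
24 200761.0267 Rate (MB/s) 8.79243
64 192888.0887 Rate (MB/s) 8.44763
"""


def parse_rates(text):
    """Return {ranks: aggregate rate in MB/s} from the scaling lines."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rates[int(fields[0])] = float(fields[1])
    return rates


def speedups(rates):
    """Aggregate rate relative to the 1-rank rate (the third output column)."""
    base = rates[1]
    return {n: rate / base for n, rate in rates.items()}


rates = parse_rates(OUTPUT)
best = max(rates, key=rates.get)  # rank count with the highest aggregate rate
print(f"best: {best} ranks, {rates[best] / 1000:.1f} GB/s, speedup {speedups(rates)[best]:.2f}")
```

<div class="">The peak lands at 16 ranks, i.e. 2 sockets times 8 channels, matching the one-process-per-memory-channel observation.</div>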
<br class="">
<br class="">
Here's a hacked version that uses nontemporal stores, just to show that the 300 GB/s you see in some publications is "real".<br class="">
<br class="">
$ make stream MPI_BINDING='--bind-to core --map-by numa' NPMAX=16<br class="">
mpicc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3 -funsafe-math-optimizations -march=native -I/projects/petsc/include -I/projects/petsc/mpich-O/include -I/opt/rocm/include
`pwd`/MPIVersion.c<br class="">
Running streams with 'mpiexec --bind-to core --map-by numa' using 'NPMAX=16'<br class="">
Copy 33486.2539 Scale 34071.3326 Add 32054.4926 Triad 31648.2821<br class="">
Copy 66152.1368 Scale 67692.2040 Add 63900.9276 Triad 63483.8342<br class="">
Copy 99017.3661 Scale 100531.6449 Add 95109.5224 Triad 94124.0088<br class="">
Copy 132296.5106 Scale 132442.5912 Add 127105.7793 Triad 125468.3513<br class="">
Copy 164162.0199 Scale 167233.3935 Add 158407.6986 Triad 156593.6716<br class="">
Copy 196532.3832 Scale 198430.1791 Add 189974.2218 Triad 188255.0783<br class="">
Copy 229330.8877 Scale 227943.5409 Add 220342.2619 Triad 215985.3785<br class="">
Copy 262284.7885 Scale 263016.6022 Add 251791.8289 Triad 248723.9313<br class="">
Copy 171321.9641 Scale 172641.8351 Add 169675.2284 Triad 168820.0418<br class="">
Copy 180864.2214 Scale 182373.5147 Add 187165.1475 Triad 187378.3674<br class="">
Copy 199656.4142 Scale 199780.0876 Add 204114.6705 Triad 204703.4734<br class="">
Copy 218084.4093 Scale 219094.7833 Add 223458.4367 Triad 222833.6622<br class="">
Copy 236477.1224 Scale 236699.9477 Add 240783.0674 Triad 240799.8947<br class="">
Copy 253032.9327 Scale 254071.3971 Add 260259.9976 Triad 260421.9921<br class="">
Copy 272682.9868 Scale 272881.0838 Add 279639.9825 Triad 278932.2833<br class="">
Copy 290096.2550 Scale 287402.4978 Add 297025.8896 Triad 295550.2586<br class="">
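<div class="">One way to see why nontemporal stores raise the reported numbers: STREAM credits each kernel only with the arrays it names (2 for Copy and Scale, 3 for Add and Triad), but an ordinary store on x86 typically first reads the destination cache line (write-allocate), adding one uncredited read per iteration. Nontemporal stores avoid that read. A minimal accounting sketch, assuming 8-byte doubles and full write-allocate for ordinary stores; it is a model, not a measurement, and traffic_ratio is an illustrative name.</div>

```python
# Arrays that STREAM counts per loop iteration (8-byte doubles each).
COUNTED_ARRAYS = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def traffic_ratio(kernel, write_allocate=True):
    """Actual DRAM bytes moved divided by the bytes STREAM counts.

    With write-allocate, the destination array is also read once, so one
    extra array's worth of traffic is moved but not credited.
    """
    counted = COUNTED_ARRAYS[kernel]
    actual = counted + (1 if write_allocate else 0)
    return actual / counted


for kernel in COUNTED_ARRAYS:
    print(f"{kernel:5s} ordinary stores: x{traffic_ratio(kernel):.3f}, "
          f"nontemporal: x{traffic_ratio(kernel, write_allocate=False):.3f}")
```

<div class="">Under this model, an ordinary-store Triad that reports R is actually moving about 4R/3 of DRAM traffic, which is one reason the nontemporal variant can report numbers closer to the hardware limit.</div>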
<blockquote type="cite" class=""><br class="">
--Junchao Zhang<br class="">
<br class="">
<br class="">
On Fri, Apr 16, 2021 at 3:27 PM Jed Brown &lt;<a href="mailto:jed@jedbrown.org" class="">jed@jedbrown.org</a>&gt; wrote:<br class="">
<br class="">
<blockquote type="cite" class="">Blaise A Bourdin <<a href="mailto:bourdin@lsu.edu" class="">bourdin@lsu.edu</a>> writes:<br class="">
<br class="">
<blockquote type="cite" class="">Hi,<br class="">
<br class="">
I am test-driving hardware for a new machine for my group and having a<br class="">
</blockquote>
hard time making sense the output of the stream test:<br class="">
<blockquote type="cite" class=""><br class="">
I am attaching the results and my reference (xeon 8260 nodes on QueenBee<br class="">
</blockquote>
3 at LONI).<br class="">
<blockquote type="cite" class=""><br class="">
If I understand correctly, on the AMD node, the memory bandwidth is<br class="">
</blockquote>
saturated with a single core. Is this expected?<br class="">
<blockquote type="cite" class="">The comparison is not totally fair in that QB3 uses intel MPI and MPI<br class="">
</blockquote>
compilers, whereas the AMD node uses mvapich2, which I compiled with the<br class="">
following options: ./configure<br class="">
--prefix=/home/amduser/Development/mvapich2-2.3.5-gcc9.3<br class="">
--with-device=ch3:nemesis:tcp --with-rdma=gen2 --enable-cxx --enable-romio<br class="">
--enable-fast=all --enable-g=dbg --enable-shared-libs=gcc --enable-shared<br class="">
<blockquote type="cite" class=""><br class="">
Am I doing something wrong on the AMD node?<br class="">
</blockquote>
<br class="">
It looks like the launcher is oversubscribing some cores rather than spreading<br class="">
the ranks over the node. You should get around 200 GB/s on this node without<br class="">
using streaming (nontemporal) instructions (closer to 300 GB/s with them, but<br class="">
that isn't representative of real-world code), and slightly less if you don't<br class="">
have NPS4 enabled.<br class="">
<br class="">
Check your MPI's documentation for its binding options; for example, use<br class="">
make MPI_BINDING='--bind-to core'<br class="">
<br class="">
</blockquote>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
<div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<span class="Apple-style-span" style="border-collapse: separate; font-variant-ligatures: normal; font-variant-east-asian: normal; font-variant-position: normal; line-height: normal; border-spacing: 0px; -webkit-text-decorations-in-effect: none;">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-position: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-position: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<span class="Apple-style-span" style="border-collapse: separate; border-spacing: 0px; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-position: normal; font-weight: normal; letter-spacing: normal; line-height: normal; -webkit-text-decorations-in-effect: none; text-indent: 0px; text-transform: none; orphans: 2; white-space: normal; widows: 2; word-spacing: 0px;">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div style="margin: 0px;" class="">-- </div>
<div style="margin: 0px;" class="">A.K. & Shirley Barton Professor of Mathematics</div>
<div style="margin: 0px;" class="">Adjunct Professor of Mechanical Engineering</div>
<div style="margin: 0px;" class="">Adjunct of the Center for Computation & Technology</div>
<div style="margin: 0px;" class="">Louisiana State University, <span style="-webkit-text-decorations-in-effect: none;" class="">Lockett Hall Room 344, </span><span style="-webkit-text-decorations-in-effect: none;" class="">Baton Rouge, LA 70803, USA</span></div>
<div style="margin: 0px;" class=""><span style="-webkit-text-decorations-in-effect: none;" class="">Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 Web </span><span style="-webkit-text-decorations-in-effect: none;" class=""><a href="http://www.math.lsu.edu/~bourdin" class="">http://www.math.lsu.edu/~bourdin</a></span></div>
</div>
</span></div>
</span></div>
</span></div>
</span></div>
</div>
</div>
</div>
<br class="">
</div>
</body>
</html>