[petsc-users] Understanding streams test on AMD EPYC 7502

Fri Apr 16 15:25:58 CDT 2021

Blaise A Bourdin <bourdin at lsu.edu> writes:

> Hi,
>
> I am test-driving hardware for a new machine for my group and having a hard time making sense of the output of the streams test.
>
> I am attaching the results and my reference (xeon 8260 nodes on QueenBee 3 at LONI).
>
> If I understand correctly, on the AMD node, the memory bandwidth is saturated with a single core. Is this expected?
> The comparison is not totally fair in that QB3 uses Intel MPI and MPI compilers, whereas the AMD node uses mvapich2, which I compiled with the following options:
>
>   ./configure --prefix=/home/amduser/Development/mvapich2-2.3.5-gcc9.3 --with-device=ch3:nemesis:tcp --with-rdma=gen2 --enable-cxx --enable-romio --enable-fast=all --enable-g=dbg --enable-shared-libs=gcc --enable-shared
>
> Am I doing something wrong on the AMD node?

It looks like the launcher is oversubscribing some cores rather than spreading the processes over the node. You should get around 200 GB/s on this node without using streaming instructions (closer to 300 GB/s with those, but that isn't representative of real-world code). Expect slightly less if you don't have NPS4 (four NUMA nodes per socket) activated.

Check your MPI docs for the process-binding options and pass them to the streams target, for example make MPI_BINDING='--bind-to core'.
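
One way to confirm whether ranks really share cores is to have every rank print the set of cores it is allowed to run on. The following is only a minimal sketch, not part of the PETSc streams test: the script name check_affinity.py and the helper names are made up for illustration, os.sched_getaffinity is Linux-only, and the rank environment variables (MV2_COMM_WORLD_RANK, PMI_RANK, OMPI_COMM_WORLD_RANK) are assumptions about what the launcher exports, so adjust them for your MPI.

```python
import os
from collections import Counter


def shared_cores(affinities):
    """Return sorted core IDs that appear in more than one rank's allowed set.

    `affinities` maps a rank label to an iterable of core IDs (hypothetical
    input format, e.g. collected from the per-rank output printed below).
    """
    counts = Counter(core for cores in affinities.values() for core in set(cores))
    return sorted(core for core, n in counts.items() if n > 1)


def report_local_affinity():
    """Print the cores the current process may run on (Linux only)."""
    # Assumed rank variables; which one is set depends on the MPI launcher.
    rank = (os.environ.get("MV2_COMM_WORLD_RANK")
            or os.environ.get("PMI_RANK")
            or os.environ.get("OMPI_COMM_WORLD_RANK")
            or "?")
    cores = sorted(os.sched_getaffinity(0))
    print(f"rank {rank} pid {os.getpid()} allowed cores: {cores}")


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    report_local_affinity()
```

Launched the same way as the benchmark (same mpiexec and binding options, running python3 check_affinity.py instead of the streams executable), each rank should report a single, distinct core when binding to cores works. If several ranks report the same core, they are oversubscribed; if every rank reports all cores, nothing is bound and the OS is free to stack or migrate them. Feeding the collected sets to shared_cores lists the cores claimed by more than one rank.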