[petsc-users] Understanding streams test on AMD EPYC 7502

Junchao Zhang junchao.zhang at gmail.com
Fri Apr 16 17:38:34 CDT 2021

> Why do I see that the max bandwidth of the EPYC 7502 is 200 GB/s?

Your bandwidth is around 1/8 of that maximum. Is it because your machine has
only one DIMM installed, and thus uses only one memory channel?
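A quick back-of-the-envelope check of this hypothesis (a sketch: the 1/8
figure follows from the EPYC 7502 being an 8-channel DDR4-3200 part, so a
single populated channel gives roughly 1/8 of peak):

```python
# Theoretical peak memory bandwidth of an 8-channel DDR4-3200 socket,
# such as a second-generation EPYC (Rome) part like the 7502.
transfers_per_sec = 3200e6   # DDR4-3200: 3.2e9 transfers/s
bytes_per_transfer = 8       # 64-bit (8-byte) channel width
channels = 8                 # memory channels per socket

per_channel_gb_s = transfers_per_sec * bytes_per_transfer / 1e9
peak_gb_s = per_channel_gb_s * channels

print(f"per channel: {per_channel_gb_s:.1f} GB/s")     # 25.6 GB/s
print(f"all {channels} channels: {peak_gb_s:.1f} GB/s")  # 204.8 GB/s
```

With one DIMM populated you are limited to the per-channel figure, which
matches the "around 1/8 of the max" observation above.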

--Junchao Zhang

On Fri, Apr 16, 2021 at 3:27 PM Jed Brown <jed at jedbrown.org> wrote:

> Blaise A Bourdin <bourdin at lsu.edu> writes:
> > Hi,
> >
> > I am test-driving hardware for a new machine for my group and having a
> hard time making sense of the output of the STREAM test:
> >
> > I am attaching the results and my reference (xeon 8260 nodes on QueenBee
> 3 at LONI).
> >
> > If I understand correctly, on the AMD node, the memory bandwidth is
> saturated with a single core. Is this expected?
> > The comparison is not totally fair in that QB3 uses Intel MPI and the
> Intel compilers, whereas the AMD node uses mvapich2, which I compiled with
> the following options:
> >
> > ./configure --prefix=/home/amduser/Development/mvapich2-2.3.5-gcc9.3
> --with-device=ch3:nemesis:tcp --with-rdma=gen2 --enable-cxx --enable-romio
> --enable-fast=all --enable-g=dbg --enable-shared-libs=gcc --enable-shared
> >
> > Am I doing something wrong on the AMD node?
>
> It looks like it's oversubscribing some cores rather than spreading the
> ranks across the node. You should get around 200 GB/s on this node without
> using streaming (non-temporal) stores (closer to 300 GB/s with them, but
> that isn't representative of real-world code), and slightly less if NPS4
> is not enabled.
>
> Check your MPI docs and use, for example, make MPI_BINDING='--bind-to
> core'.
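The effect of binding can be illustrated with a toy model (all numbers here
are hypothetical, assuming NPS4 splits the socket into four memory domains
of roughly 50 GB/s each; `aggregate_bw` is an illustrative helper, not part
of any benchmark):

```python
# Toy model of why rank placement matters on a NUMA node: aggregate STREAM
# bandwidth scales with the number of memory domains actually in use, not
# with the number of ranks.

def aggregate_bw(ranks, domains_used, bw_per_domain_gb_s=50.0):
    """Each memory domain sustains at most bw_per_domain_gb_s no matter how
    many ranks sit on it, so spreading ranks engages more domains."""
    assert 1 <= domains_used <= ranks
    return domains_used * bw_per_domain_gb_s

# 8 ranks piled onto one NPS4 domain vs. bound/spread across all four:
print(aggregate_bw(8, 1))  # 50.0  GB/s -- oversubscribed, one domain's worth
print(aggregate_bw(8, 4))  # 200.0 GB/s -- full-node bandwidth
```

This is why an unbound run can report roughly single-domain bandwidth even
with many ranks: the MPI launcher packed them onto neighboring cores.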

More information about the petsc-users mailing list