[petsc-users] Understanding the memory bandwidth

Thu Aug 13 15:22:42 CDT 2015

On Thu, Aug 13, 2015 at 1:04 PM, Jed Brown <jed at jedbrown.org> wrote:
> It looks like with one core/socket, all your memory sits over one
> channel.  You can play tricks to avoid that or use 4 cores/socket in
> order to use all memory channels.

How do I play these tricks?

> So this is a pretty low fraction (55%) of 59.7*2 = 119.4.  I suspect
> your memory or motherboard is at most 1600 MHz, so your peak would be
> 102.4 GB/s.
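
For reference, the two peak figures quoted above can be reproduced with a short calculation. This is only a sketch: it assumes 4 memory channels per socket, 8 bytes per transfer on each channel, and two sockets per node, and it assumes the 59.7 GB/s figure corresponds to 1866 MT/s memory. None of these values is confirmed in this thread, and the function name is made up for illustration.

```python
# Sketch of the peak-bandwidth arithmetic. Assumptions (not confirmed in the
# thread): 4 channels per socket, 8 bytes per transfer, 2 sockets per node.

def peak_bandwidth_gbs(mega_transfers_per_s, channels_per_socket=4,
                       sockets=1, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_s = (mega_transfers_per_s * 1e6 * bytes_per_transfer
                   * channels_per_socket * sockets)
    return bytes_per_s / 1e9

per_socket_1600 = peak_bandwidth_gbs(1600)         # one socket at 1600 MT/s
node_1600 = peak_bandwidth_gbs(1600, sockets=2)    # two sockets at 1600 MT/s
node_1866 = peak_bandwidth_gbs(1866, sockets=2)    # two sockets at 1866 MT/s

print(f"1600 MT/s, one socket:  {per_socket_1600:.1f} GB/s")
print(f"1600 MT/s, two sockets: {node_1600:.1f} GB/s")
print(f"1866 MT/s, two sockets: {node_1866:.1f} GB/s")
```

Under these assumptions the two-socket results round to the 102.4 GB/s and 119.4 GB/s figures above, so both numbers already count both sockets.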

> You can check this as root using "dmidecode --type 17", which should
> give one entry per channel, looking something like this:
>
> Handle 0x002B, DMI type 17, 34 bytes
> Memory Device
>         Array Handle: 0x002A
>         Error Information Handle: 0x002F
>         Total Width: Unknown
>         Data Width: Unknown
>         Size: 4096 MB
>         Form Factor: DIMM
>         Set: None
>         Locator: DIMM0
>         Bank Locator: BANK 0
>         Type: <OUT OF SPEC>
>         Type Detail: None
>         Speed: Unknown
>         Manufacturer: Not Specified
>         Serial Number: Not Specified
>         Asset Tag: Unknown
>         Part Number: Not Specified
>         Rank: Unknown
>         Configured Clock Speed: 1600 MHz
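
If someone with root access can save that output to a file, the per-DIMM speeds can be pulled out with a few lines of Python. This is a rough sketch that assumes the output has the same layout as the entry quoted above; the function name and the shortened sample text are only for illustration.

```python
import re

def configured_speeds(dmidecode_text):
    """Return one (locator, speed_mhz) pair per "Memory Device" entry."""
    entries = []
    # Each entry begins with a "Memory Device" line; skip the text before the first.
    for block in dmidecode_text.split("Memory Device")[1:]:
        # Anchor at line start so "Bank Locator:" is not matched by mistake.
        loc = re.search(r"^\s*Locator:\s*(.+)$", block, re.M)
        spd = re.search(r"Configured Clock Speed:\s*(\d+)\s*MHz", block)
        entries.append((loc.group(1).strip() if loc else None,
                        int(spd.group(1)) if spd else None))
    return entries

# Shortened copy of the entry quoted above, used as sample input.
sample = """Handle 0x002B, DMI type 17, 34 bytes
Memory Device
        Size: 4096 MB
        Locator: DIMM0
        Bank Locator: BANK 0
        Speed: Unknown
        Configured Clock Speed: 1600 MHz
"""

for locator, mhz in configured_speeds(sample):
    print(locator, mhz)
```

Counting the returned entries also gives the number of populated memory devices, which can be compared against the expected number of channels.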

I have no root access. Is there another way to confirm the clock speed?

---

So if I have two sockets per node, then the theoretical peak bandwidth
is actually double what I thought (whether it be 119.4 GB/s or
102.4 GB/s). And if 8 cores really is the optimal number to use for a
single compute node, why are there 20 cores in total to begin with? Or
would this depend on the particular application?

Also, can someone elaborate on the difference between the words
"core", "processor", and "thread"?