[petsc-users] Understanding the memory bandwidth

Barry Smith bsmith at mcs.anl.gov
Thu Aug 13 15:47:30 CDT 2015


> On Aug 13, 2015, at 3:30 PM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Thu, Aug 13, 2015 at 3:22 PM, Justin Chang <jychang48 at gmail.com> wrote:
> On Thu, Aug 13, 2015 at 1:04 PM, Jed Brown <jed at jedbrown.org> wrote:
> > It looks like with one core/socket, all your memory sits over one
> > channel.  You can play tricks to avoid that or use 4 cores/socket in
> > order to use all memory channels.
> 
> How do I play these tricks?
> 
> > So this is a pretty low fraction (55%) of 59.7*2 = 119.4 GB/s.  I suspect
> > your memory or motherboard is at most 1600 MHz, so your peak would be
> > 102.4 GB/s.
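The peak figures quoted here are a simple product: transfer rate × 8 bytes per 64-bit channel × channels per socket × sockets. The Python sketch below reproduces them under two assumptions that the thread does not state outright: 4 DDR3 channels per socket (consistent with the advice to use 4 cores/socket to reach all channels), and that 59.7 GB/s per socket corresponds to DDR3-1866. The DIMM rating "1600 MHz" is conventionally the data rate (1600 MT/s); the underlying bus clock of double-data-rate memory is half that. The function name `peak_bandwidth_gbs` is hypothetical.

```python
# Sketch: theoretical peak DRAM bandwidth for a multi-socket node.
# Assumptions (not from the thread): 64-bit (8-byte) channels,
# 4 channels per socket, and DDR3-1866 behind the 59.7 GB/s/socket figure.

def peak_bandwidth_gbs(transfer_rate_mts, channels_per_socket, sockets,
                       bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_second = (transfer_rate_mts * 1e6 * bytes_per_transfer
                        * channels_per_socket * sockets)
    return bytes_per_second / 1e9


# 59.7 GB/s per socket, two sockets (assumed DDR3-1866).
ddr3_1866 = peak_bandwidth_gbs(1866, channels_per_socket=4, sockets=2)
# The same node if the DIMMs only run at 1600 MT/s.
ddr3_1600 = peak_bandwidth_gbs(1600, channels_per_socket=4, sockets=2)

# "a pretty low fraction (55%)" of the 119.4 GB/s figure.
measured_gbs = 0.55 * round(ddr3_1866, 1)

print(f"peak @1866 MT/s: {ddr3_1866:.1f} GB/s")  # 119.4 GB/s
print(f"peak @1600 MT/s: {ddr3_1600:.1f} GB/s")  # 102.4 GB/s
print(f"measured ~{measured_gbs:.1f} GB/s, "
      f"{measured_gbs / ddr3_1600:.0%} of the 1600 MT/s peak")
```

Note that the 2 in 59.7*2 is already the socket count, so both 119.4 and 102.4 GB/s are whole-node figures.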
> 
> > You can check this as root using "dmidecode --type 17", which should
> > give one entry per channel, looking something like this:
> >
> > Handle 0x002B, DMI type 17, 34 bytes
> > Memory Device
> >         Array Handle: 0x002A
> >         Error Information Handle: 0x002F
> >         Total Width: Unknown
> >         Data Width: Unknown
> >         Size: 4096 MB
> >         Form Factor: DIMM
> >         Set: None
> >         Locator: DIMM0
> >         Bank Locator: BANK 0
> >         Type: <OUT OF SPEC>
> >         Type Detail: None
> >         Speed: Unknown
> >         Manufacturer: Not Specified
> >         Serial Number: Not Specified
> >         Asset Tag: Unknown
> >         Part Number: Not Specified
> >         Rank: Unknown
> >         Configured Clock Speed: 1600 MHz
> 
> I have no root access. Is there another way to confirm the clock speed?
> 
> ---
> 
> So if I have two sockets per node, then the theoretical peak bandwidth
> is actually double what I thought (whether it be 119.4 GB/s or
> 102.4 GB/s). And if 8 cores really is the optimal number to use for a
> single compute node, why are there 20 in total to begin with? Or does
> this depend on the particular application?
> 
> Kind Answer: Different applications have different needs.
> 
> Cynical Answer: Computer companies sell you what they can produce,
> lots of cores, not what you need, lots of bandwidth. Bandwidth is very
> expensive and there are technical limits.

    The cost of producing a system is not simply linearly proportional to the number of cores, the number of floating point units, or any other particular feature of the system. For example, a 50-core system might cost a company $50,000 to make and a 100-core system (everything else being equal) $70,000. In that sense each additional core (within reason) costs less, so it is acceptable to get less performance out of it, since its incremental cost is lower.
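    The hypothetical numbers above can be made concrete by comparing the average cost per core with the marginal cost of each extra core. A short Python sketch of that arithmetic; the dollar figures are the illustrative ones from the example, not real prices, and the function names are hypothetical:

```python
# Average vs. marginal cost per core, using the illustrative figures above
# (hypothetical prices, not real hardware costs).

def average_cost_per_core(total_cost, cores):
    """Total production cost spread evenly over all cores."""
    return total_cost / cores


def marginal_cost_per_core(cost_a, cores_a, cost_b, cores_b):
    """Extra cost per extra core when moving from system A to system B."""
    return (cost_b - cost_a) / (cores_b - cores_a)


small = (50_000, 50)    # 50-core system, $50,000
large = (70_000, 100)   # 100-core system, $70,000

print(average_cost_per_core(*small))           # 1000.0
print(average_cost_per_core(*large))           # 700.0
print(marginal_cost_per_core(*small, *large))  # 400.0
```

    Each of the 50 extra cores costs $400 to add, against $1,000 per core on average for the smaller system, so an extra core can be worth including even if it delivers well under its "share" of performance.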

  Barry
 


>  
> Also, can someone elaborate on the difference between the words
> "core", "processor", and "thread"?
> 
> A core and a processor are hardware terms. I think they are both fuzzy,
> but I understand a core to be something that can carry a thread of execution,
> namely a program counter, instruction and data stream, and compute something.
> A thread is a logical construct for talking about an execution stream.
> 
>    Matt
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
