[petsc-users] Very poor speed up performance

Satish Balay balay at mcs.anl.gov
Wed Dec 22 10:54:53 CST 2010


On Wed, 22 Dec 2010, Yongjun Chen wrote:

> On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

> > > Processors: 4 CPUS * 4Cores/CPU, with each core 2500MHz
> > >
> > > Memories: 16 *2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the
> > memory Bandwidth for 2 memories is 64/8*166*2*2=5.4GB/s.
> >
> >    Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not
> > enough for iterative solvers, in fact this is absolutely terrible for
> > iterative solvers. You really want 5.4 GB/s PER core! This machine is
> > absolutely inappropriate for iterative solvers. No package can give you good
> > speedups on this machine.
> 
> Barry, there are 16 memories, every 2 memories make up one dual channel,
> thus in this machine there are 8 dual channel, each dual channel has the
> memory bandwidth 5.4GB/s.

What hardware is this? [processor/chipset?]

From what you say - it looks like each chip has 4 cores, and 2
dual-channel memory controllers for each of them.

The question is - does the hardware provide scalable memory-bandwidth
per core?  Most machines don't.

I.e., the same 5.4*2 GB/s is available for a 1-core run as well as for a 4-core run.

So if the algorithm is able to use 5.4 GB/s [or more] with 1 thread, and
10.8 GB/s [or more] with 2 threads - you would see scalable performance
only from 1 to 2 cores, and the 3- and 4-core runs would perhaps be
slightly incremental over the 2-core performance.
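
To make the saturation argument concrete, here is a minimal sketch of an idealized bandwidth-bound model. It assumes runtime is limited only by memory traffic, that each thread can consume 5.4 GB/s, and that the 4 cores on a chip share 2 * 5.4 = 10.8 GB/s. The function name and the per-thread demand are illustrative assumptions, not measurements from this machine.

```python
def bandwidth_bound_speedup(n_threads, per_thread_demand_gbs, shared_bandwidth_gbs):
    """Speedup over one thread when runtime is limited only by memory traffic.

    Achieved bandwidth is the total demand of all threads, capped by what
    the shared memory system can deliver.
    """
    one_thread = min(per_thread_demand_gbs, shared_bandwidth_gbs)
    n_thread = min(n_threads * per_thread_demand_gbs, shared_bandwidth_gbs)
    return n_thread / one_thread


# Figures taken from the thread: 5.4 GB/s per dual-channel pair,
# and (assumed) 2 such pairs shared by the 4 cores of one chip.
PER_DUAL_CHANNEL_GBS = 5.4
CHIP_BANDWIDTH_GBS = 2 * PER_DUAL_CHANNEL_GBS

# Assumption: a single thread can already use one full dual channel.
PER_THREAD_DEMAND_GBS = 5.4

speedups = [
    bandwidth_bound_speedup(n, PER_THREAD_DEMAND_GBS, CHIP_BANDWIDTH_GBS)
    for n in range(1, 5)
]

for n, s in zip(range(1, 5), speedups):
    print(f"{n} thread(s): speedup {s:.1f}")
```

In this model the speedup is flat beyond 2 threads. On real hardware the 3- and 4-core runs may still gain a little, for instance from cache effects, but not in proportion to the core count.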

Satish

