[petsc-users] Very poor speed up performance
Yongjun Chen
yjxd.chen at gmail.com
Wed Dec 22 11:12:43 CST 2010
On Wed, Dec 22, 2010 at 5:54 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>
> > On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > > > Processors: 4 CPUS * 4Cores/CPU, with each core 2500MHz
> > > >
> > > > Memories: 16 * 2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the
> > > > memory bandwidth for 2 memories is 64/8*166*2*2=5.4GB/s.
> > >
> > > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not
> > > enough for iterative solvers, in fact this is absolutely terrible for
> > > iterative solvers. You really want 5.4 GB/s PER core! This machine is
> > > absolutely inappropriate for iterative solvers. No package can give you
> > > good speedups on this machine.
> >
> > Barry, there are 16 memory modules; every 2 modules make up one dual channel,
> > so this machine has 8 dual channels, each with a memory bandwidth of
> > 5.4GB/s.
>
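The bandwidth arithmetic quoted above can be checked with a short sketch. It assumes, as the quoted text does, a 64-bit data width, a 166 MHz base clock with double data rate, and two channels per pair of modules. The function name and parameters are illustrative only, and the result is a theoretical peak, not a measured figure.

```python
def dual_channel_bandwidth_mb_s(width_bits=64, clock_mhz=166, ddr_factor=2, channels=2):
    """Theoretical peak bandwidth in MB/s for one dual-channel pair.

    All defaults are taken from the figures quoted in this thread;
    they are assumptions about the hardware, not measurements.
    """
    bytes_per_transfer = width_bits // 8  # 64 bits -> 8 bytes
    return bytes_per_transfer * clock_mhz * ddr_factor * channels


per_pair = dual_channel_bandwidth_mb_s()  # one dual-channel pair
total = 8 * per_pair                      # 8 pairs, as described above
per_core = total / 16                     # if shared evenly by 16 cores

print(f"per dual channel: {per_pair} MB/s")
print(f"all 8 dual channels: {total} MB/s")
print(f"per core (16 cores): {per_core} MB/s")
```

With the rounded 166 MHz clock this gives 5312 MB/s per dual channel, roughly the 5.4GB/s quoted, and about 2.7 GB/s per core if all 16 cores draw on the 8 dual channels at once.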
> What hardware is this? [processor/chipset?]
>
dmidecode shows the processor is:
Handle 0x0010, DMI type 4, 40 bytes
Processor Information
Socket Designation: CPU 4
Type: Central Processor
Family: Quad-Core Opteron
Manufacturer: AMD
ID: 06 05 F6 40 74 03 E8 3D
Signature: Family 5, Model 0, Stepping 6
Flags:
DE (Debugging extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (Fast floating-point save and restore)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Hyper-threading technology)
TM (Thermal monitor supported)
Version: Quad-Core AMD Opteron(tm) Processor 8360 SE
Voltage: 1.5 V
External Clock: 200 MHz
Max Speed: 4600 MHz
Current Speed: 2500 MHz
Status: Populated, Enabled
Upgrade: Other
L1 Cache Handle: 0x0011
L2 Cache Handle: 0x0012
L3 Cache Handle: 0x0013
Serial Number: N/A
Asset Tag: N/A
Part Number: N/A
Core Count: 4
Core Enabled: 4
Characteristics:
64-bit capable
> From what you say - it looks like each chip has 4 cores, and 2
> dual-channel memory controllers for each of them.
>
> The question is - does the hardware provide scalable memory-bandwidth
> per core? Most machines don't.
>
This point is not clear to me right now.
> I.e. the same 5.4*2GB/s is available for 1 core run as well as the 4 core
> run.
>
> So if the algorithm is able to use 5.4GB/s [or more] for 1 thread,
> 10.8 [or more] for 2 threads - you would just see scalable performance
> from 1 to 2, and 3, 4 would perhaps be slightly incremental to the
> 2-core performance.
>
> Satish
>
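The scaling argument above can be illustrated with a minimal sketch. It assumes a purely bandwidth-bound algorithm in which each thread can consume 5.4 GB/s and each chip offers 10.8 GB/s in total, the figures used in the quoted text. The function name is hypothetical, and the model deliberately ignores caches and any small gains beyond saturation.

```python
def bandwidth_bound_speedup(threads, per_thread_demand_gb_s=5.4, available_gb_s=10.8):
    """Speedup over 1 thread when memory bandwidth is the only limit.

    Assumption: each thread would use per_thread_demand_gb_s if it could,
    and all threads on the chip share available_gb_s.
    """
    achieved = min(threads * per_thread_demand_gb_s, available_gb_s)
    return achieved / per_thread_demand_gb_s


for n in range(1, 5):
    print(f"{n} thread(s): speedup {bandwidth_bound_speedup(n):.1f}")
```

Under these assumptions the speedup grows from 1 to 2 threads and then stays flat for 3 and 4, which is the saturation behaviour described above. A real run might show slightly more than this for 3 and 4 threads.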