[mpich-discuss] Why is my quad core slower than cluster
Rajeev Thakur
thakur at mcs.anl.gov
Mon Jul 14 11:05:25 CDT 2008
Not sure if it's an MPICH problem or a memory bandwidth problem on the
multicore machine.
One way to check the memory bandwidth is to run the STREAM benchmark on 1
core and then on multiple cores: http://www.cs.virginia.edu/stream/ref.html
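If you want a quick sanity check before setting up the full benchmark, even
a small triad loop along these lines will show whether the measured
bandwidth scales when you go from OMP_NUM_THREADS=1 to 4 or 8 on one node
(this is only a rough sketch, not a substitute for the real STREAM runs;
compile with -openmp for ifort or -fopenmp for gfortran):

program triad_check
  use omp_lib
  implicit none
  integer, parameter :: n = 20000000, reps = 10
  double precision, allocatable :: a(:), b(:), c(:)
  double precision :: t0, t1, gbs
  integer :: i, r

  allocate(a(n), b(n), c(n))
  b = 1.0d0
  c = 2.0d0

  t0 = omp_get_wtime()
  do r = 1, reps
!$omp parallel do
     do i = 1, n
        a(i) = b(i) + 3.0d0*c(i)   ! STREAM-style triad
     end do
!$omp end parallel do
  end do
  t1 = omp_get_wtime()

  ! 3 arrays x 8 bytes x n elements are moved per repetition
  gbs = dble(reps)*3.0d0*8.0d0*dble(n) / ((t1 - t0)*1.0d9)
  print *, 'checksum:', a(1) + a(n)
  print '(a,f7.2,a)', ' approximate triad bandwidth: ', gbs, ' GB/s'
end program triad_check

If that number stops improving after 2 threads or so, the cores are
contending for the same memory bus, and no MPI-level change will give you a
4x speed-up on this kind of loop-over-arrays code.
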
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of zach
> Sent: Monday, July 14, 2008 10:45 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Why is my quad core slower than cluster
>
> This is starting to sound like a limitation of using MPI on
> multicore processors, and not necessarily an issue with the install
> or configuration of MPICH.
> Can we expect improvements to MPICH in the near future to deal with
> this?
> Or is it just that the quad-core CPUs are newer and have not been
> rigorously tested with MPICH yet to find all the issues?
> Not great news for me, since I just built a quad-core box thinking I
> would get a near-4x speed-up... :(
>
> Zach
>
> On 7/14/08, Gaetano Bellanca <gaetano.bellanca at unife.it> wrote:
> > Hello to everybody,
> >
> > we have more or less the same problem. We are developing an FDTD
> > code for electromagnetic simulation in Fortran. The code is mainly
> > based on 3 loops used to compute the electric field components, and
> > 3 identical loops to compute the magnetic field components.
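> > Schematically, each of these is a plain triple loop of the Yee
> > type, something like the sketch below (array and coefficient names
> > are only illustrative, not the ones of our code):
> >
> >   do k = 2, nz
> >     do j = 2, ny
> >       do i = 2, nx
> >         ! Ex update from the curl of H (uniform cell size folded
> >         ! into the cb coefficient)
> >         ex(i,j,k) = ca*ex(i,j,k) &
> >                   + cb*( (hz(i,j,k) - hz(i,j-1,k)) &
> >                        - (hy(i,j,k) - hy(i,j,k-1)) )
> >       end do
> >     end do
> >   end do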
> >
> > We are using a small PC cluster built some years ago, made of 10
> > Pentium IV 3 GHz nodes connected with a 1 Gbit/s Ethernet LAN, and
> > an Intel Vernonia machine with 2 processors / 4 cores each (8 cores
> > in total). The processors are Intel Xeon E5345 @ 2.33 GHz.
> > We are using the Intel 10.1 Fortran compiler (compiler options as
> > indicated in the manual for machine optimization, with -O3) and
> > Ubuntu 7.10 (kernel 2.6.22-14 generic on the cluster, kernel
> > 2.6.22-14 server on the multiprocessor machine).
> > mpich2 is compiled with nemesis, and we are still on version
> > 2.1.06p1 (no time yet to upgrade to the latest version).
> >
> > Testing the code on a simulation that is not too big, to keep the
> > overall run time limited (85184 variables, 44x44x44 cells, 51000
> > time iterations), we had good scaling on the cluster. On the total
> > simulation time (with parallel and sequential operations mixed) we
> > get a speed-up of 8.5 using 10 PEs (6.2 with 9, 8.2 with 8, 5 with
> > 7, 5.8 with 6, etc.).
> >
> > The same simulation has been run on the 2-PE / quad-core machine,
> > but we did not get good performance.
> > The speed-up is 2 if we run mpiexec -n 2 ..., as the domain is
> > divided between the two processors, which seem to work
> > independently. But when we increase the number of cores used,
> > running the simulation with -n 3, -n 4, etc., we get a speed-up of
> > 2.48 with 4 cores (2 on each PE), but only 2.6 with 8 cores.
> >
> > We also tried -parallel and -openmp (limiting the OpenMP
> > directives to the field-computation loops only), without obtaining
> > significant changes in performance, running with either mpiexec -n 1
> > or mpiexec -n 2 (trying to mix MPI and OpenMP).
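> > For these tests the OpenMP directives simply wrap the outer index
> > of those field loops, roughly as follows (again only a sketch):
> >
> > !$omp parallel do
> >       do k = 1, nz
> >         do j = 1, ny
> >           do i = 1, nx
> >             ! magnetic field update, same structure as the electric one
> >             hz(i,j,k) = hz(i,j,k) &
> >                       - db*( (ey(i+1,j,k) - ey(i,j,k)) &
> >                            - (ex(i,j+1,k) - ex(i,j,k)) )
> >           end do
> >         end do
> >       end do
> > !$omp end parallel do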
> >
> > Our feeling is that we have serious problems in managing the
> > shared resources for memory access, but we have no expertise in
> > that area, and we could be totally wrong.
> >
> > Regards.
> >
> > Gaetano
> >
> >
> > ________________________________
> > Gaetano Bellanca - Department of Engineering - University of Ferrara
> > Via Saragat, 1 - 44100 - Ferrara - ITALY
> > Voice (VoIP): +39 0532 974809 Fax: +39 0532 974870
> > mailto:gaetano.bellanca at unife.it
> > ________________________________
> >
> >
> >
>
>