[mpich-discuss] Why is my quad core slower than cluster
Rajeev Thakur
thakur at mcs.anl.gov
Mon Jul 14 11:05:25 CDT 2008
Not sure if it's an MPICH problem or a memory bandwidth problem on the
multicore machine.
One way to check the memory bandwidth is to run the STREAM benchmark on 1
core and then on multiple cores: http://www.cs.virginia.edu/stream/ref.html
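If you want a quick sanity check before setting up the full benchmark, even
a small triad loop along these lines will show whether the measured
bandwidth scales when you go from OMP_NUM_THREADS=1 to 4 or 8 on one node
(this is only a rough sketch, not a substitute for the real STREAM runs;
compile with -openmp for ifort or -fopenmp for gfortran):

program triad_check
  use omp_lib
  implicit none
  integer, parameter :: n = 20000000, reps = 10
  double precision, allocatable :: a(:), b(:), c(:)
  double precision :: t0, t1, gbs
  integer :: i, r

  allocate(a(n), b(n), c(n))
  b = 1.0d0
  c = 2.0d0

  t0 = omp_get_wtime()
  do r = 1, reps
!$omp parallel do
     do i = 1, n
        a(i) = b(i) + 3.0d0*c(i)   ! STREAM-style triad
     end do
!$omp end parallel do
  end do
  t1 = omp_get_wtime()

  ! 3 arrays x 8 bytes x n elements are moved per repetition
  gbs = dble(reps)*3.0d0*8.0d0*dble(n) / ((t1 - t0)*1.0d9)
  print *, 'checksum:', a(1) + a(n)
  print '(a,f7.2,a)', ' approximate triad bandwidth: ', gbs, ' GB/s'
end program triad_check

If that number stops improving after 2 threads or so, the cores are
contending for the same memory bus, and no MPI-level change will give you a
4x speed-up on this kind of loop-over-arrays code.
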
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of zach
> Sent: Monday, July 14, 2008 10:45 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Why is my quad core slower than cluster
>
> This is starting to sound like a limitation of using MPI on
> multicore processors, and not necessarily an issue with the install
> or configuration of MPICH.
> Can we expect improvements to MPICH in the near future to deal with
> this?
> Or is it just that the quad-core CPUs are newer and have not been
> rigorously tested with MPICH yet to find all the issues?
> Not great news for me, since I just built a quad-core box thinking I
> would get a near-4x speed-up... :(
>
> Zach
>
> On 7/14/08, Gaetano Bellanca <gaetano.bellanca at unife.it> wrote:
> > Hello to everybody,
> >
> > we have more or less the same problem. We are developing an FDTD
> > code for electromagnetic simulation in Fortran. The code is mainly
> > based on 3 loops used to compute the electric field components, and
> > 3 identical loops to compute the magnetic field components.
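> > Schematically, each of these is a plain triple loop of the Yee
> > type, something like the sketch below (array and coefficient names
> > are only illustrative, not the ones of our code):
> >
> >   do k = 2, nz
> >     do j = 2, ny
> >       do i = 2, nx
> >         ! Ex update from the curl of H (uniform cell size folded
> >         ! into the cb coefficient)
> >         ex(i,j,k) = ca*ex(i,j,k) &
> >                   + cb*( (hz(i,j,k) - hz(i,j-1,k)) &
> >                        - (hy(i,j,k) - hy(i,j,k-1)) )
> >       end do
> >     end do
> >   end do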
> >
> > We are using a small PC cluster built some years ago, made of 10
> > Pentium IV 3 GHz nodes connected with a 1 Gbit/s Ethernet LAN, and
> > an Intel Vernonia machine with 2 processors / 4 cores each (8 cores
> > in total). The processors are Intel Xeon E5345 @ 2.33 GHz.
> > We are using the Intel 10.1 Fortran compiler (compiler options as
> > indicated in the manual for machine optimization, with -O3) and
> > Ubuntu 7.10 (kernel 2.6.22-14 generic on the cluster, kernel
> > 2.6.22-14 server on the multiprocessor machine).
> > mpich2 is compiled with nemesis, and we are still on version
> > 2.1.06p1 (no time yet to upgrade to the latest version).
> >
> > Testing the code on a simulation that is not too big, to keep the
> > overall run time limited (85184 variables, 44x44x44 cells, 51000
> > time iterations), we had good scaling on the cluster. On the total
> > simulation time (with parallel and sequential operations mixed) we
> > get a speed-up of 8.5 using 10 PEs (6.2 with 9, 8.2 with 8, 5 with
> > 7, 5.8 with 6, etc.).
> >
> > The same simulation has been run on the 2-PE / quad-core machine,
> > but we did not get good performance.
> > The speed-up is 2 if we run mpiexec -n 2 ..., as the domain is
> > divided between the two processors, which seem to work
> > independently. But when we increase the number of cores used,
> > running the simulation with -n 3, -n 4, etc., we get a speed-up of
> > 2.48 with 4 cores (2 on each PE), but only 2.6 with 8 cores.
> >
> > We also tried -parallel and -openmp (limiting the OpenMP
> > directives to the field-computation loops only), without obtaining
> > significant changes in performance, running with either mpiexec -n 1
> > or mpiexec -n 2 (trying to mix MPI and OpenMP).
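> > For these tests the OpenMP directives simply wrap the outer index
> > of those field loops, roughly as follows (again only a sketch):
> >
> > !$omp parallel do
> >       do k = 1, nz
> >         do j = 1, ny
> >           do i = 1, nx
> >             ! magnetic field update, same structure as the electric one
> >             hz(i,j,k) = hz(i,j,k) &
> >                       - db*( (ey(i+1,j,k) - ey(i,j,k)) &
> >                            - (ex(i,j+1,k) - ex(i,j,k)) )
> >           end do
> >         end do
> >       end do
> > !$omp end parallel do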
> >
> > Our feeling is that we have serious problems in managing the
> > shared resources for memory access, but we have no expertise in
> > that area, and we could be totally wrong.
> >
> > Regards.
> >
> > Gaetano
> >
> >
> > ________________________________
> > Gaetano Bellanca - Department of Engineering - University of Ferrara
> > Via Saragat, 1 - 44100 - Ferrara - ITALY
> > Voice (VoIP): +39 0532 974809 Fax: +39 0532 974870
> > mailto:gaetano.bellanca at unife.it
> > ________________________________
> >
> >
> >
>
>