[mpich-discuss] Why is my quad core slower than cluster

zach zachlubin at gmail.com
Mon Jul 14 10:45:16 CDT 2008


This is starting to sound like a limitation of using MPI on multicore
processors and not necessarily an issue with the install or
configuration of mpich.
Can we expect improvements to mpich in the near future to deal with this?
Is it just that quad-core CPUs are newer and have not been
rigorously tested with mpich yet to find all the issues?
Not great news for me, since I just built a quad-core box thinking
I would get close to a 4x speed-up... :(

Zach

On 7/14/08, Gaetano Bellanca <gaetano.bellanca at unife.it> wrote:
> Hello to everybody,
>
> we have more or less the same problems. We are developing a FDTD code for
> electromagnetic simulation in FORTRAN. The code is mainly based on a 3 loops
> used to compute the electric field components, and 3 identical loops to
> compute the magnetic field components.
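>
> Just to make the structure concrete, one of the six loops looks more or less
> like this (only a minimal sketch; the names, bounds and single-precision
> arrays are placeholders, not our actual code):
>
>       subroutine update_ex(ex, hy, hz, ca, cb, nx, ny, nz)
>          implicit none
>          integer, intent(in) :: nx, ny, nz
>          real, intent(inout) :: ex(nx,ny,nz)
>          real, intent(in)    :: hy(nx,ny,nz), hz(nx,ny,nz)
>          real, intent(in)    :: ca(nx,ny,nz), cb(nx,ny,nz)
>          integer :: i, j, k
>          ! Ex update; the other five field components are updated in
>          ! triple loops with exactly the same structure
>          do k = 2, nz
>             do j = 2, ny
>                do i = 1, nx
>                   ex(i,j,k) = ca(i,j,k)*ex(i,j,k)                      &
>                             + cb(i,j,k)*( (hz(i,j,k) - hz(i,j-1,k))    &
>                                         - (hy(i,j,k) - hy(i,j,k-1)) )
>                end do
>             end do
>          end do
>       end subroutine update_ex
>
> Each grid point is touched once per time step, so there is relatively little
> arithmetic per memory access.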
>
> We are using a small PC cluster built some years ago with 10 PIV 3GHz nodes
> connected by a 1 Gbit/s Ethernet LAN, and an Intel Vernonia machine with 2
> processors / 4 cores each (8 cores in total). The processors are Intel Xeon
> E5345 @ 2.33GHz.
> We are using the Intel 10.1 Fortran compiler (compiler options as indicated
> in the manual for machine optimization, with -O3) and Ubuntu 7.10 (kernel
> 2.6.22-14 generic on the cluster, kernel 2.6.22-14 server on the
> multiprocessor machine).
> mpich2 is compiled with the nemesis channel, and we are still on 1.0.6p1
> (no time yet to upgrade to the latest version).
>
> Testing the code on a simulation kept deliberately small to limit the
> overall run time (85184 variables, 44x44x44 cells, 51000 time iterations),
> we get good scaling on the cluster. On the total simulation time (with
> parallel and sequential operations mixed) we see a speed-up of 8.5 using
> 10 PEs (6.2 with 9, 8.2 with 8, 5 with 7, 5.8 with 6, etc.).
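> (That corresponds to a parallel efficiency, i.e. speed-up divided by the
> number of PEs, of about 8.5/10 = 85% at 10 PEs.)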
>
> The same simulation has been run on the 2-PE / quad-core machine, but there
> we do not get good performance.
> The speed-up is 2 if we run mpiexec -n 2 ...., as the domain is divided
> between the two processors, which seem to work independently. But when we
> increase the number of cores used, running the simulation with -n 3, -n 4,
> etc., we get a speed-up of 2.48 with 4 cores (2 on each PE) but only 2.6
> with 8 cores.
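>
> In terms of parallel efficiency that is 2.48/4 = 62% with 4 cores and
> 2.6/8 = 32.5% with 8 cores, against about 85% at 10 PEs on the cluster.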
>
> We also tried -parallel or -openmp (limiting the OpenMP directives to the
> field-computation loops only), without obtaining any significant change in
> performance, running either with mpiexec -n 1 or with mpiexec -n 2 (trying
> to mix MPI and OpenMP).
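>
> The hybrid attempt is simply the same loop with an OpenMP directive on the
> outer index, roughly like this (again only a sketch), launched for example
> with mpiexec -n 2 and OMP_NUM_THREADS=4:
>
> !$omp parallel do private(i, j, k)
>          do k = 2, nz
>             do j = 2, ny
>                do i = 1, nx
>                   ex(i,j,k) = ca(i,j,k)*ex(i,j,k)                      &
>                             + cb(i,j,k)*( (hz(i,j,k) - hz(i,j-1,k))    &
>                                         - (hy(i,j,k) - hy(i,j,k-1)) )
>                end do
>             end do
>          end do
> !$omp end parallel do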
>
> Our idea is that we have a serious problem in managing the shared resources
> for memory access, but we have no expertise in that area and could be
> totally wrong.
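>
> If it is memory bandwidth, one simple check (a sketch of what we could try,
> not something we have measured yet) would be to run a STREAM-like triad
> under mpiexec with -n 1, 2, 4, 8 and watch whether the per-process
> bandwidth collapses as processes are added:
>
>       program bandwidth_check
>          use mpi
>          implicit none
>          integer, parameter :: n = 5000000   ! 3 arrays x 40 MB, well out of cache
>          real(kind=8), allocatable :: a(:), b(:), c(:)
>          real(kind=8) :: t0, t1, mbs
>          integer :: ierr, rank, nprocs, i, rep
>          call MPI_Init(ierr)
>          call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>          call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
>          allocate(a(n), b(n), c(n))
>          a = 0.0d0; b = 1.0d0; c = 2.0d0
>          call MPI_Barrier(MPI_COMM_WORLD, ierr)
>          t0 = MPI_Wtime()
>          do rep = 1, 10
>             do i = 1, n
>                a(i) = b(i) + 3.0d0*c(i)      ! STREAM-like triad
>             end do
>          end do
>          call MPI_Barrier(MPI_COMM_WORLD, ierr)
>          t1 = MPI_Wtime()
>          ! 10 repetitions, 3 arrays of n 8-byte reals moved each time
>          mbs = 10.0d0 * 3.0d0 * 8.0d0 * n / (t1 - t0) / 1.0d6
>          print *, 'rank', rank, 'of', nprocs, ':', mbs, 'MB/s', a(1)
>          call MPI_Finalize(ierr)
>       end program bandwidth_check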
>
> Regards.
>
> Gaetano
>
>
> ________________________________
> Gaetano Bellanca - Department of Engineering - University of Ferrara
> Via Saragat, 1 - 44100 - Ferrara - ITALY
> Voice (VoIP):  +39 0532 974809     Fax:  +39 0532 974870
> mailto:gaetano.bellanca at unife.it
> ________________________________
>
>
>



