[mpich-discuss] processor/memory affinity on quad core systems
Franco Catalano
franco.catalano at uniroma1.it
Thu Jul 24 04:38:55 CDT 2008
Hi,
Thanks to Chong Tan for the suggestion about numactl. The cluster in my
laboratory is used primarily by me, so I am not dealing with job queue
issues and this is a fairly good solution for my computations.
I have a question about the use of numactl. Assuming a machine with four
quad-core processors, is it the same to do this:

mpiexec -np 4 numactl --cpunodebind=0 --membind=0 ./parallel_executable : \
        -np 4 numactl --cpunodebind=1 --membind=1 ./parallel_executable : \
        <same for the remaining two nodes>

instead of:

mpiexec numactl --physcpubind=0 --membind=0 ./parallel_executable : \
        numactl --physcpubind=1 --membind=0 ./parallel_executable : \
        numactl --physcpubind=2 --membind=0 ./parallel_executable : \
        numactl --physcpubind=3 --membind=0 ./parallel_executable : \
        numactl --physcpubind=4 --membind=1 ./parallel_executable : \
        numactl --physcpubind=5 --membind=1 ./parallel_executable : \
        <same for the remaining ten cores>

In other words, since memory in a NUMA architecture is attached to the
nodes, what is the difference between binding a group of 4 MPI processes
to each quad-core processor and binding each MPI process to its own core?
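
For reference, I am also considering a small wrapper script that picks the
NUMA node from the MPI rank, so that a plain "mpiexec -np 16" launch is
enough. This is only a sketch: it assumes the process manager exports the
rank in PMI_RANK (which may differ between process managers) and that each
NUMA node has 4 cores (the layout can be checked with numactl --hardware):

#!/bin/sh
# bind_by_rank.sh (the name is just an example): bind the calling MPI
# process and its memory to one NUMA node chosen from its rank.
RANK=${PMI_RANK:-0}      # rank exported by the process manager (assumed)
NODE=$((RANK / 4))       # 4 cores per NUMA node on this machine
exec numactl --cpunodebind=$NODE --membind=$NODE "$@"

launched as:

mpiexec -np 16 ./bind_by_rank.sh ./parallel_executable
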
Thanks.
Franco
On Tue, 22/07/2008 at 10:11 -0700, chong tan wrote:
> no easy way with mpiexec, especially if you do mpiexec -n. But this
> should work:
>
> mpiexec numactl --physcpubind N0 <1st of your procs> :
>         numactl --physcpubind N1 <2nd of your procs> :
>         <same for the rest>
>
> add --membind if you want (and you definitely want it for Opteron).
>
> tan
>
> --- On Tue, 7/22/08, Franco Catalano <franco.catalano at uniroma1.it>
> wrote:
>
> From: Franco Catalano <franco.catalano at uniroma1.it>
> Subject: [mpich-discuss] processor/memory affinity on quad
> core systems
> To: mpich-discuss at mcs.anl.gov
> Date: Tuesday, July 22, 2008, 2:28 AM
>
> Hi,
> Is it possible to ensure processor/memory affinity for MPI jobs launched
> with mpiexec (or mpirun)?
> I am using mpich2 1.0.7 with WRF on a 4-processor quad-core Opteron
> machine (16 cores total) and I have observed a noticeable (more than 20%)
> variability in the time needed to compute a single time step. Looking at
> the output of top, I have noticed that the system migrates processes
> across the 16 cores with no regard for processor/memory affinity. So,
> when processes run on cores far from their memory, the time advancement
> takes longer.
> I know that, for example, OpenMPI provides a way to enforce affinity
> binding through mpiexec (or mpirun): the MCA parameter
> mpi_paffinity_alone, set with --mca mpi_paffinity_alone 1.
> I have tried this with WRF and it works.
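> For reference, the full command line is something like this (the
> executable name is only an example):
>
> mpirun --mca mpi_paffinity_alone 1 -np 16 ./wrf.exe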
> Is there a way to do this with mpich2?
> Otherwise, I think it would be very useful to include such a capability
> in the next release.
> Thank you for any suggestion.
>
> Franco
>
> --
> ____________________________________________________
> Eng. Franco Catalano
> Ph.D. Student
>
> D.I.T.S.
> Department of Hydraulics, Transportation and Roads.
> Via Eudossiana 18, 00184 Rome
> University of Rome "La Sapienza".
> tel: +390644585218