It could be the same, but the OS may also make them behave differently. The --membind part behaves the same either way. The difference is that with --cpunodebind the OS is still free to move each process among the cores of that physical CPU, which results in extra context switching, while --physcpubind pins each process to a single core.
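To make it concrete, here is a minimal sketch for node 0 only, assuming cores 0-3 belong to NUMA node 0 on your machine (numactl --hardware shows the actual layout), with ./parallel_executable standing in for your binary:

# node binding: memory stays on node 0, but the 4 ranks may still migrate among cores 0-3
mpiexec -np 4 numactl --cpunodebind=0 --membind=0 ./parallel_executable

# core binding: same memory placement, but each rank is pinned to one core
mpiexec -np 1 numactl --physcpubind=0 --membind=0 ./parallel_executable : \
        -np 1 numactl --physcpubind=1 --membind=0 ./parallel_executable : \
        -np 1 numactl --physcpubind=2 --membind=0 ./parallel_executable : \
        -np 1 numactl --physcpubind=3 --membind=0 ./parallel_executable

The memory side is identical in both cases; only the CPU side differs.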
Please spend some time on a small experiment like this. You will learn the 'behavior' of your HW and OS better. Try it on a few different machines, and you will be amused.
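One quick way to watch it, assuming a Linux procps ps and the executable name from your example (the PSR column is the core a process is currently running on):

# refresh every second and watch which core each rank sits on
watch -n 1 'ps -C parallel_executable -o pid,psr,pcpu,comm'

With --cpunodebind you will typically see the PSR values wander within the node; with --physcpubind they should stay fixed.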
tan
--- On Thu, 7/24/08, Franco Catalano <franco.catalano@uniroma1.it> wrote:
From: Franco Catalano <franco.catalano@uniroma1.it>
Subject: Re: [mpich-discuss] processor/memory affinity on quad core systems
To: mpich-discuss@mcs.anl.gov
Date: Thursday, July 24, 2008, 2:38 AM

Hi,
Thanks to Chong Tan for the suggestion with numactl. The cluster in my
laboratory is used primarily by me, so I am not facing job queue issues
and this is a fairly good solution for my computations.
I have a question about the use of numactl. Assuming a 4 processor quad
core machine, is it the same to do this:
mpiexec -np 4 numactl --cpunodebind=0 --membind=0 ./parallel_executable :
        -np 4 numactl --cpunodebind=1 --membind=1 ./parallel_executable :
        <same for the remaining two nodes>
instead of:
mpiexec numactl --physcpubind=0 --membind=0 ./parallel_executable :
        numactl --physcpubind=1 --membind=0 ./parallel_executable :
        numactl --physcpubind=2 --membind=0 ./parallel_executable :
        numactl --physcpubind=3 --membind=0 ./parallel_executable :
        numactl --physcpubind=4 --membind=1 ./parallel_executable :
        numactl --physcpubind=5 --membind=1 ./parallel_executable :
        <same for the remaining ten cores>
In other words, since in the NUMA architecture the memory is assigned to
nodes, what is the difference between binding 4 MPI jobs to each quad core
processor and binding each MPI job to a single core?
Thanks.
Franco
On Tue, 22/07/2008 at 10.11 -0700, chong tan wrote:
> no easy way with mpiexec, especially if you do mpiexec -n.  But this
> should work:
>
> mpiexec numactl --physcpubind N0 <1st of your proc> :
>         numactl --physcpubind N1 <2nd of your proc> :
>         <same for the rest>
>
> add --membind if you want (and you definitely want it for Opteron).
>
> tan
>
> --- On Tue, 7/22/08, Franco Catalano <franco.catalano@uniroma1.it> wrote:
>
> From: Franco Catalano <franco.catalano@uniroma1.it>
> Subject: [mpich-discuss] processor/memory affinity on quad core systems
> To: mpich-discuss@mcs.anl.gov
> Date: Tuesday, July 22, 2008, 2:28 AM
>
> Hi,
> Is it possible to ensure processor/memory affinity for mpi jobs launched
> with mpiexec (or mpirun)?
> I am using mpich2 1.0.7 with WRF on a 4 processor Opteron quad core
> (16 cores total) machine and I have observed a noticeable (more than 20%)
> variability in the time needed to compute a single time step. Taking a
> look at the output of top, I have noticed that the system moves
> processes over the 16 cores regardless of processor/memory affinity. So,
> when processes are running on cores away from their memory, the time
> needed for the time advancement is longer.
> I know that, for example, OpenMPI provides a command line option for
> mpiexec (or mpirun) to ensure the affinity binding:
> --mca mpi_paffinity_alone 1
> I have tried this with WRF and it works.
> Is there a way to do this with mpich2?
> Otherwise, I think that it would be very useful to include such a
> capability in the next release.
> Thank you for any suggestion.
>
> Franco
>
> --
> ____________________________________________________
> Eng. Franco Catalano
> Ph.D. Student
>
> D.I.T.S.
> Department of Hydraulics, Transportation and Roads.
> Via Eudossiana 18, 00184 Rome
> University of Rome "La Sapienza".
> tel: +390644585218