[mpich-discuss] Hydra process affinity
Guillaume Mercier
mercierg at mcs.anl.gov
Fri May 25 03:20:10 CDT 2012
Hello,
Pavan will probably confirm this point, but my guess is that the meaning
of "binding to sockets" differs from one implementation to another.
If you take a look at the Hydra documentation, you will see that the
behaviour you're experiencing seems to be correct, though I agree it is
not the one you want.
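In the meantime, a possible workaround (a rough sketch, untested; it
assumes Hydra exports PMI_RANK to the launched processes and a
two-socket node) is to rebind each rank to its whole socket with hwloc
before the application starts:

    #!/bin/sh
    # rebind.sh: rebind the calling rank to the full cpuset of "its" socket.
    # PMI_RANK % 2 maps ranks round-robin onto the two sockets.
    exec hwloc-bind socket:$((PMI_RANK % 2)) -- "$@"

and then launch with

    mpirun -np 2 ./rebind.sh ./your_app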
Regards
Guillaume
On 05/25/2012 01:14 AM, Martin Cuma wrote:
> Hello,
>
> I am trying to get consistent performance on dual-socket multi-core
> nodes, which requires binding each process to a socket. The code
> generally runs one process per socket and launches multiple OpenMP
> threads to fill up the socket's cores.
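> As a concrete example (hybrid_app is just a stand-in for our MPI+OpenMP
> code), such a job would be launched as:
>
>   OMP_NUM_THREADS=6 mpirun -binding cpu:sockets -np 2 ./hybrid_app
>
> with the expectation that each rank gets one socket's 6 cores and its
> 6 threads spread over them.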
>
> I have found a problem with the -binding cpu:sockets option in Hydra's
> mpirun: it binds each process to the first core of the socket, rather
> than giving it access to all of the socket's cores.
>
> For example, on a dual-socket node with 6-core CPUs (12 cores total), I get:
> /uufs/chpc.utah.edu/sys/pkg/mpich2/1.5b1/bin/mpirun -binding
> cpu:sockets -np 2
> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
> 0x00000001
> 0x00000002
> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p
> --hierarchical socket.core 0x00000001
> Socket:0.Core:0
> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p
> --hierarchical socket.core 0x00000002
> Socket:1.Core:0
>
> I am using hwloc to report the binding and then computing the
> "physical" location. Notice that only a single core is reported for
> each socket, rather than all 6 cores. This is confirmed by running
> with 6 OpenMP threads per process and seeing only 100% CPU load,
> rather than 600%.
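> For comparison, the full cpuset of each socket, which is what the
> binding should have produced, can be printed directly:
>
>   /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc socket:0
>   /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc socket:1
>
> and each mask should cover all of the socket's cores, not a single bit.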
>
> Now, OpenMPI 1.6.1a1 (just installed today) does the affinity right:
> /uufs/ember.arches/sys/pkg/openmpi/1.6.1a1i/bin/mpirun --bysocket
> --bind-to-socket -np 2
> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
> 0x00aaaaaa
> 0x00555555
> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p
> --hierarchical socket.core 0x00aaaaaa
> Socket:1.Core:0 Socket:1.Core:1 Socket:1.Core:2 Socket:1.Core:8
> Socket:1.Core:9 Socket:1.Core:10
> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p
> --hierarchical socket.core 0x00555555
> Socket:0.Core:0 Socket:0.Core:1 Socket:0.Core:2 Socket:0.Core:8
> Socket:0.Core:9 Socket:0.Core:10
>
> I suspect that other binding options may be broken as well. For
> example, I tried -binding cache:l3, which should have the same effect
> as cpu:sockets on this machine, and it also ran on only one core
> (100% load).
>
> I would appreciate it if someone could comment on this, and if this is
> a bug, I'd be happy to work with the developers to get it fixed. It
> shows up in both 1.4.1p1 and 1.5b1.
>
> Thanks,
> MC
>
>