[mpich-discuss] Hydra process affinity
Martin Cuma
martin.cuma at utah.edu
Thu May 24 18:14:44 CDT 2012
Hello,
I am trying to get consistent performance on dual-socket multi-core nodes,
which requires binding each process to a socket. The code generally runs one
process per socket and launches multiple OpenMP threads to fill up the
socket's cores.
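For reference, a typical launch for this layout looks something like the
following (just a sketch; ./hybrid_app and the thread count are placeholders
for the actual application):

# one MPI process per socket, 6 OpenMP threads per process (./hybrid_app is a placeholder)
mpirun -genv OMP_NUM_THREADS 6 -binding cpu:sockets -np 2 ./hybrid_app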
I have found a problem with -binding cpu:sockets in Hydra's mpirun
implementation: it binds each process to the first core of its socket,
rather than giving it access to all of the socket's cores.
For example, on a dual-socket node with 6-core CPUs (12 cores total), I get:
/uufs/chpc.utah.edu/sys/pkg/mpich2/1.5b1/bin/mpirun -binding cpu:sockets
-np 2 /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
0x00000001
0x00000002
/uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00000001
Socket:0.Core:0
/uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00000002
Socket:1.Core:0
I am using hwloc to report the binding and then to work out the "physical"
location. Notice that only a single core is reported for each socket,
rather than all 6 cores. This is confirmed by running with 6 OpenMP threads
per process and seeing only 100% CPU load rather than 600%.
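To make that load check concrete (again a sketch, with ./hybrid_app standing
in for the real code): launch the hybrid run and watch the per-process CPU
usage, e.g. in top; with a correct socket-wide binding each rank should
approach 600%, while with the broken binding it stays at 100%:

mpirun -genv OMP_NUM_THREADS 6 -binding cpu:sockets -np 2 ./hybrid_app &
top   # each rank should show ~600% CPU if all 6 cores are usable, ~100% if pinned to one core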
OpenMPI 1.6.1a1 (just installed today), on the other hand, handles the affinity correctly:
/uufs/ember.arches/sys/pkg/openmpi/1.6.1a1i/bin/mpirun --bysocket
--bind-to-socket -np 2
/uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
0x00aaaaaa
0x00555555
/uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00aaaaaa
Socket:1.Core:0 Socket:1.Core:1 Socket:1.Core:2 Socket:1.Core:8 Socket:1.Core:9 Socket:1.Core:10
/uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00555555
Socket:0.Core:0 Socket:0.Core:1 Socket:0.Core:2 Socket:0.Core:8 Socket:0.Core:9 Socket:0.Core:10
I suspect that other binding options may be broken as well. For example, I
tried running with -binding cache:l3, which should have the same effect as
cpu:sockets on this machine, and it also ran on only one core per process
(100% load).
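The same hwloc-based check can be applied there; on this node I would expect
the per-rank masks to match the full-socket masks that OpenMPI produces above
(this is a sketch of the check, not actual output):

mpirun -binding cache:l3 -np 2 /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
# expected: two 6-core masks, one per L3/socket, i.e. 0x00555555 and 0x00aaaaaa on this node
# in practice only one core per process appears to be used, consistent with the 100% load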
I would appreciate it if someone could comment on this, and if it is a bug,
I'd be happy to work with the developers to get it fixed. The behavior shows
up in both 1.4.1p1 and 1.5b1.
Thanks,
MC
--
Martin Cuma
Center for High Performance Computing
University of Utah