[mpich-discuss] Hydra process affinity
Pavan Balaji
balaji at mcs.anl.gov
Mon Jul 9 19:36:45 CDT 2012
Correct. We don't have an option to bind to the entire socket
currently. We are planning to revamp the binding model for the 1.5
release, which should cover this case too.
-- Pavan
On 05/25/2012 03:20 AM, Guillaume Mercier wrote:
>
> Hello,
>
> Pavan will probably confirm this point, but my guess is that the meaning
> of "binding to sockets" differs from one implementation to another.
> If you take a look at the Hydra documentation, you will see that the
> behaviour you're experiencing seems to be correct, although I agree that
> it's not the behaviour you want.
>
> Regards
> Guillaume
>
>
> On 05/25/2012 01:14 AM, Martin Cuma wrote:
>> Hello,
>>
>> I am trying to get consistent performance on dual-socket multi-core
>> nodes, which requires binding each process to a socket. The code
>> generally runs one process per socket and launches multiple OpenMP
>> threads to fill up the socket's cores.
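>>
>> For illustration, a minimal launch of that kind (the binary name
>> ./hybrid_app is just a placeholder) would look something like:
>>
>>   export OMP_NUM_THREADS=6
>>   mpirun -binding cpu:sockets -np 2 ./hybrid_app
>>
>> with the intent that each rank's 6 threads stay on that rank's socket.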
>>
>> I have found a problem with -binding cpu:sockets in Hydra's mpirun
>> implementation: it binds each process to the first core of its socket,
>> rather than giving it access to all of the socket's cores.
>>
>> For example, on a dual-socket node with 6-core CPUs (12 cores total),
>> I get:
>> /uufs/chpc.utah.edu/sys/pkg/mpich2/1.5b1/bin/mpirun -binding cpu:sockets -np 2 \
>>   /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
>> 0x00000001
>> 0x00000002
>> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p \
>>   --hierarchical socket.core 0x00000001
>> Socket:0.Core:0
>> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p \
>>   --hierarchical socket.core 0x00000002
>> Socket:1.Core:0
>>
>> I am using hwloc to report the binding and then to work out the
>> "physical" location. Notice that only a single core is reported for
>> each socket, rather than all 6 cores. This is confirmed by running with
>> 6 OpenMP threads per process and seeing only 100% CPU load rather than
>> 600%.
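>>
>> A quick way to double-check the mask of an already running rank (using
>> the hwloc tools above; <pid> is the rank's process id) is:
>>
>>   /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get --pid <pid>
>>
>> which should print a 6-core mask per rank if the socket binding were
>> really applied, instead of the single-core masks shown above.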
>>
>> Now, OpenMPI 1.6.1a1 (just installed today) gets the affinity right:
>> /uufs/ember.arches/sys/pkg/openmpi/1.6.1a1i/bin/mpirun --bysocket --bind-to-socket -np 2 \
>>   /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
>> 0x00aaaaaa
>> 0x00555555
>> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p \
>>   --hierarchical socket.core 0x00aaaaaa
>> Socket:1.Core:0 Socket:1.Core:1 Socket:1.Core:2 Socket:1.Core:8
>> Socket:1.Core:9 Socket:1.Core:10
>> /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p \
>>   --hierarchical socket.core 0x00555555
>> Socket:0.Core:0 Socket:0.Core:1 Socket:0.Core:2 Socket:0.Core:8
>> Socket:0.Core:9 Socket:0.Core:10
>>
>> I suspect that other binding options may be broken as well. For
>> example, I tried -binding cache:l3, which should have the same effect
>> as cpu:sockets on this machine, and it also ran on only one core
>> (100% load).
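>>
>> Concretely, the kind of check I mean is (same hwloc-bind test as above):
>>
>>   mpirun -binding cache:l3 -np 2 \
>>     /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
>>
>> On this node I would expect the same full-socket masks that OpenMPI
>> reports above (0x00555555 and 0x00aaaaaa) if the L3 binding covered the
>> whole cache domain.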
>>
>> I would appreciate it if someone could comment on this; if it is a
>> bug, I'd be happy to work with the developers to get it fixed. The
>> problem shows up in both 1.4.1p1 and 1.5b1.
>>
>> Thanks,
>> MC
>>
>>
>
> _______________________________________________
> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji