[mpich-discuss] Hydra process placement verification

Pavan Balaji balaji at mcs.anl.gov
Tue Aug 31 12:17:35 CDT 2010


[please keep mpich-discuss cc'ed].

In top you can type "f" (to add a field) followed by "j" (to show which 
CPU each process is running on).
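
If you'd rather verify programmatically, a minimal MPI program (just a
sketch, not MPICH2-specific; it assumes Linux/glibc, where sched_getcpu()
is available as a GNU extension) can print the core each rank is
currently on:

  #define _GNU_SOURCE
  #include <sched.h>   /* sched_getcpu() -- GNU extension */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      /* Reports the core this rank is on right now; without a
       * binding in effect, the value can change between calls. */
      printf("rank %d is on core %d\n", rank, sched_getcpu());
      MPI_Finalize();
      return 0;
  }

With binding=user:1,2,3,5,6,7 in effect, ranks 0-5 should consistently
report cores 1, 2, 3, 5, 6 and 7.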

  -- Pavan

On 08/31/2010 10:43 AM, Jeffrey J. Evans wrote:
> Thanks Pavan,
>
> Running "top": do I assume that I am seeing "core order" for the processes at the top of the list? i.e., is the topmost process running on core 0?
>
> If that is the case, then I am seeing my processes bounce around to different cores over time.
>
> Does this behavior make sense to you?
>
> Jeff
>
> Jeffrey J. Evans
> jje at purdue.edu
> http://web.ics.purdue.edu/~evans6/
>
>
>
>
> On Aug 31, 2010, at 11:29 AM, Pavan Balaji wrote:
>
>> Hi,
>>
>> 1.1.1p1 only had an experimental version of Hydra, and if I remember correctly it didn't have any support for hwloc at that point. However, you can download the latest version of Hydra (1.3b1) and use it with your existing application without recompiling.
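>>
>> For example, assuming the new Hydra is installed under /opt/hydra-1.3b1 (a hypothetical path), something like
>>
>>   /opt/hydra-1.3b1/bin/mpiexec.hydra -f hostfile -n 6 ./a.out
>>
>> should launch your existing binary under the newer launcher.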
>>
>> With respect to your results: if all processes are running on the same node, MPI_Get_processor_name() should return the same value for every process, unless the OS returns a different "hostname" for each core on the system. Is that the case on your system?
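>>
>> As a quick sanity check (a sketch; any MPI implementation should behave the same way), all ranks on one node should print an identical name here, since MPI_Get_processor_name() typically returns the hostname, not a core ID:
>>
>>   #include <stdio.h>
>>   #include <mpi.h>
>>
>>   int main(int argc, char **argv)
>>   {
>>       char name[MPI_MAX_PROCESSOR_NAME];
>>       int rank, len;
>>       MPI_Init(&argc, &argv);
>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>       /* Identifies the node, not the core the rank runs on. */
>>       MPI_Get_processor_name(name, &len);
>>       printf("rank %d: processor name = %s\n", rank, name);
>>       MPI_Finalize();
>>       return 0;
>>   }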
>>
>> Another way to check whether the bindings are working correctly is to use "top". But for that, the application has to run for at least a few seconds.
>>
>> -- Pavan
>>
>> On 08/31/2010 10:22 AM, Jeffrey J. Evans wrote:
>>> I am trying to learn how to correctly verify hydra process placement.
>>>
>>> For example: I have an MPI program that uses 6 processes. On a node with 2 quad-core processors, I set up the following host file:
>>>
>>> hpn10:8 binding=user:1,2,3,5,6,7
>>>
>>> My MPI program retrieves each process's processor name using MPI_Get_processor_name().
>>>
>>> The resulting output:
>>> # 000: OK on hpn10/0, EA: 1, rank: 0 ptrn 1 tick: 1.000000e-06
>>> # 001: OK on hpn10/1, EA: 1, rank: 1 ptrn 1 tick: 1.000000e-06
>>> # 002: OK on hpn10/2, EA: 1, rank: 2 ptrn 1 tick: 1.000000e-06
>>> # 003: OK on hpn10/3, EA: 1, rank: 3 ptrn 1 tick: 1.000000e-06
>>> # 004: OK on hpn10/4, EA: 1, rank: 4 ptrn 1 tick: 1.000000e-06
>>> # 005: OK on hpn10/5, EA: 1, rank: 5 ptrn 1 tick: 1.000000e-06
>>>
>>> I was expecting to see something like:
>>> # 000: OK on hpn10/1, EA: 1, rank: 0 ptrn 1 tick: 1.000000e-06
>>> # 001: OK on hpn10/2, EA: 1, rank: 1 ptrn 1 tick: 1.000000e-06
>>> # 002: OK on hpn10/3, EA: 1, rank: 2 ptrn 1 tick: 1.000000e-06
>>> # 003: OK on hpn10/5, EA: 1, rank: 3 ptrn 1 tick: 1.000000e-06
>>> # 004: OK on hpn10/6, EA: 1, rank: 4 ptrn 1 tick: 1.000000e-06
>>> # 005: OK on hpn10/7, EA: 1, rank: 5 ptrn 1 tick: 1.000000e-06
>>>
>>> My real problem is that I cannot find anything in the documentation about how hydra uses hwloc to bind processes to cores. Does hydra need paths to the hwloc bin and lib directories? Does mpich2 need to be rebuilt with configuration information giving the location of the hwloc binaries and libraries?
>>>
>>> To provide more info: my mpich2-1.1.1p1 build was done with the pm:hydra configuration option set, but hwloc was installed later. Does mpich2 need to be rebuilt?
>>> The resource manager = Torque, scheduler = Maui.
>>>
>>> How can I verify core placement?
>>>
>>> Jeffrey J. Evans
>>> jje at purdue.edu
>>> http://web.ics.purdue.edu/~evans6/
>>>
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

