[mpich-discuss] Fwd: Hydra process placement verification

Pavan Balaji balaji at mcs.anl.gov
Tue Aug 31 12:18:19 CDT 2010


[cc'ing mpich-discuss]

Hydra ships with its own copy of hwloc and builds it internally, so you 
don't need to set it up separately.
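
That said, if you do end up wanting to experiment with hwloc directly 
(since the docs felt cryptic), the basic binding sequence is short. 
Something along these lines binds the calling process to a given core -- 
a rough, untested sketch against the hwloc C API, with error handling 
mostly omitted, so please check the hwloc documentation for the exact 
semantics and flags:

    #include <hwloc.h>

    /* Bind the calling process to core number "corenum" (0-based). */
    static int bind_to_core(int corenum)
    {
        hwloc_topology_t topo;
        hwloc_obj_t core;
        int ret;

        /* Discover the machine's topology. */
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* Look up the corenum'th core object in the topology. */
        core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, corenum);
        if (core == NULL) {
            hwloc_topology_destroy(topo);
            return -1;
        }

        /* Bind this process to that core's cpuset. */
        ret = hwloc_set_cpubind(topo, core->cpuset, 0);

        hwloc_topology_destroy(topo);
        return ret;
    }

Compile with something like "gcc ... -lhwloc", adding -I/-L paths if 
hwloc is not installed in a default location.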

On 08/31/2010 10:48 AM, Jeffrey J. Evans wrote:
> Pavan,
>
> On re-reading your message: I would prefer not to use hwloc directly if I can avoid it; rather, I would prefer to rely on Hydra. Does the latest version of MPICH2 handle the Hydra interface with hwloc better?
>
> If not, can you point me to an example of using hwloc directly, so I can get up to speed faster? The hwloc user docs remain a bit cryptic.
>
> Thanks,
>
> Jeff
>
> Jeffrey J. Evans
> jje at purdue.edu
> http://web.ics.purdue.edu/~evans6/
>
>
>
>
> Begin forwarded message:
>
>> From: "Jeffrey J. Evans"<jje at purdue.edu>
>> Date: August 31, 2010 11:43:10 AM EDT
>> To: Pavan Balaji<balaji at mcs.anl.gov>
>> Subject: Re: [mpich-discuss] Hydra process placement verification
>> Reply-To: Evans Jeffrey<jje at purdue.edu>
>>
>> Thanks Pavan,
>>
>> Running "top": DO I assume that I am seeing "core order" on those processes at the top of the list? i.e. the topmost process is running on core 0?
>>
>> If that is the case then I am seeing my processes bouncing around onto different cores over time.
>>
>> Does this behavior make sense to you?
>>
>> Jeff
>>
>> Jeffrey J. Evans
>> jje at purdue.edu
>> http://web.ics.purdue.edu/~evans6/
>>
>>
>>
>>
>> On Aug 31, 2010, at 11:29 AM, Pavan Balaji wrote:
>>
>>> Hi,
>>>
>>> 1.1.1p1 only had an experimental version of Hydra, and if I remember correctly it didn't have any hwloc support at that point. However, you can download the latest version of Hydra (1.3b1) and use it with your existing application without recompiling.
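>>>
>>> (Roughly, and assuming the standalone hydra tarball builds the usual
>>> autoconf way: ./configure --prefix=<somewhere>, make, make install,
>>> and then put that installation's bin/ directory at the front of your
>>> PATH so its mpiexec is the one used to launch your existing binaries.)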
>>>
>>> With respect to your results: if you are running on a single node, MPI_Get_processor_name in MPICH2 should return the same value (the node's hostname) for all processes, unless the OS reports a different "hostname" for each core on the system. Is that the case on your system?
>>>
>>> Another way of checking whether the bindings are working correctly is to use "top". But for that, the application has to run for at least a few seconds.
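>>>
>>> In top, it is safer to look at the last-used-CPU field ("P") than at
>>> the order of the process list: press 'f' and enable that field, and
>>> top will show which core each process last ran on. As another check
>>> from inside the application itself, each rank can ask the OS which
>>> core it is currently on, e.g. with sched_getcpu() on Linux -- a rough
>>> sketch, not an MPICH2 API; it needs _GNU_SOURCE and a reasonably
>>> recent glibc:
>>>
>>>     #define _GNU_SOURCE
>>>     #include <sched.h>    /* sched_getcpu(), a Linux/glibc extension */
>>>     #include <stdio.h>
>>>     #include <mpi.h>
>>>
>>>     int main(int argc, char **argv)
>>>     {
>>>         int rank, namelen;
>>>         char name[MPI_MAX_PROCESSOR_NAME];
>>>
>>>         MPI_Init(&argc, &argv);
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>         /* Returns the hostname, which is the same for every core on
>>>            a node, so it cannot tell you which core a rank is on. */
>>>         MPI_Get_processor_name(name, &namelen);
>>>
>>>         /* sched_getcpu() reports the core the process is running on
>>>            right now; with a correct binding it should not change. */
>>>         printf("rank %d on %s, core %d\n", rank, name, sched_getcpu());
>>>
>>>         MPI_Finalize();
>>>         return 0;
>>>     }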
>>>
>>> -- Pavan
>>>
>>> On 08/31/2010 10:22 AM, Jeffrey J. Evans wrote:
>>>> I am trying to learn how to correctly verify hydra process placement.
>>>>
>>>> For example: I have an MPI program that uses 6 processes. On a node with 2 quad-core processors, I set up the following host file:
>>>>
>>>> hpn10:8 binding=user:1,2,3,5,6,7
>>>>
>>>> My MPI program retrieves the processor name in each process using MPI_Get_processor_name().
>>>>
>>>> The resulting output:
>>>> # 000: OK on hpn10/0, EA: 1, rank: 0 ptrn 1 tick: 1.000000e-06
>>>> # 001: OK on hpn10/1, EA: 1, rank: 1 ptrn 1 tick: 1.000000e-06
>>>> # 002: OK on hpn10/2, EA: 1, rank: 2 ptrn 1 tick: 1.000000e-06
>>>> # 003: OK on hpn10/3, EA: 1, rank: 3 ptrn 1 tick: 1.000000e-06
>>>> # 004: OK on hpn10/4, EA: 1, rank: 4 ptrn 1 tick: 1.000000e-06
>>>> # 005: OK on hpn10/5, EA: 1, rank: 5 ptrn 1 tick: 1.000000e-06
>>>>
>>>> I was expecting to see something like:
>>>> # 000: OK on hpn10/1, EA: 1, rank: 0 ptrn 1 tick: 1.000000e-06
>>>> # 001: OK on hpn10/2, EA: 1, rank: 1 ptrn 1 tick: 1.000000e-06
>>>> # 002: OK on hpn10/3, EA: 1, rank: 2 ptrn 1 tick: 1.000000e-06
>>>> # 003: OK on hpn10/5, EA: 1, rank: 3 ptrn 1 tick: 1.000000e-06
>>>> # 004: OK on hpn10/6, EA: 1, rank: 4 ptrn 1 tick: 1.000000e-06
>>>> # 005: OK on hpn10/7, EA: 1, rank: 5 ptrn 1 tick: 1.000000e-06
>>>>
>>>> My real problem is that I cannot find in the documentation how Hydra uses hwloc to bind processes to cores. Does Hydra need paths to the hwloc bin and lib directories? Does MPICH2 need to be rebuilt with configuration information giving the location of the hwloc binaries and libraries?
>>>>
>>>> To provide more info: my mpich2-1.1.1p1 build was configured with pm:hydra, but hwloc was installed later. Does MPICH2 need to be rebuilt?
>>>> The resource manager is Torque and the scheduler is Maui.
>>>>
>>>> How can I verify core placement?
>>>>
>>>> Jeffrey J. Evans
>>>> jje at purdue.edu
>>>> http://web.ics.purdue.edu/~evans6/
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> mpich-discuss mailing list
>>>> mpich-discuss at mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

