[mpich-discuss] Specifying hosts

Ralph Butler rbutler at mtsu.edu
Wed May 6 15:41:30 CDT 2009


I am coming to this conversation a little late, so if my comments are
missing the mark, just ignore me.

I have a few mpds running on this set of machines:
(bp400:64)% mpdtrace.py  -l
bp400_2000 (161.45.166.2)
bp402_2000 (161.45.166.4)
bp404_2000 (161.45.166.6)
bp401_2000 (161.45.166.3)
bp411_2000 (161.45.166.13)
bp409_2000 (161.45.166.11)
bp410_2000 (161.45.166.12)
bp403_2000 (161.45.166.5)
bp406_2000 (161.45.166.8)
bp408_2000 (161.45.166.10)
bp405_2000 (161.45.166.7)
bp407_2000 (161.45.166.9)
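
If you want a machinefile built from whatever mpds are currently in the
ring, here is a rough sh sketch. It assumes each `mpdtrace.py -l` line
looks like the ones above (`host_port (ip)`), and that the trailing
`_NNNN` is the mpd's port number rather than part of the hostname. The
function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: turn `mpdtrace.py -l` output into a machinefile, one hostname per line.
# Assumption: each line is "host_port (ip)" and the digits after the LAST
# underscore in the first field are the mpd port, not part of the hostname.
mpdtrace_to_machinefile() {
    # stdin: mpdtrace -l output; stdout: bare hostnames in ring order
    awk 'NF > 0 { host = $1; sub(/_[0-9]+$/, "", host); print host }'
}

# Usage (the file name "hosts.txt" is hypothetical):
#   mpdtrace.py -l | mpdtrace_to_machinefile > hosts.txt
```

Only the port suffix is stripped, so a hostname that itself contains an
underscore (e.g. `my_host_2000`) still comes out as `my_host`.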


To map ranks to machines, there is first the slightly cumbersome syntax
supported by the mpiexec standard:

(bp400:57)% mpiexec -l -n 1 -host bp407 hostname : -n 1 -host bp411 hostname
0: bp407
1: bp411

Of course, that option can be made a bit more pleasant by putting all the
args into a file and using the standard -configfile option.
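
For more than a couple of hosts, the argument sets can be generated
rather than typed. A rough sh sketch follows; it emits one
`-n 1 -host HOST PROGRAM` set per host so that rank i lands on the i-th
host listed. The assumption that the config file read by mpd's mpiexec
takes one argument set per line is mine, so check it against your
mpiexec's documentation. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: one "-n 1 -host HOST PROG" argument set per host, in the order given.
# Assumption: -configfile accepts one such argument set per line.
make_arg_sets() {
    prog=$1; shift
    for host in "$@"; do
        printf '%s\n' "-n 1 -host $host $prog"
    done
}

# Join argument sets (one per input line) with " : " to get the
# single-command colon form instead of a config file.
join_arg_sets() {
    awk 'NR > 1 { printf " : " } { printf "%s", $0 } END { print "" }'
}

# Usage (hypothetical file name "cfg"):
#   make_arg_sets hostname bp407 bp411 > cfg
#   mpiexec -l -configfile cfg
# or, without a file:
#   mpiexec -l $(make_arg_sets hostname bp407 bp411 | join_arg_sets)
```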


There is also the mpd-specific option -machinefile:

(bp400:61)% cat temp
bp407
bp411
(bp400:62)% mpiexec -l -machinefile temp -n 2 hostname
0: bp407
1: bp411

--ralph

On May 6, 2009, at 10:55 AM, Dave Goodell wrote:

> On May 6, 2009, at 9:38 AM, Scott Atchley wrote:
>
>> On May 5, 2009, at 5:21 PM, Dave Goodell wrote:
> ...
>> I do not see any switches that would provide a mapping of ranks to
>> hosts. Have I missed it? If there is not, has there been any
>> discussion about providing one? I can imagine that it would be very
>> helpful in combination with "-l" to pinpoint the node for further
>> investigation if a job aborts.
>
> A switch like that would probably be a worthwhile option to add in  
> Hydra.  I've filed a ticket for it [1].
>
>> I guess I could just add something like this to my submit scripts:
>>
>> mpiexec -l -n <num_cores> hostname
>>
>> before running my actual application.
>
> That might work, although I can't remember if mpd will always launch  
> processes in the same order every time.  I think it will, but I  
> don't recall right now.
>
>>> Sorry for the very surprising behavior.  I believe that this  
>>> gotcha is not present in our new process manager, Hydra.  If this  
>>> doesn't solve your problem, let us know and we can dig in a bit  
>>> deeper.
>>>
>>> -Dave
>>
>> Is Hydra in the 1.1b1 tarball? If so, I will give it a try.
>
> Hydra is available in the 1.1b1 tarball but you might be better off  
> waiting a week or so for 1.1rc1 to be released.  There are numerous  
> bug fixes and improvements in the upcoming release.  Pavan could  
> tell you for sure whether or not you will have any trouble with  
> 1.1b1; I haven't been following the Hydra issues closely enough to
> say anything definitive about it.
>
> -Dave
>
> [1] https://trac.mcs.anl.gov/projects/mpich2/ticket/575
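
Scott's submit-script idea above could look roughly like the sh sketch
below: run hostname under -l once, save a rank-sorted table, then launch
the real job. Keep Dave's caveat in mind: if mpd does not place ranks the
same way on every launch, the table records where the probe ran, not
necessarily where the application ran. NPROCS, ./my_app and the function
name are placeholders.

```shell
#!/bin/sh
# Sketch: turn `mpiexec -l ... hostname` output into "rank host" pairs.
# -l output can arrive in any order, so sort numerically by rank.
rank_host_table() {
    # stdin: lines like "3: bp409"; stdout: "3 bp409", sorted by rank
    awk -F': ' 'NF >= 2 && $1 ~ /^[0-9]+$/ { print $1, $2 }' | sort -n
}

# Hypothetical usage in a submit script:
#   mpiexec -l -n "$NPROCS" hostname | rank_host_table > rank_hosts.txt
#   mpiexec -l -n "$NPROCS" ./my_app
```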
