[mpich-discuss] Specifying hosts
Ralph Butler
rbutler at mtsu.edu
Wed May 6 15:41:30 CDT 2009
I am coming to this conversation a little late, so if my comments are
missing the mark, just ignore me.
I have a few mpds running on this set of machines:
(bp400:64)% mpdtrace.py -l
bp400_2000 (161.45.166.2)
bp402_2000 (161.45.166.4)
bp404_2000 (161.45.166.6)
bp401_2000 (161.45.166.3)
bp411_2000 (161.45.166.13)
bp409_2000 (161.45.166.11)
bp410_2000 (161.45.166.12)
bp403_2000 (161.45.166.5)
bp406_2000 (161.45.166.8)
bp408_2000 (161.45.166.10)
bp405_2000 (161.45.166.7)
bp407_2000 (161.45.166.9)
To map ranks to machines, there is first the slightly cumbersome version
supported by the mpiexec standard:
(bp400:57)% mpiexec -l -n 1 -host bp407 hostname : -n 1 -host bp411 hostname
0: bp407
1: bp411
Of course, that option can be made a bit more pleasant by putting all
the args into a file and using the standard -configfile option.
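A rough sketch of generating such a file, assuming the config file
holds one argument set per line in place of the ':' separators (worth
checking against the MPICH2 documentation for your version). The
helper name build_configfile and the default command are made up for
illustration and are not part of mpd:

```python
def build_configfile(hosts, command="hostname"):
    """Return config file text that places one process on each host.

    Ranks follow the order of `hosts`: the first host gets rank 0, etc.
    Assumption: each line is one argument set, as it would appear
    between ':' separators on the mpiexec command line.
    """
    lines = ["-n 1 -host {} {}".format(host, command) for host in hosts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the text for the two hosts used in the example above.
    print(build_configfile(["bp407", "bp411"]), end="")
```

The output could be saved to a file and passed as
"mpiexec -l -configfile <file>".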
There is also the mpd-specific option -machinefile:
(bp400:61)% cat temp
bp407
bp411
(bp400:62)% mpiexec -l -machinefile temp -n 2 hostname
0: bp407
1: bp411
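If the goal is to record which rank ran where, the "-l" output above
can be turned into a lookup table. A minimal sketch, assuming each
line looks like "<rank>: <hostname>" as in the output shown; the
function name parse_labeled_output is hypothetical:

```python
def parse_labeled_output(text):
    """Map rank -> hostname from lines of the form '<rank>: <hostname>'.

    Lines that do not start with an integer rank label are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        rank, sep, host = line.partition(":")
        if sep and rank.strip().isdigit():
            mapping[int(rank)] = host.strip()
    return mapping


if __name__ == "__main__":
    sample = "0: bp407\n1: bp411\n"
    print(parse_labeled_output(sample))
```

The input text could come from redirecting "mpiexec -l ... hostname"
to a file before launching the real application.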
--ralph
On Wed May 6, at 10:55 AM, Dave Goodell wrote:
> On May 6, 2009, at 9:38 AM, Scott Atchley wrote:
>
>> On May 5, 2009, at 5:21 PM, Dave Goodell wrote:
> ...
>> I do not see any switches that would provide a mapping of ranks to
>> hosts. Have I missed it? If there is not, has there been any
>> discussion about providing one? I can imagine that it would be very
>> helpful in combination with "-l" to determine if a job aborts to
>> pinpoint the node for further investigation.
>
> A switch like that would probably be a worthwhile option to add in
> Hydra. I've filed a ticket for it [1].
>
>> I guess I could just add something like this to my submit scripts:
>>
>> mpiexec -l -n <num_cores> hostname
>>
>> before running my actual application.
>
> That might work, although I can't remember if mpd will always launch
> processes in the same order every time. I think it will, but I
> don't recall right now.
>
>>> Sorry for the very surprising behavior. I believe that this
>>> gotcha is not present in our new process manager, Hydra. If this
>>> doesn't solve your problem, let us know and we can dig in a bit
>>> deeper.
>>>
>>> -Dave
>>
>> Is Hydra in the 1.1b1 tarball? If so, I will give it a try.
>
> Hydra is available in the 1.1b1 tarball but you might be better off
> waiting a week or so for 1.1rc1 to be released. There are numerous
> bug fixes and improvements in the upcoming release. Pavan could
> tell you for sure whether or not you will have any trouble with
> 1.1b1, I haven't been following the Hydra issues closely enough to
> say anything definitive about it.
>
> -Dave
>
> [1] https://trac.mcs.anl.gov/projects/mpich2/ticket/575