[MPICH] one shot jobs in mpich2?
Darius Buntinas
buntinas at mcs.anl.gov
Fri Jun 17 12:47:55 CDT 2005
You can add mpd nodes to an established ring, so as your external source
generates hosts, you can start mpds on them, then use "mpiexec
-machinefile mf ..." to run on them. I'm not sure about removing mpds, but
I imagine it should be fine so long as there are no MPI processes running
there.
Below is my idea of how it might work, based on my limited knowledge of
how mpd does its thing. Maybe Ralph could comment on whether this would
work.
You'll probably have to write your own script that adds mpds to an
existing ring (I don't think mpdboot lets you do that).
Here's what you would do:
On your "console" node, start mpd:
    mpd -d
Get the port number using mpdtrace:
    mpdtrace -l
(It should output something like "myhost.mcs.anl.gov_5555", where 5555 is
the port number.)
Get a machine file from your external source, and start an mpd on each
node, pointing it at the console node's host and port:
    for host in $(cat hostfile) ; do
        ssh "$host" mpd -d -h myhost.mcs.anl.gov -p 5555
    done
Run your program:
    mpiexec -machinefile hostfile ...
If you get another program to run, only start mpds on the nodes that don't
already have one.
Run your next program. (You should be able to run them concurrently.)
And so on.
So you'll have to keep track of what nodes already have mpds so you don't
launch more than one there.
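
The bookkeeping above could be sketched roughly like this. It is only a
sketch under assumptions: the function names and the "started" state file
are made up for illustration, passwordless ssh is assumed, and the parsing
only relies on the "host_port" shape of the mpdtrace -l output described
above. I have not tried it against a live ring.

```shell
#!/bin/sh
# Hypothetical helpers: function names and the "started" state file are
# assumptions made for illustration; none of this ships with mpd.

# Host part of an entry like "myhost.mcs.anl.gov_5555".
mpd_entry_host() {
    entry=${1%% *}               # keep only the first whitespace-separated field
    printf '%s\n' "${entry%_*}"  # drop the trailing "_port"
}

# Port part of the same entry.
mpd_entry_port() {
    entry=${1%% *}
    printf '%s\n' "${entry##*_}" # keep what follows the last underscore
}

# Print each host in hostfile $1 that is not yet recorded in state file $2.
# Duplicate and blank lines in the hostfile are skipped.
hosts_needing_mpd() {
    awk -v started="$2" '
        BEGIN { while ((getline line < started) > 0) seen[line] = 1 }
        NF && !seen[$1]++ { print $1 }
    ' "$1"
}

# Start an mpd on every new host and record it in the state file.
# Arguments: hostfile, state file, ring host, ring port.
start_missing_mpds() {
    for host in $(hosts_needing_mpd "$1" "$2"); do
        ssh "$host" mpd -d -h "$3" -p "$4" && printf '%s\n' "$host" >> "$2"
    done
}
```

With these defined, something like "start_missing_mpds hostfile started.txt
myhost.mcs.anl.gov 5555" before each mpiexec would only touch nodes that
are not already in started.txt.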
If you want to remove unneeded mpds I guess you should be able to just
kill them one at a time. They should be able to reknit the ring. Of
course you'll also need to keep track of which mpds are in use and which
ones are idle.
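
One way to work out which mpds are idle, again only as a sketch: assume
(hypothetically) that you keep one machinefile per running job as
jobs/<name>.mf and delete it when the job finishes, and that "started" is
a file listing the hosts you have launched mpds on. Then the idle hosts
are the started ones that appear in no job's machinefile. Actually killing
those mpds, and whether the ring reknits afterwards, is left out here
because I have not verified it.

```shell
#!/bin/sh
# Hypothetical helper: the jobs/*.mf layout and the "started" state file are
# assumptions made for illustration, not something mpd provides.

# Print hosts listed in state file $1 that appear in none of the
# per-job machinefiles $2/*.mf.
idle_mpd_hosts() {
    # /dev/null keeps cat from waiting on stdin when no machinefile exists
    cat /dev/null "$2"/*.mf 2>/dev/null |
        awk -v started="$1" '
            NF { busy[$1] = 1 }   # every host used by some running job
            END {
                while ((getline line < started) > 0)
                    if (line != "" && !busy[line]) print line
            }
        '
}
```

For example, "idle_mpd_hosts started.txt jobs" prints the candidates for
removal, one per line.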
This is just an idea. I'd have to do experiments and possibly look at
source code to make sure that it would actually work, but maybe it'll give
you a starting point.
Perhaps if you tell us more about your environment, there might be a
better solution.
Good luck,
-Darius
On Fri, 17 Jun 2005, Benjamin Rutt wrote:
> "Ralph M. Butler" <rbutler at mtsu.edu> writes:
>
>> You can start the mpd daemons on a set of machines. Then, you can
>> use the -machinefile option (or a set of -host options) to mpiexec
>> to determine where the ranks for a given job are placed. For
>> example, you can start a ring of mpds on machines named:
>> m1, m2, m3, m4, m5, m6
>>
>> Then you can run a job with this machinefile:
>> m1
>> m3
>> m5
>>
>> like this:
>> mpiexec -machinefile mf -n 3 mypgm
>>
>> and the procs will run on the odd-numbered machines.
>>
>> Then, you put this into mf:
>> m2
>> m4
>> m6
>>
>> and run again with the procs executing on the even machines.
>
> Thank you for the idea, but I think this will not help me. In
> general, I cannot start up daemons on all possible hosts before I
> begin (the hosts files are generated by another source, so I will not
> have a comprehensive list in advance).
>