[MPICH] MPI_Comm_spawn, -usize and -machinefile
John Robinson
jr at vertica.com
Sat Jan 7 10:05:05 CST 2006
Looks to me like this is a limitation of MPI_COMM_SPAWN. Its size
argument really wants to be a process subset of MPI_COMM_WORLD's set,
not a count. So you will need the spawned program to figure out the new
intracomm after it starts. I could imagine a general-purpose spawnee
wrapper main() that takes the desired process set (or an exclude mask)
from argv[], and calls the real main() using the subset-intracomm.
Something like this would be a nice (implementation-specific) extension
using the Info argument. In your example, it could just be a simple
flag meaning "exclude parent processes' hosts from spawned process set".
[note there can be more than one parent process!]
It is not pretty that the application code has to get involved in the
guts of process management this way, but I don't see a different way to
do it.
/jr
---
Martin Siegert wrote:
> Sorry for replying to my own email, but ...
>
> On Thu, Jan 05, 2006 at 06:39:34PM -0800, Martin Siegert wrote:
>
>>Hi,
>>
>>I am trying to figure out how to use MPI_Comm_spawn. In particular,
>>I want the slave processes spawned on nodes specified in the
>>-machinefile argument to mpiexec, e.g.,
>>
>>mpiexec -machinefile mpihosts -usize 4 -n 1 ./master_prog ./slave_prog
>>
>>master_prog then calls
>>
>>MPI_Comm_spawn(argv[1], slave_argv, universe_size-1,
>> MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
>> MPI_ERRCODES_IGNORE);
>>
>>and I expected that those slave processes would run on the remaining
>>hosts specified in the "mpihosts" file (there are 4 hosts in that file).
>>That's not what is happening, instead the slaves are spawned on the
>>first 3 hosts listed by mpdtrace. Is there any way to have those slaves
>>started on the nodes specified in the mpihosts file?
>>
>>Or is the only way to achieve this by doing
>>
>>export MPD_USE_ROOT_MPD=0
>>mpdboot -n 4 -f mpihosts
>>mpiexec -usize 4 -n 1 ./master_prog ./slave_prog
>>mpdallexit
>>
>>(this is with mpich2-1.0.3 and I usually use the mpd's started by root
>>at boot time on each node, i.e., every user by default has the
>>environment variable MPD_USE_ROOT_MPD set to 1).
>
>
> even this last method does not work:
> assume I have a "mpihosts" file
>
> r1
> r2
> r2
> r3
> r4
> r4
>
> - usually this would be the $PBS_NODEFILE generated by the batch scheduler.
> I can get the no. of mpd to boot through
> nmpd=`cat mpihosts | sort -u | wc -l`
> and the no. of processes through
> ncpus=`cat mpihosts | wc -l`
> and then would do
>
> unset MPD_USE_ROOT_MPD
> mpdboot -n $nmpd -f mpihosts -r rsh
> mpiexec -usize $ncpus -n 1 ./master_prog ./slave_prog
>
> But this starts the slaves on the wrong hosts as well, e.g., assuming that
> mpdtrace shows
>
> r1
> r3
> r2
> r4
>
> I would have a master on r1 and slaves on r1, r3, r3, r2, and r4.
> I then tried
>
> mpdboot -n 6 -f mpihosts -r rsh -1
> mpdtrace
> r1
> r2
> r1
> r4
> r2
> r3
>
> which again shows the wrong list of hosts: 2 mpds on r1 and r2 instead of
> two mpds on r2 and r4. Isn't "mpdboot -1 -f mpihosts ..." supposed to
> start one mpd for each line in the mpihosts file?
> [also: mpdboot -1 appears to be quite unreliable: about half the time
> when I try this I get an error
> mpdboot_r1 (handle_mpd_output 368): failed to connect to mpd on r2]
>
> The only way I got this to work was:
>
> mpd &
> port=`mpdtrace -l | sed -e 's/.*_//' -e 's/[^0-9].*//'`
> rsh -n r2 "unset MPD_USE_ROOT_MPD;mpd -p $port" &
> rsh -n r2 "unset MPD_USE_ROOT_MPD;mpd -p $port --noconsole" &
> rsh -n r3 "unset MPD_USE_ROOT_MPD;mpd -p $port" &
> rsh -n r4 "unset MPD_USE_ROOT_MPD;mpd -p $port" &
> rsh -n r4 "unset MPD_USE_ROOT_MPD;mpd -p $port --noconsole" &
> mpiexec -usize 6 -n 1 ./master_prog ./slave_prog
>
> which is really too ugly and complicated for general use.
> I guess I could write a script that does the parsing of the PBS_NODEFILE
> and starts the mpd, but isn't there an easier way?
>
> Cheers,
> Martin
>
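On the script idea at the end of the quoted message: a minimal sketch of
such a parser might look like the following. It only prints the rsh
commands, one per line of the node file, so they can be inspected before
anything is started. The function name mpd_ring_commands is made up for
illustration. It assumes that the first line of the node file names the
local host, that "mpd &" is already running there, and that the port has
been extracted with the mpdtrace/sed line shown above. The commands
simply mirror the hand-written ones in the message (second and later
mpds on a host get --noconsole); whether your mpd version accepts exactly
these options is something to check, not something this sketch guarantees.

```shell
#!/bin/sh
# Print (do not run) one "rsh ... mpd" command per node-file entry.
# Hypothetical helper; assumptions:
#   $1 = node file, one host per line, repeated hosts allowed
#   $2 = port of the mpd already started on the local host
#   the first non-blank line is the local host and is skipped
mpd_ring_commands() {
    nodefile=$1
    port=$2
    awk -v port="$port" '
        NF == 0 { next }                 # ignore blank lines
        {
            seen[$1]++                   # count entries per host
            if (entries++ == 0) next     # first entry: local mpd already up
            # repeated host: add --noconsole, as in the manual commands
            extra = (seen[$1] > 1) ? " --noconsole" : ""
            printf "rsh -n %s \"unset MPD_USE_ROOT_MPD;mpd -p %s%s\" &\n", \
                   $1, port, extra
        }
    ' "$nodefile"
}
```

With the six-line mpihosts file from the message this prints five
commands, two of them with --noconsole. Calling it as
mpd_ring_commands "$PBS_NODEFILE" "$port" and piping the result to sh
once it looks right would then replace the hand-written rsh lines.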