[MPICH] MPI_Comm_spawn, -usize and -machinefile

John Robinson jr at vertica.com
Sat Jan 7 10:05:05 CST 2006


Looks to me like this is a limitation of MPI_COMM_SPAWN.  Its size 
argument (maxprocs) really wants to be a process subset of 
MPI_COMM_WORLD's set, not a count.  So you will need the spawned program 
to figure out the new intracomm after it starts.  I could imagine a 
general-purpose spawnee wrapper main() that takes the desired process 
set (or an exclude mask) from argv[], and calls the real main() using 
the subset-intracomm.

Something like this would be a nice (implementation-specific) extension 
using the Info argument.  In your example, it could just be a simple 
flag meaning "exclude parent processes' hosts from spawned process set". 
[Note that there can be more than one parent process!]

It is not pretty that the application code has to get involved in the 
guts of process management this way, but I don't see a different way to 
do it.

/jr
---
Martin Siegert wrote:
> Sorry for replying to my own email, but ...
> 
> On Thu, Jan 05, 2006 at 06:39:34PM -0800, Martin Siegert wrote:
> 
>>Hi,
>>
>>I am trying to figure out how to use MPI_Comm_spawn. In particular,
>>I want the slave processes spawned on nodes specified in the
>>-machinefile argument to mpiexec, e.g.,
>>
>>mpiexec -machinefile mpihosts -usize 4 -n 1 ./master_prog ./slave_prog
>>
>>master_prog then calls
>>
>>MPI_Comm_spawn(argv[1], slave_argv, universe_size-1,
>>               MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
>>               MPI_ERRCODES_IGNORE);
>>
>>and I expected that those slave processes would run on the remaining
>>hosts specified in the "mpihosts" file (there are 4 hosts in that file).
>>That's not what is happening, instead the slaves are spawned on the
>>first 3 hosts listed by mpdtrace. Is there any way to have those slaves
>>started on the nodes specified in the mpihosts file?
>>
>>Or is the only way to achieve this by doing
>>
>>export MPD_USE_ROOT_MPD=0
>>mpdboot -n 4 -f mpihosts
>>mpiexec -usize 4 -n 1 ./master_prog ./slave_prog
>>mpdallexit
>>
>>(this is with mpich2-1.0.3 and I usually use the mpd's started by root
>>at boot time on each node, i.e., every user by default has the
>>environment variable MPD_USE_ROOT_MPD set to 1).
> 
> 
> even this last method does not work:
> assume I have a "mpihosts" file
> 
> r1
> r2
> r2
> r3
> r4
> r4
> 
> - usually this would be the $PBS_NODEFILE generated by the batch scheduler.
> I can get the no. of mpd to boot through 
> nmpd=`cat mpihosts | sort -u | wc -l`
> and the no. of processes through
> ncpus=`cat mpihosts | wc -l`
> and then would do
> 
> unset MPD_USE_ROOT_MPD
> mpdboot -n $nmpd -f mpihosts -r rsh
> mpiexec -usize $ncpus -n 1 ./master_prog ./slave_prog
> 
> But this starts the slaves on the wrong hosts as well, e.g., assuming that
> mpdtrace shows
> 
> r1
> r3
> r2
> r4
> 
> I would have a master on r1 and slaves on r1, r3, r3, r2, and r4.
> I then tried
> 
> mpdboot -n 6 -f mpihosts -r rsh -1
> mpdtrace
> r1
> r2
> r1
> r4
> r2
> r3
> 
> which again shows the wrong list of hosts: 2 mpds on r1 and r2 instead of
> two mpds on r2 and r4. Isn't "mpdboot -1 -f mpihosts ..." supposed to
> start one mpd for each line in the mpihosts file?
> [also: mpdboot -1 appears to be quite unreliable: about half the time
> when I try this I get an error
> mpdboot_r1 (handle_mpd_output 368): failed to connect to mpd on r2]
> 
> The only way I got this to work was:
> 
> mpd &
> port=`mpdtrace -l | sed -e 's/.*_//' -e 's/[^0-9].*//'`
> rsh -n r2 "unset MPD_USE_ROOT_MPD;mpd -p $port" &
> rsh -n r2 "unset MPD_USE_ROOT_MPD;mpd -p $port --noconsole" &
> rsh -n r3 "unset MPD_USE_ROOT_MPD;mpd -p $port" &
> rsh -n r4 "unset MPD_USE_ROOT_MPD;mpd -p $port" &
> rsh -n r4 "unset MPD_USE_ROOT_MPD;mpd -p $port --noconsole" &
> mpiexec -usize 6 -n 1 ./master_prog ./slave_prog
> 
> which is really too ugly and complicated for general use.
> I guess I could write a script that does the parsing of the PBS_NODEFILE
> and starts the mpd, but isn't there an easier way?
> 
> Cheers,
> Martin
> 
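
On the closing question in the quoted message (a script that parses the
PBS_NODEFILE and starts the mpds): a minimal dry-run sketch of that idea
might look like the following.  It only prints the rsh commands from the
quoted workaround, one per line of the node file, rather than running
them.  The function name gen_mpd_cmds is made up for illustration.  The
sketch assumes that the first line of the file names the local host,
where "mpd &" has already been started and its port read from
"mpdtrace -l" as in the quoted message.  The mpd options are copied from
that message and have not been checked against other MPICH2 versions.

```shell
#!/bin/sh
# Dry run: print one "rsh ... mpd" command per node-file line.
# gen_mpd_cmds is a hypothetical helper name; nothing is executed.
# Usage: gen_mpd_cmds NODEFILE PORT
#   NODEFILE  one host name per line, repeated once per CPU
#   PORT      port of the local mpd that is already running (assumption)
gen_mpd_cmds() {
    awk -v port="$2" '
        # Skip blank lines.
        NF == 0 { next }
        # First entry: assumed to be the local host, mpd already running.
        !local_done { local_done = 1; seen[$1] = 1; next }
        {
            # A repeated host gets an extra mpd started with --noconsole.
            extra = ($1 in seen) ? " --noconsole" : ""
            printf "rsh -n %s \"unset MPD_USE_ROOT_MPD; mpd -p %s%s\" &\n", $1, port, extra
            seen[$1] = 1
        }
    ' "$1"
}

# Example (port number is arbitrary):
#   gen_mpd_cmds mpihosts 34567
```

With the six-line mpihosts file from the quoted message, this prints
five commands: one plain and one --noconsole command for each of r2 and
r4, and a single plain one for r3.  Piping the output to sh would run
them; whether that is robust enough for a batch job is untested.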



