[MPICH] MPI_Comm_spawn, -usize and -machinefile

Martin Siegert siegert at sfu.ca
Thu Jan 5 20:39:34 CST 2006


Hi,

I am trying to figure out how to use MPI_Comm_spawn. In particular,
I want the slave processes to be spawned on the nodes listed in the
file given by the -machinefile argument to mpiexec, e.g.,

mpiexec -machinefile mpihosts -usize 4 -n 1 ./master_prog ./slave_prog

master_prog then calls

MPI_Comm_spawn(argv[1], slave_argv, universe_size-1,
               MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
               MPI_ERRCODES_IGNORE);
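For reference, MPI_Comm_spawn takes the command to run (here argv[1], i.e. ./slave_prog) separately from its argument vector, which must not repeat the command name and must be NULL-terminated (or MPI_ARGV_NULL). The post does not show how slave_argv is built. A minimal sketch, assuming the slave's arguments are simply whatever follows ./slave_prog on the master's command line (the helper name slave_args_from is hypothetical), relies on C guaranteeing argv[argc] == NULL:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given master_prog's own argc/argv, where
 * argv[1] is the program to spawn (e.g. "./slave_prog"), return the
 * argument vector to hand to MPI_Comm_spawn.  The C standard guarantees
 * argv[argc] == NULL, so the tail starting at argv[2] is already a
 * NULL-terminated list that excludes the command name, as the spawn
 * call expects.  With no extra arguments it is just { NULL }. */
static char **slave_args_from(int argc, char **argv)
{
    assert(argc >= 2 && argv[argc] == NULL);
    return &argv[2];
}
```

In master_prog this would be used as MPI_Comm_spawn(argv[1], slave_args_from(argc, argv), universe_size - 1, ...). The value universe_size is presumably read from the MPI_UNIVERSE_SIZE attribute via MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &ptr, &flag), which is what -usize is meant to set.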

and I expected that those slave processes would run on the remaining
hosts specified in the "mpihosts" file (there are 4 hosts in that file).
That's not what is happening; instead, the slaves are spawned on the
first 3 hosts listed by mpdtrace. Is there any way to have those slaves
started on the nodes specified in the mpihosts file?
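One route worth trying, assuming the implementation honors it (not verified here for the mpd in mpich2-1.0.3), is the "host" key that MPI-2 reserves for the info argument of MPI_Comm_spawn. For that, master_prog first needs the host names from mpihosts. A minimal sketch, assuming one host per line with an optional :count suffix and with blank lines and # comments skipped (the function read_machinefile and the size limits are hypothetical):

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 64   /* hypothetical limits for this sketch */
#define HOST_LEN  256

/* Read host names from an open machinefile (e.g. "mpihosts").
 * Assumed format: one host per line, optionally "host:count".
 * Blank lines and lines starting with '#' are skipped, and any
 * ":count" suffix is dropped.  Returns the number of hosts stored
 * in hosts[0..max_hosts-1]. */
static int read_machinefile(FILE *fp, char hosts[][HOST_LEN], int max_hosts)
{
    char line[HOST_LEN];
    int n = 0;

    assert(fp != NULL && hosts != NULL);
    while (n < max_hosts && fgets(line, sizeof line, fp) != NULL) {
        char *p = line;
        while (isspace((unsigned char)*p))   /* skip leading blanks */
            p++;
        if (*p == '\0' || *p == '#')          /* blank line or comment */
            continue;
        p[strcspn(p, ": \t\r\n")] = '\0';     /* cut at ':' or whitespace */
        strncpy(hosts[n], p, HOST_LEN - 1);
        hosts[n][HOST_LEN - 1] = '\0';
        n++;
    }
    return n;
}
```

If master_prog runs on hosts[0], the slaves would go to hosts[1] through hosts[n-1], e.g. by calling MPI_Info_set(info, "host", hosts[i]) on one info object per host and spawning one process per host (or all at once with MPI_Comm_spawn_multiple). This sketch ignores the :count suffix rather than expanding it into several slots.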

Or is the only way to achieve this to run

export MPD_USE_ROOT_MPD=0
mpdboot -n 4 -f mpihosts
mpiexec -usize 4 -n 1 ./master_prog ./slave_prog
mpdallexit

(This is with mpich2-1.0.3. I usually use the mpds started by root
at boot time on each node, i.e., every user has the environment
variable MPD_USE_ROOT_MPD set to 1 by default.)

Thanks in advance for your advice!

Cheers,
Martin

-- 
Martin Siegert
Head, HPC at SFU
WestGrid Site Manager
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
