[MPICH] Problem probing message from dynamically spawned process
Pieter Thysebaert
pieter.thysebaert at intec.ugent.be
Wed May 23 05:12:10 CDT 2007
Hello,
I'm using MPICH2 1.0.5 on Debian Etch AMD64 (mpd daemon). I'm trying to
implement a Master / Worker architecture, where the master can dynamically
spawn additional workers (using MPI_Comm_spawn).
Ultimately, I want the Master to listen to its Workers using a loop of
MPI_Iprobe calls to process incoming messages. However, while testing
my initial efforts, I have stumbled over a peculiar situation in which
the Master can (seemingly) receive a Worker's test message but cannot
MPI_Iprobe for it.
In my testing, the spawned Workers run on the same machine as the Master.
Assume the Worker (residing in an executable called "Worker") looks like
this:
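    #include <mpi.h>
    #include <iostream>
    using namespace std;
    // TAG_TEST is an int constant shared by Master and Worker (definition not shown)

    int main(int argc, char** argv) {
        MPI_Comm Master;
        MPI_Init(&argc, &argv);
        // Intercommunicator connecting this Worker to the process that spawned it
        MPI_Comm_get_parent(&Master);
        if (Master == MPI_COMM_NULL) {
            cerr << "No parent Master!" << endl;
            return 1;
        }
        // The remote group of the parent intercommunicator must hold exactly one Master
        int size;
        MPI_Comm_remote_size(Master, &size);
        if (size != 1) {
            cerr << "Parent Master doesn't have size 1" << endl;
            return 1;
        }
        // Test: send test message to Master (rank 0 in the remote group)
        int test = 37;
        MPI_Status s;
        MPI_Send(&test, 1, MPI_INT, 0, TAG_TEST, Master);
        // Rest of code
    }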
And the Master begins as follows:
    #include <mpi.h>
    #include <iostream>
    using namespace std;
    // TAG_TEST is an int constant shared by Master and Worker (definition not shown)

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        MPI_Comm workerComm;
        MPI_Info ourInfo;
        MPI_Status s;
        MPI_Info_create(&ourInfo);
        // Spawn Worker; workerComm is the intercommunicator to the new process
        MPI_Comm_spawn("Worker", MPI_ARGV_NULL, 1, ourInfo, 0,
                       MPI_COMM_SELF, &workerComm, MPI_ERRCODES_IGNORE);
        // Test: check test message from worker (rank 0 in the remote group)
        for (;;) {
            int flag = 0;
            int result = MPI_Iprobe(0, TAG_TEST, workerComm, &flag, &s);
            cout << "MPI_Iprobe: result is " << result << ", flag is "
                 << flag << endl;
            if (flag > 0)
                break;
        }
        int test;
        MPI_Recv(&test, 1, MPI_INT, 0, TAG_TEST, workerComm, &s);
        cout << "Master: Have received test, data is " << test << endl;
    }
When I run this setup (mpiexec -n 1 Master), the Master never leaves its
for loop, even if I let it run for a long time: while it probes for
messages from the Worker, flag and result both stay 0 forever. According
to my docs, flag should become 1 once a message is available.
However, when I remove the for loop in the Master and immediately proceed
to MPI_Recv() of the TAG_TEST message, all goes well (i.e. the message is
received by the master and both master and worker continue).
What am I doing wrong or not understanding correctly?
Message send/receive and probing work fine on this same machine
when two processes are started with mpiexec -n 2 (and thus have ranks 0
and 1 in the same MPI_COMM_WORLD) and MPI_COMM_WORLD is used everywhere.
Pieter