[MPICH] Problem probing message from dynamically spawned process

Pieter Thysebaert pieter.thysebaert at intec.ugent.be
Wed May 23 08:01:54 CDT 2007

Following up to my original post:

The difficulties I'm seeing occur when the master process itself is
MPI_Comm_spawn-ed by another process, so the problem seems to lie with
nested MPI_Comm_spawn calls.
Is nesting MPI_Comm_spawn supposed to be functional and supported?

So to summarize:

When I have only 2 processes, P1 and P2, and P1 MPI_Comm_spawn-s P2, I
can implement a message loop in P1 (using MPI_Iprobe).

When I have 3 processes, with P1 spawning P2 and P2 spawning P3, I can
implement a message loop in P1 listening for messages from P2, and I can
also send data from P3 to P2. BUT MPI_Iprobe() in P2, testing for P3
messages, always returns false, preventing me from implementing a
similar message loop in P2 (listening for P3 messages).
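
To make the failing case easier to observe, the probing loop can be bounded instead of spinning forever. The following is only a minimal sketch: poll_bounded is a hypothetical helper of my own (it is not part of MPI or MPICH2), and the probe is passed in as a callable so the loop itself does not depend on MPI.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not an MPI function): calls `probe` up to
// `max_attempts` times, sleeping `pause` between failed attempts.
// Returns the 1-based attempt on which `probe` first returned true,
// or 0 if it never did.
int poll_bounded(const std::function<bool()>& probe,
                 int max_attempts,
                 std::chrono::milliseconds pause) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (probe()) {
            return attempt;  // a message was reported as available
        }
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(pause);  // back off before retrying
        }
    }
    return 0;  // gave up: the probe never reported a message
}
```

In P2 the probe would presumably be a lambda that wraps the MPI_Iprobe call from the quoted Master code below and returns whether flag is non-zero. A return value of 0 would then flag the "send succeeds but Iprobe never sees it" case without hanging the process.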

Is there some race condition or unsupported feature (or blatant misuse
of the MPI API) I'm unaware of?


Pieter Thysebaert wrote:
> Hello,
> I'm using MPICH2 1.0.5 on Debian Etch AMD64 (mpd daemon). I'm trying to
> implement a Master / Worker architecture, where the master can dynamically
> spawn additional workers (using MPI_Comm_spawn).
> Ultimately, I want the master to listen to its workers using a loop with
> MPI_Iprobe statements to process incoming messages. However, when testing
> my initial efforts, I have stumbled over a peculiar situation in which
> the Master can (seemingly) receive a worker's (test) message, but
> cannot Iprobe for it.
> In my testing, the spawned Workers run on the same machine as the Master.
> Assume the Worker (residing in an executable called "Worker") looks like
> this:
>    int main(int argc, char** argv) {
>     MPI_Comm Master;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_get_parent(&Master);
>     if (Master == MPI_COMM_NULL) {
>         cerr << "No parent Master!" << endl;
>         return 1;
>     }
>     int size;
>     MPI_Comm_remote_size(Master, &size);
>     if (size != 1) {
>         cerr << "Parent Master doesn't have size 1" << endl;
>         return 1;
>     }
>     // Test: send test message to Master
>     int test = 37;
>     MPI_Status s;
>     MPI_Send(&test, 1, MPI_INT, 0, TAG_TEST, Master);
>     // Rest of code
>    }
> And the Master begins as
>    int main(int argc, char** argv) {
>     MPI_Init(&argc, &argv);
>     MPI_Comm workerComm;
>     MPI_Status s;  // used by MPI_Iprobe and MPI_Recv below
>     MPI_Info ourInfo;
>     MPI_Info_create(&ourInfo);
>     // Spawn Worker
>     MPI_Comm_spawn("Worker", MPI_ARGV_NULL, 1, ourInfo, 0,
>     // Test: check test message from worker
>     for (;;) {
>         int flag = 0;
>         int result = MPI_Iprobe(0, TAG_TEST, workerComm, &flag, &s);
>         cout << "MPI_Iprobe: result is " << result
>              << ", flag is " << flag << endl;
>         if (flag > 0)
>            break;
>     }
>     int test;    
>     MPI_Recv(&test, 1, MPI_INT, 0, TAG_TEST, workerComm, &s);
>     cout <<    "BlackBoard: Have received test, data is " << test << endl;
>    }    
> What happens when running this architecture (mpiexec -n 1 Master) is that
> the Master never leaves its for loop (probing for messages from the
> Worker, flag and result equal 0 forever; according to my docs, flag
> should become 1 when a message is available), even if I let it run for a
> long time.
> However, when I remove the for loop in the Master and immediately proceed
> to MPI_Recv() of the TAG_TEST message, all goes well (i.e. the message is
> received by the master and both master and worker continue).
> What am I doing wrong or not understanding correctly?
> The message send/receive and probing works fine on this same machine
> when two processes are started with mpiexec -n 2 (and thus have ranks 0
> and 1 in the same MPI_COMM_WORLD) and MPI_COMM_WORLD is used everywhere.
> Pieter
