[MPICH] Problem probing message from dynamically spawned process

Pieter Thysebaert pieter.thysebaert at intec.ugent.be
Wed May 23 08:01:54 CDT 2007


Following up on my original post:

The difficulties I'm seeing occur when the master process is itself
MPI_Comm_spawn-ed by another process, so the problem seems to lie with
nested MPI_Comm_spawn calls.
Is nesting MPI_Comm_spawn supposed to be a functional and supported
operation?


So to summarize:

When I have only 2 processes, P1 and P2, and P1 MPI_Comm_spawn-s P2, I
can implement a message loop in P1 (using MPI_Iprobe) that listens for
messages from P2.

When I have 3 processes, with P1 spawning P2 and P2 spawning P3, I can
implement a message loop in P1 listening for messages from P2, and I can
also send data from P3 to P2. BUT MPI_Iprobe() in P2, testing for P3
messages, always leaves its flag false, which prevents me from
implementing a similar message loop in P2 (listening for P3 messages).


Is there some race condition or unsupported feature (or blatant misuse
of the MPI API) that I'm unaware of?

Thanks,
Pieter



Pieter Thysebaert wrote:
> Hello,
>
> I'm using MPICH2 1.0.5 on Debian Etch AMD64 (mpd daemon). I'm trying to
> implement a Master / Worker architecture, where the master can dynamically
> spawn additional workers (using MPI_Comm_spawn).
>
> Ultimately, I want the master to listen to its workers using a loop of
> MPI_Iprobe calls to process incoming messages. However, while testing
> my initial efforts, I have stumbled onto a peculiar situation in which
> the Master can (seemingly) receive a worker's test message but cannot
> MPI_Iprobe for it.
>
> In my testing, the spawned Workers run on the same machine as the Master.
>
> Assume the Worker (residing in an executable called "Worker") looks like
> this:
>
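>    #include <mpi.h>
>    #include <iostream>
>    using namespace std;
>
>    const int TAG_TEST = 1;   // assumed value; must match the Master's
>
>    int main(int argc, char** argv) {
>        MPI_Comm Master;
>
>        MPI_Init(&argc, &argv);
>        MPI_Comm_get_parent(&Master);
>        if (Master == MPI_COMM_NULL) {
>            cerr << "No parent Master!" << endl;
>            MPI_Finalize();
>            return 1;
>        }
>
>        // The remote group of the parent intercommunicator is the Master
>        int size;
>        MPI_Comm_remote_size(Master, &size);
>        if (size != 1) {
>            cerr << "Parent Master doesn't have size 1" << endl;
>            MPI_Finalize();
>            return 1;
>        }
>        // Test: send test message to Master (rank 0 of the remote group)
>        int test = 37;
>        MPI_Send(&test, 1, MPI_INT, 0, TAG_TEST, Master);
>        // Rest of code
>
>        MPI_Finalize();
>        return 0;
>    }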
>
>
> And the Master begins as
>    
>    #include <mpi.h>
>    #include <iostream>
>    using namespace std;
>
>    const int TAG_TEST = 1;   // assumed value; must match the Worker's
>
>    int main(int argc, char** argv) {
>        MPI_Init(&argc, &argv);
>
>        MPI_Comm workerComm;
>        MPI_Status s;
>        MPI_Info ourInfo;
>        MPI_Info_create(&ourInfo);
>
>        // Spawn Worker; workerComm is an intercommunicator whose remote
>        // group contains the Worker as rank 0
>        MPI_Comm_spawn("Worker", MPI_ARGV_NULL, 1, ourInfo, 0,
>                       MPI_COMM_SELF, &workerComm, MPI_ERRCODES_IGNORE);
>        MPI_Info_free(&ourInfo);
>
>        // Test: check test message from worker
>        for (;;) {
>            int flag = 0;
>            int result = MPI_Iprobe(0, TAG_TEST, workerComm, &flag, &s);
>            cout << "MPI_Iprobe: result is " << result << ", flag is "
>                 << flag << endl;
>            if (flag)
>                break;
>        }
>
>        int test;
>        MPI_Recv(&test, 1, MPI_INT, 0, TAG_TEST, workerComm, &s);
>        cout << "BlackBoard: Have received test, data is " << test << endl;
>
>        MPI_Finalize();
>        return 0;
>    }
>
>
> When I run this setup (mpiexec -n 1 Master), the Master never leaves
> its for loop, even if I let it run for a long time: while it probes for
> the Worker's message, flag and result both stay 0 forever (result 0 is
> MPI_SUCCESS, so the probe call itself succeeds). According to my docs,
> flag should become 1 (true) as soon as a matching message is available.
>
> However, when I remove the for loop in the Master and immediately proceed
> to MPI_Recv() of the TAG_TEST message, all goes well (i.e. the message is
> received by the master and both master and worker continue).
>
> What am I doing wrong or not understanding correctly?
>
> Message send/receive and probing work fine on this same machine
> when two processes are started with mpiexec -n 2 (and thus have ranks 0
> and 1 in the same MPI_COMM_WORLD) and MPI_COMM_WORLD is used everywhere.
>
>
> Pieter
>   



