[MPICH] Problem probing message from dynamically spawned process

Rajeev Thakur thakur at mcs.anl.gov
Wed May 23 11:50:28 CDT 2007


It should work. Can you send us a test program that we can use to reproduce
the problem?

Rajeev 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Pieter 
> Thysebaert
> Sent: Wednesday, May 23, 2007 8:02 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] Problem probing message from dynamically 
> spawned process
> 
> Following up to my original post:
> 
> The difficulties I'm seeing occur when the master process itself is
> MPI_Comm_spawn-ed by another process.
> So I seem to be having problems with nested MPI_Comm_spawn calls....
> Is nesting MPI_Comm_spawn supposed to be a functional and supported
> operation?
> 
> 
> So to summarize:
> 
> When I only have 2 processes, P1 and P2, and P1 MPI_Comm_spawn-s P2, I
> can implement a message loop in P1 (using MPI_Iprobe).
> 
> When I have 3 processes, with P1 spawning P2 and P2 spawning P3, I can
> implement a message loop in P1 listening for messages from P2, and I can
> also send data from P3 to P2. BUT MPI_Iprobe() in P2, testing for P3
> messages, always returns false, preventing me from implementing a
> similar message loop in P2 (listening for P3 messages).
> 
> 
> Is there some race condition or unsupported feature (or blatant misuse
> of the MPI API) I'm unaware of?
> 
> Thanks,
> Pieter
> 
> 
> 
> Pieter Thysebaert wrote:
> > Hello,
> >
> > I'm using MPICH2 1.0.5 on Debian Etch AMD64 (mpd daemon). I'm trying to
> > implement a Master / Worker architecture, where the master can dynamically
> > spawn additional workers (using MPI_Comm_spawn).
> >
> > Ultimately, I want the master to listen to its workers using a loop with
> > MPI_Iprobe statements to process incoming messages. However, when testing
> > my initial efforts, I have stumbled over a peculiar situation in which
> > the Master can (seemingly) receive a worker's (test) message, but
> > cannot Iprobe for it.
> >
> > In my testing, the spawned Workers run on the same machine as the Master.
> >
> > Assume the Worker (residing in an executable called "Worker") looks like
> > this:
> >
> >    #include <mpi.h>
> >    #include <iostream>
> >    using namespace std;
> >
> >    const int TAG_TEST = 1;  // assumed value; any valid tag works
> >
> >    int main(int argc, char** argv) {
> >        MPI_Comm Master;
> >
> >        MPI_Init(&argc, &argv);
> >        // Intercommunicator to the process that spawned us
> >        MPI_Comm_get_parent(&Master);
> >        if (Master == MPI_COMM_NULL) {
> >            cerr << "No parent Master!" << endl;
> >            return 1;
> >        }
> >
> >        int size;
> >        MPI_Comm_remote_size(Master, &size);
> >        if (size != 1) {
> >            cerr << "Parent Master doesn't have size 1" << endl;
> >            return 1;
> >        }
> >        // Test: send test message to Master (rank 0 of the remote group)
> >        int test = 37;
> >        MPI_Status s;
> >        MPI_Send(&test, 1, MPI_INT, 0, TAG_TEST, Master);
> >        // Rest of code
> >    }
> >
> >
> > And the Master begins as
> >    
> >    int main(int argc, char** argv) {
> >        MPI_Init(&argc, &argv);
> >
> >        MPI_Comm workerComm;
> >        MPI_Info ourInfo;
> >        MPI_Info_create(&ourInfo);
> >        MPI_Status s;  // status filled in by MPI_Iprobe / MPI_Recv
> >
> >        // Spawn Worker
> >        MPI_Comm_spawn("Worker", MPI_ARGV_NULL, 1, ourInfo, 0,
> >                       MPI_COMM_SELF, &workerComm, MPI_ERRCODES_IGNORE);
> >
> >        // Test: check test message from worker
> >        for (;;) {
> >            int flag = 0;
> >            int result = MPI_Iprobe(0, TAG_TEST, workerComm, &flag, &s);
> >            cout << "MPI_Iprobe: result is " << result << ", flag is "
> >                 << flag << endl;
> >            if (flag > 0)
> >                break;
> >        }
> >
> >        int test;
> >        MPI_Recv(&test, 1, MPI_INT, 0, TAG_TEST, workerComm, &s);
> >        cout << "BlackBoard: Have received test, data is " << test << endl;
> >    }
> >
> >
> > What happens when running this architecture (mpiexec -n 1 Master) is that
> > the Master never leaves its for loop (probing for messages from the
> > Worker, flag and result equal 0 forever; according to my docs, flag
> > should become 1 when a message is available), even if I let it run for a
> > long time.
> >
> > However, when I remove the for loop in the Master and immediately proceed
> > to MPI_Recv() of the TAG_TEST message, all goes well (i.e. the message is
> > received by the master and both master and worker continue).
> >
> > What am I doing wrong or not understanding correctly?
> >
> > The message send/receive and probing works fine on this same machine
> > when two processes are started with mpiexec -n 2 (and thus have ranks 0
> > and 1 in the same MPI_COMM_WORLD) and MPI_COMM_WORLD is used everywhere.
> >
> >
> > Pieter
> >   
> 
> 



