[MPICH] Problem probing message from dynamically spawned process
Rajeev Thakur
thakur at mcs.anl.gov
Wed May 23 11:50:28 CDT 2007
It should work. Can you send us a test program that we can use to reproduce
the problem?
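
For the test program, it would help if the probe loop gave up after a
timeout instead of spinning forever, so that a failing run ends with a
clear result. A minimal sketch of such a loop is below. Note that
poll_until is a hypothetical helper, not an MPI function, and the probe
is passed in as a callable so the loop itself does not depend on MPI:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of MPI): calls `probe` until it returns
// true or `timeout` has elapsed, sleeping `interval` between attempts.
// The probe is always called at least once.
// Returns true if the probe succeeded, false if the timeout expired.
inline bool poll_until(const std::function<bool()>& probe,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval =
                           std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (probe()) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(interval);
    }
}
```

In the Master, the callable would presumably wrap the existing call, i.e.
run MPI_Iprobe(0, TAG_TEST, workerComm, &flag, &s) and return flag != 0,
and the program could print an error and exit when poll_until returns
false.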
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Pieter
> Thysebaert
> Sent: Wednesday, May 23, 2007 8:02 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] Problem probing message from dynamically
> spawned process
>
> Following up to my original post:
>
> The difficulties I'm seeing occur when the master process itself is
> MPI_Comm_spawn-ed by another process.
> So I seem to be having problems with nested MPI_Comm_spawn calls.
> Is nesting MPI_Comm_spawn supposed to be a functional and supported
> operation?
>
>
> So to summarize:
>
> With only 2 processes, P1 and P2, where P1 MPI_Comm_spawn-s P2, I can
> implement a message loop in P1 (using MPI_Iprobe).
>
> With 3 processes, where P1 spawns P2 and P2 spawns P3, I can implement
> a message loop in P1 listening for messages from P2, and I can also
> send data from P3 to P2. But MPI_Iprobe() in P2, testing for P3
> messages, always returns false, which prevents me from implementing a
> similar message loop in P2 (listening for P3 messages).
>
>
> Is there some race condition or unsupported feature (or blatant misuse
> of the MPI API) I'm unaware of?
>
> Thanks,
> Pieter
>
>
>
> Pieter Thysebaert wrote:
> > Hello,
> >
> > I'm using MPICH2 1.0.5 on Debian Etch AMD64 (mpd daemon). I'm trying to
> > implement a Master / Worker architecture, where the master can dynamically
> > spawn additional workers (using MPI_Comm_spawn).
> >
> > Ultimately, I want the master to listen to its workers using a loop with
> > MPI_Iprobe statements to process incoming messages. However, when testing
> > my initial efforts, I have stumbled over a peculiar situation which
> > (seemingly) allows the Master to receive a worker's (test) message, but
> > not to Iprobe for it.
> >
> > In my testing, the spawned Workers run on the same machine as the Master.
> >
> > Assume the Worker (residing in an executable called "Worker") looks like
> > this:
> >
> > int main(int argc, char** argv) {
> >     MPI_Comm Master;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_get_parent(&Master);
> >     if (Master == MPI_COMM_NULL) {
> >         cerr << "No parent Master!" << endl;
> >         return 1;
> >     }
> >
> >     int size;
> >     MPI_Comm_remote_size(Master, &size);
> >     if (size != 1) {
> >         cerr << "Parent Master doesn't have size 1" << endl;
> >         return 1;
> >     }
> >     // Test: send test message to Master
> >     int test = 37;
> >     MPI_Status s;
> >     MPI_Send(&test, 1, MPI_INT, 0, TAG_TEST, Master);
> >     // Rest of code
> > }
> >
> >
> > And the Master begins as
> >
> > int main(int argc, char** argv) {
> >     MPI_Init(&argc, &argv);
> >
> >     MPI_Comm workerComm;
> >     MPI_Info ourInfo;
> >     MPI_Info_create(&ourInfo);
> >
> >     // Spawn Worker
> >     MPI_Comm_spawn("Worker", MPI_ARGV_NULL, 1, ourInfo, 0,
> >                    MPI_COMM_SELF, &workerComm, MPI_ERRCODES_IGNORE);
> >
> >     // Test: check test message from worker
> >     MPI_Status s;  // status filled in by MPI_Iprobe and MPI_Recv
> >     for (;;) {
> >         int flag = 0;
> >         int result = MPI_Iprobe(0, TAG_TEST, workerComm, &flag, &s);
> >         cout << "MPI_Iprobe: result is " << result << ", flag is "
> >              << flag << endl;
> >         if (flag > 0)
> >             break;
> >     }
> >
> >     int test;
> >     MPI_Recv(&test, 1, MPI_INT, 0, TAG_TEST, workerComm, &s);
> >     cout << "BlackBoard: Have received test, data is " << test << endl;
> > }
> >
> >
> > What happens when running this architecture (mpiexec -n 1 Master) is that
> > the Master never leaves its for loop (probing for messages from the
> > Worker, flag and result equal 0 forever; according to my docs, flag
> > should become 1 when a message is available), even if I let it run for a
> > long time.
> >
> > However, when I remove the for loop in the Master and immediately proceed
> > to MPI_Recv() of the TAG_TEST message, all goes well (i.e. the message is
> > received by the master and both master and worker continue).
> >
> > What am I doing wrong or not understanding correctly?
> >
> > The message send/receive and probing works fine on this same machine
> > when two processes are started with mpiexec -n 2 (and thus have ranks 0
> > and 1 in the same MPI_COMM_WORLD) and MPI_COMM_WORLD is used everywhere.
> >
> >
> > Pieter
> >
>
>