[MPICH] Problem probing message from dynamically spawned process

Wed May 23 05:12:10 CDT 2007

Hello,

I'm using MPICH2 1.0.5 on Debian Etch AMD64 (mpd daemon). I'm trying to
implement a Master / Worker architecture, where the master can dynamically
spawn additional workers (using MPI_Comm_spawn).

Ultimately, I want the master to listen to its workers using a loop with
MPI_Iprobe statements to process incoming messages. However, while testing
my initial efforts, I have stumbled over a peculiar situation in which the
Master can (seemingly) receive a Worker's test message with MPI_Recv, but
cannot detect that same message with MPI_Iprobe.

In my testing, the spawned Workers run on the same machine as the Master.

Assume the Worker (residing in an executable called "Worker") looks like
this:

    int main(int argc, char** argv) {
        MPI_Comm Master;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&Master);
        // MPI_Comm_get_parent yields MPI_COMM_NULL if we were not spawned
        if (Master == MPI_COMM_NULL) {
            cerr << "No parent Master!" << endl;
            return 1;
        }

        // The Master spawns from MPI_COMM_SELF, so the remote group has size 1
        int size;
        MPI_Comm_remote_size(Master, &size);
        if (size != 1) {
            cerr << "Parent Master doesn't have size 1" << endl;
            return 1;
        }

        // Test: send test message to Master (rank 0 of the remote group)
        int test = 37;
        MPI_Send(&test, 1, MPI_INT, 0, TAG_TEST, Master);
        // Rest of code
    }

And the Master begins as

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        MPI_Comm workerComm;
        MPI_Info ourInfo;
        MPI_Info_create(&ourInfo);

        // Spawn one Worker; workerComm is the resulting intercommunicator
        MPI_Comm_spawn("Worker", MPI_ARGV_NULL, 1, ourInfo, 0,
                       MPI_COMM_SELF, &workerComm, MPI_ERRCODES_IGNORE);

        // Test: poll for the test message from the Worker (remote rank 0)
        MPI_Status s;
        for (;;) {
            int flag = 0;
            int result = MPI_Iprobe(0, TAG_TEST, workerComm, &flag, &s);
            cout << "MPI_Iprobe: result is " << result
                 << ", flag is " << flag << endl;
            if (flag > 0)
                break;
        }

        int test;
        MPI_Recv(&test, 1, MPI_INT, 0, TAG_TEST, workerComm, &s);
        cout << "Master: Have received test, data is " << test << endl;
    }

When I run this architecture (mpiexec -n 1 Master), the Master never leaves
its for loop, even if I let it run for a long time. MPI_Iprobe keeps
returning with both result and flag equal to 0; according to my docs, flag
should become 1 once a message from the Worker is available.

However, when I remove the for loop in the Master and immediately proceed
to MPI_Recv() of the TAG_TEST message, all goes well (i.e. the message is
received by the master and both master and worker continue).

What am I doing wrong or not understanding correctly?

Sending, receiving and probing all work fine on this same machine when two
processes are started with mpiexec -n 2 (and thus have ranks 0 and 1 in the
same MPI_COMM_WORLD) and MPI_COMM_WORLD is used everywhere.

Pieter