[MPICH] Changing the comm size at runtime
Rajeev Thakur
thakur at mcs.anl.gov
Thu Apr 12 14:04:01 CDT 2007
This would also work. I think the slaves would need to call Accept in the
loop, not Connect, because after the merge they are part of the communicator
on which the master calls Accept, and Accept is collective over that
communicator.
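
To make the counting concrete, here is a small MPI-free sketch. It is only an illustration, not code from the pastebin. It models which call each process would make in each round, assuming exactly one new slave joins per round and that Merge(true) gives the newcomer the next free rank. The names Call, plan_rounds and count_calls are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Which collective call a process makes in a given round (None = not involved).
enum class Call { None, Accept, Connect };

// Returns calls[rank][round] for ranks 0..num_slaves (rank 0 is the master).
// In round r the current intracommunicator holds ranks 0..r; all of them
// call Accept together, and the newcomer (future rank r + 1) calls Connect.
std::vector<std::vector<Call>> plan_rounds(int num_slaves)
{
    assert(num_slaves >= 0);
    std::vector<std::vector<Call>> calls(
        num_slaves + 1, std::vector<Call>(num_slaves, Call::None));
    for (int round = 0; round < num_slaves; ++round) {
        for (int rank = 0; rank <= round; ++rank)
            calls[rank][round] = Call::Accept;   // existing members
        calls[round + 1][round] = Call::Connect; // the joining slave
    }
    return calls;
}

// Counts how often one process makes the given call over all rounds.
int count_calls(const std::vector<Call>& row, Call what)
{
    return static_cast<int>(std::count(row.begin(), row.end(), what));
}
```

Under these assumptions, a slave that ends up with rank r makes one Connect call followed by num_slaves - r Accept calls, which matches the loop bound num_slaves - intras[0].Get_rank() in the quoted slave code.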
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Patrick Gräbel
> Sent: Wednesday, April 11, 2007 3:09 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] Changing the comm size at runtime
>
>
> Here is the code, posted on a pastebin:
> http://rafb.net/p/5L20tD24.html
>
> The head comment explains how to start this "experimental" code.
>
> Meanwhile I succeeded in creating a "huge" intracomm object from
> incoming intercomm objects _without_ using MPI_Intercomm_create. The
> master's code looks like this:
>
> -------------------------------------------
> intras[0] = MPI::COMM_WORLD.Dup();
> for(int i = 0; i < num_slaves; i++)
> {
>     inters[i] = intras[i].Accept(port,MPI::INFO_NULL,0);
>     intras[i + 1] = inters[i].Merge(false);
>     intras[i + 1].Send(&num_slaves,1,MPI::INT,i + 1,0);
> }
> -------------------------------------------
>
> The master uses the intracomm from the last merge to accept the next
> intercomm object. For each accepted slave, the master sends the number
> of slaves it is waiting for. The slaves themselves do something like this:
>
> -------------------------------------------
> inters[0] = world.Connect(port.c_str(),MPI::INFO_NULL,0);
> intras[0] = inters[0].Merge(true);
> // receive the actual number of slaves being awaited
> intras[0].Recv(&num_slaves,1,MPI::INT,0,0);
> // lower ranks have to connect all higher ranks
> for(int i = 0; i < num_slaves - intras[0].Get_rank(); i++)
> {
>     inters[i + 1] = intras[i].Connect(port.c_str(),MPI::INFO_NULL,0);
>     intras[i + 1] = inters[i + 1].Merge(false);
> }
> -------------------------------------------
>
> For example, an Allgather call works across the master and an
> arbitrary number of slaves. After all slaves have disconnected, the
> master is able to accept a new set of slaves.
>
> I wonder if this solution is equivalent to your suggestion...
>
> Greetings
> Patrick
>
> Rajeev Thakur wrote:
> > Can you send us the example code?
> >
> > Rajeev
> >
>
>