[MPICH] Changing the comm size at runtime

Rajeev Thakur thakur at mcs.anl.gov
Thu Apr 12 14:04:01 CDT 2007


This would also work. I think the slaves would need to call Accept in the
loop, not Connect, because after their first merge they are part of the
intracommunicator on which the master calls Accept, and Accept is
collective over that communicator.

Rajeev

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Patrick Gräbel
> Sent: Wednesday, April 11, 2007 3:09 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] Changing the comm size at runtime
> 
> 
> Here is the code, posted on a pastebin: 
> http://rafb.net/p/5L20tD24.html
> 
> The head comment explains how to start this "experimental" code.
> 
> Meanwhile I succeeded in creating a "huge" intracomm object from
> incoming intercomm objects _without_ using MPI_Intercomm_create. The
> master's code looks like this:
> 
> -------------------------------------------
> intras[0] = MPI::COMM_WORLD.Dup();
> for(int i = 0; i < num_slaves; i++)
> {
>   inters[i] = intras[i].Accept(port,MPI::INFO_NULL,0);
>   intras[i + 1] = inters[i].Merge(false);
>   intras[i + 1].Send(&num_slaves,1,MPI::INT,i + 1,0);
> 
> }
> -------------------------------------------
> 
> The master uses the intracomm from the last merge to accept the next
> intercomm object. To each accepted slave the master sends the number of
> slaves it is waiting for. The slaves themselves do something like this:
> 
> -------------------------------------------
> inters[0] = world.Connect(port.c_str(),MPI::INFO_NULL,0);
> intras[0] = inters[0].Merge(true);
> // receive the actual number of slaves being awaited
> intras[0].Recv(&num_slaves,1,MPI::INT,0,0);
> // lower ranks have to connect all higher ranks
> for(int i = 0; i < num_slaves - intras[0].Get_rank(); i++)
> {
>   inters[i + 1] = intras[i].Connect(port.c_str(),MPI::INFO_NULL,0);
>   intras[i + 1] = inters[i + 1].Merge(false);
> }
> -------------------------------------------
> 
> For example, an Allgather call works across the master and an arbitrary
> number of slaves. After all slaves have disconnected, the master is
> able to accept a new set of slaves.
> 
> I wonder if this solution is equivalent to your suggestion...
> 
> Greetings
> Patrick
> 
> Rajeev Thakur wrote:
> > Can you send us the example code?
> > 
> > Rajeev 
> > 
> 
> 



