[mpich-discuss] Using MPI_Comm_split to MPI_COMM_LOCAL

Dave Goodell goodell at mcs.anl.gov
Wed Nov 10 11:20:52 CST 2010


On Nov 8, 2010, at 4:47 PM CST, John Bent wrote:

> We'd like to create an MPI Communicator for just the processes on each
> local node (i.e. something like MPI_COMM_LOCAL).  We were doing this
> previously very naively by having everyone send out their hostnames and
> then doing string parsing.  We realize that a much simpler way to do it
> would be to use MPI_Comm_split to split MPI_COMM_WORLD by the IP
> address.  Unfortunately, the IP address is 64 bits and the max "color"
> to pass to MPI_Comm_split is only 2^16.  So we're currently planning on
> splitting iteratively on each 16 bits in the 64 bit IP address.

Hi John,

Are your IP addresses really 64 bits?  IPv4 addresses are 32 bits and (AFAIK) full IPv6 addresses are 128 bits.  If you have IPv6 then maybe you could just use the low-order 64 bits for most HPC MPI scenarios, but I'm not overly knowledgeable about IPv6...

Also, as I read the MPI-2.2 standard, the only restriction on the color value is that it be a non-negative integer.  So FWIW you really have 2^31 values available on most platforms, not 2^16.
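
For example, something like this (untested, and the helper name is mine) already distinguishes 2^31 values in a single pass; the masked-off top bit is exactly why a full 32- or 64-bit address still needs more than one pass:

    #include <mpi.h>

    /* Untested sketch: any color in [0, INT_MAX] is legal per MPI-2.2,
     * so one split covers 2^31 distinct values.  The top bit of a full
     * 32-bit value is masked off here, which is why a 32- or 64-bit
     * address still needs the iterative scheme. */
    int split_low31(MPI_Comm comm, unsigned int addr, MPI_Comm *newcomm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);
        int color = (int)(addr & 0x7FFFFFFFu);  /* keep color non-negative */
        /* key = rank preserves the original relative rank ordering */
        return MPI_Comm_split(comm, color, rank, newcomm);
    }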

> Anyone know a better way to achieve MPI_COMM_LOCAL?  Or can
> MPI_Comm_split be enhanced to take a 64 bit color?

For MPICH2 we could conceivably add an extension (MPIX_Comm_split64 or whatever) that took a longer (perhaps arbitrary-length) color.  The MPI Forum could also provide this sort of capability in a future version of the MPI standard.  But there's nothing that can be done to MPI_Comm_split itself in the short term.
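
In the meantime, the iterative scheme can be wrapped up once at user level.  A rough sketch (untested; "comm_split64" is a made-up name, not an MPICH2 API):

    #include <stdint.h>
    #include <mpi.h>

    /* Untested sketch of a user-level stand-in for a hypothetical
     * MPIX_Comm_split64: split iteratively on 16-bit chunks of a
     * 64-bit color.  Each chunk is at most 0xFFFF, comfortably within
     * the non-negative int range.  Processes land in the same final
     * communicator iff they agree on all four chunks, i.e. on the
     * whole 64-bit color. */
    int comm_split64(MPI_Comm comm, uint64_t color, int key,
                     MPI_Comm *newcomm)
    {
        MPI_Comm cur = comm;
        int shift;
        for (shift = 48; shift >= 0; shift -= 16) {
            MPI_Comm next;
            int chunk = (int)((color >> shift) & 0xFFFF);
            int err = MPI_Comm_split(cur, chunk, key, &next);
            if (err != MPI_SUCCESS)
                return err;
            if (cur != comm)
                MPI_Comm_free(&cur);  /* free intermediate communicators */
            cur = next;
        }
        *newcomm = cur;
        return MPI_SUCCESS;
    }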

w.r.t. the end goal of MPI_COMM_LOCAL: in MPICH2 we usually create this communicator internally anyway, so we could probably also expose it directly with some sort of extension.

Otherwise, your iterative solution (sketched above) seems very reasonable, especially as a fix entirely outside the MPI library.  Alternatively, you could re-implement MPI_Comm_split at the user level with communication calls and MPI_Comm_create; that could be a bit more efficient if you take the time to do it right.
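
The MPI_Comm_create route would look roughly like this (again untested, and the name is made up).  One allgather of the 64-bit colors replaces the four successive splits; MPI-2.2 allows the disjoint subgroups that the different color classes pass:

    #include <stdlib.h>
    #include <mpi.h>

    /* Untested sketch of the MPI_Comm_create route: allgather every
     * rank's 64-bit color, pick out the ranks that match ours, and
     * build the new communicator in one shot.  As of MPI-2.2 it is
     * legal for different processes to pass disjoint subgroups to
     * MPI_Comm_create. */
    int comm_split64_create(MPI_Comm comm, unsigned long long color,
                            MPI_Comm *newcomm)
    {
        int rank, size, i, n = 0, err;
        MPI_Group world_grp, sub_grp;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        unsigned long long *colors = malloc(size * sizeof *colors);
        int *members = malloc(size * sizeof *members);

        MPI_Allgather(&color, 1, MPI_UNSIGNED_LONG_LONG,
                      colors, 1, MPI_UNSIGNED_LONG_LONG, comm);

        for (i = 0; i < size; i++)   /* members in increasing rank order */
            if (colors[i] == color)
                members[n++] = i;

        MPI_Comm_group(comm, &world_grp);
        MPI_Group_incl(world_grp, n, members, &sub_grp);
        err = MPI_Comm_create(comm, sub_grp, newcomm);  /* collective */

        MPI_Group_free(&sub_grp);
        MPI_Group_free(&world_grp);
        free(members);
        free(colors);
        return err;
    }

This does O(size) work per process to scan the colors, but creates one communicator instead of four, which is where the efficiency win would come from.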

-Dave


