[MPICH2-dev] reference counting communicators
William Gropp
gropp at mcs.anl.gov
Mon Dec 5 14:45:01 CST 2005
At 03:34 PM 11/29/2005, David Gingold wrote:
>I've been puzzling over this one this afternoon:
>
>Is it a programming error to call MPI_Comm_free() when there are
>outstanding unmatched sends that reference the communicator?
No, it is not an error if you are talking about sends initiated on the same
process that is freeing the communicator. It is an error if the sends are
initiated by another process and targeted at the process that is now freeing
the communicator.
>The MPI spec seems to say that a communicator is not internally
>deallocated until these sends are completed at the receive end. But
>how is the implementation to know? The processes could try to check
>their early send queues (or increment reference counts as early sends
>arrive), but with an eager message implementation it's possible that
>the unmatched sends could be in flight, not accounted for by the
>sender or the receiver.
The communicator may be freed once all of the active requests complete on
the process that initiated them. Thus, if an eager nonblocking send
delivers the data to the destination, that request can go inactive, and
once the request is freed (with a completion call or MPI_Request_free), the
communicator's reference count can be decremented. It isn't necessary for
the receiver to receive that message before the sender can free the
communicator structure. That's what the MPICH2 code should be doing; if it
isn't, that's a bug.
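
To make the counting rule concrete, here is a minimal sketch in plain C. It
is a toy model, not MPICH2 source: toy_comm, toy_request, and every toy_*
function are hypothetical names invented for illustration. The assumption
is that the user's handle holds one reference and each active nonblocking
request holds one more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an implementation's internal objects. */
typedef struct {
    int  ref_count;    /* 1 for the user's handle + 1 per active request */
    bool user_freed;   /* set once the user has freed the handle */
    bool deallocated;  /* set when the last reference is dropped */
} toy_comm;

typedef struct {
    toy_comm *comm;    /* reference held while the request is active */
} toy_request;

/* Drop one reference; "deallocate" when none remain. */
static void toy_comm_release(toy_comm *c) {
    assert(c->ref_count > 0);
    if (--c->ref_count == 0)
        c->deallocated = true;  /* real code would release memory here */
}

void toy_comm_init(toy_comm *c) {
    c->ref_count = 1;           /* the user's handle */
    c->user_freed = false;
    c->deallocated = false;
}

/* Starting a nonblocking send takes a reference on the communicator. */
void toy_isend(toy_comm *c, toy_request *r) {
    r->comm = c;
    c->ref_count++;
}

/* The user frees the handle; the structure survives if requests remain. */
void toy_comm_free(toy_comm *c) {
    c->user_freed = true;
    toy_comm_release(c);
}

/* Completing (or freeing) the request drops its reference. */
void toy_wait(toy_request *r) {
    assert(r->comm != NULL);
    toy_comm_release(r->comm);
    r->comm = NULL;
}

/* Returns 1 if deallocation was deferred until the request completed. */
int toy_demo(void) {
    toy_comm c;
    toy_request r;
    toy_comm_init(&c);
    toy_isend(&c, &r);
    toy_comm_free(&c);                 /* handle freed, request still active */
    int alive_after_free = !c.deallocated;
    toy_wait(&r);                      /* last reference dropped here */
    return alive_after_free && c.deallocated;
}
```

In this model the receiver never appears: only local request completion on
the sending process drives the count, which is the point made above.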
>(And if MPI deallocates and later re-uses the communicator before the
>sends are matched, we can construe a scenario where a receive
>mistakenly matches a send from the old communicator.)
Yes, you do have to be careful about this, particularly when combined with
cancellation of isends.
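
The mismatch scenario can be sketched with a toy matching routine. This
assumes, purely for illustration, that each message carries a small integer
context id identifying its communicator and that the receiver matches on
(context id, tag). The names toy_msg, toy_match, and toy_stale_match are
hypothetical and not taken from any real implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int context_id;  /* hypothetical per-communicator id carried in the message */
    int tag;
    int payload;
} toy_msg;

/* Return the index of the first queued message matching (context_id, tag),
   or -1 if none matches. */
int toy_match(const toy_msg *queue, size_t n, int context_id, int tag) {
    assert(queue != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (queue[i].context_id == context_id && queue[i].tag == tag)
            return (int)i;
    return -1;
}

/* One eager message from the old communicator is still unmatched when a
   new communicator is created. Returns 1 if a receive posted on the new
   communicator picks up that stale message. */
int toy_stale_match(int old_id, int new_id) {
    toy_msg unexpected[1] = { { old_id, 0, 42 } };
    return toy_match(unexpected, 1, new_id, 0) == 0;
}
```

Here toy_stale_match(7, 7) returns 1 (the id was reused, so the stale send
is matched), while toy_stale_match(7, 8) returns 0. Under this model, an id
must not be recycled while messages carrying it can still arrive.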
>I believe this has implications for how MPICH2 must reference count
>communicators in send and receive requests. If the MPI
>implementation doesn't need to account for these stranded sends, then
>we might avoid creating references to the communicator in some cases.
Unfortunately, it does need to account for these, at least in the
nonblocking operations. In the case of blocking sends, it isn't necessary
to update the reference count. MPICH2 should be careful updating these
reference counts; if there is a problem, we'll fix it.
Bill
>-dg
>
>--
>David Gingold
>Principal Software Engineer
>SiCortex
>One Clock Tower Place, Suite 100
>Maynard MA 01754
>(978) 897-0214 x224
>
>
William Gropp
http://www.mcs.anl.gov/~gropp