[MPICH2-dev] reference counting communicators

William Gropp gropp at mcs.anl.gov
Mon Dec 5 14:45:01 CST 2005


At 03:34 PM 11/29/2005, David Gingold wrote:
>I've been puzzling over this one this afternoon:
>
>Is it a programming error to call MPI_Comm_free() when there are
>outstanding unmatched sends that reference the communicator?

No, it is not an error if you are talking about sends initiated on the same 
process that is freeing the communicator.  It is an error if the sends 
are initiated by another process and targeted at the process that is now 
freeing the communicator.


>The MPI spec seems to say that a communicator is not internally
>deallocated until these sends are completed at the receive end.  But
>how is the implementation to know?  The processes could try to check
>their early send queues (or increment reference counts as early sends
>arrive), but with an eager message implementation it's possible that
>the unmatched sends could be in flight, not accounted for by the
>sender or the receiver.

The communicator may be freed once all of the active requests complete on 
the process that initiated them.  Thus, if an eager nonblocking send 
delivers the data to the destination, then that request can go inactive, 
and once it is freed (with a completion call or MPI_Request_free), the 
communicator's reference count can be decremented.  It isn't necessary for 
the receiver to receive that message before the sender can free the 
communicator structure.  That's what the MPICH2 code should be doing; if it 
isn't, that's a bug.
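
To make the counting rule concrete, here is a minimal sketch in plain C.  It 
is a toy model only: the toy_* types and functions are hypothetical names 
made up for illustration and are not MPICH2's internal structures.  It 
assumes one reference for the user's handle plus one per active nonblocking 
request, with the structure destroyed when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not MPICH2's real internals. */
typedef struct {
    int  ref_count;  /* 1 for the user's handle + 1 per active request */
    bool destroyed;  /* set when the structure would really be deallocated */
} toy_comm;

typedef struct {
    toy_comm *comm;  /* communicator this request refers to */
    bool      active;
} toy_request;

static void toy_comm_init(toy_comm *c) {
    c->ref_count = 1;
    c->destroyed = false;
}

/* Drop one reference; deallocate only when nobody refers to the comm. */
static void toy_comm_release(toy_comm *c) {
    if (--c->ref_count == 0)
        c->destroyed = true;
}

/* Nonblocking send: the new request takes a reference. */
static void toy_isend(toy_comm *c, toy_request *r) {
    c->ref_count++;
    r->comm = c;
    r->active = true;
}

/* User-level free: drops only the user's reference. */
static void toy_comm_free(toy_comm *c) {
    toy_comm_release(c);
}

/* Local completion (a wait or a request free) on the initiating process. */
static void toy_wait(toy_request *r) {
    if (r->active) {
        r->active = false;
        toy_comm_release(r->comm);
        r->comm = NULL;
    }
}

/* Returns 1 if the comm survives the user's free while a request is
 * active and is destroyed once that request completes locally. */
int toy_free_before_wait(void) {
    toy_comm c;
    toy_request r;
    toy_comm_init(&c);
    toy_isend(&c, &r);
    toy_comm_free(&c);            /* user frees while the isend is active */
    bool survived = !c.destroyed; /* the request still holds a reference */
    toy_wait(&r);                 /* sender-side completion, no receiver */
    return survived && c.destroyed;
}
```

Note that nothing in the sketch waits for the receiver to match the message; 
only completion on the initiating side releases the reference.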



>(And if MPI deallocates and later re-uses the communicator before the
>sends are matched, we can construe a scenario where a receive
>mistakenly matches a send from the old communicator.)

Yes, you do have to be careful about this, particularly when combined with 
cancellation of isends.
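
The mismatch scenario can be sketched as follows.  This is an illustration 
under stated assumptions, not MPICH2 code: it assumes messages are matched 
on a (context id, source, tag) envelope, and that a freed communicator's 
context id (7 here, an arbitrary value) is handed to a new communicator 
while an old eager send is still unmatched.  All toy_* names are 
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical envelope: matching uses (context_id, source, tag). */
typedef struct {
    int context_id;
    int source;
    int tag;
    int payload;
} toy_msg;

/* Return the index of the first queued message with this envelope, or -1. */
static int toy_match(const toy_msg *queue, size_t n,
                     int context_id, int source, int tag) {
    for (size_t i = 0; i < n; i++) {
        if (queue[i].context_id == context_id &&
            queue[i].source == source &&
            queue[i].tag == tag)
            return (int)i;
    }
    return -1;
}

/* The old communicator used context id 7 and left one send unmatched.
 * After the id is reused, a receive on the new communicator picks up
 * the stale message first. */
int toy_stale_match_payload(void) {
    toy_msg unexpected[2] = {
        { 7, 0, 42, 111 },  /* stale send from the old communicator */
        { 7, 0, 42, 222 },  /* send intended for the new communicator */
    };
    int i = toy_match(unexpected, 2, 7, 0, 42);
    return (i >= 0) ? unexpected[i].payload : -1;
}
```

In this toy model the receive gets payload 111, the stale message, rather 
than the 222 that was sent on the new communicator.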


>I believe this has implications for how MPICH2 must reference count
>communicators in send and receive requests.  If the MPI
>implementation doesn't need to account for these stranded sends, then
>we might avoid creating references to the communicator in some cases.

Unfortunately, it does need to account for these, at least in the 
nonblocking operations.  In the case of blocking sends, it isn't necessary 
to update the reference count.  MPICH2 should be careful about updating these 
reference counts; if there is a problem, we'll fix it.
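
A rough sketch of the blocking/nonblocking distinction, again in plain C 
with hypothetical sketch_* names.  It rests on an assumption not spelled out 
above: a blocking send returns only after it is locally complete, so (in a 
single-threaded process) the caller cannot free the communicator while the 
send is using it, and no extra reference is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types for illustration only. */
typedef struct { int ref_count; } sketch_comm;
typedef struct { sketch_comm *comm; } sketch_req;

/* Blocking send: done with the communicator by the time it returns,
 * so the count is left alone (single-threaded assumption). */
static void sketch_send(sketch_comm *c) {
    (void)c;  /* data transfer would happen here */
}

/* Nonblocking send: the request outlives the call, so it takes a reference. */
static void sketch_isend(sketch_comm *c, sketch_req *r) {
    c->ref_count++;
    r->comm = c;
}

/* Local completion of the request gives the reference back. */
static void sketch_complete(sketch_req *r) {
    r->comm->ref_count--;
    r->comm = NULL;
}

/* Report the count observed right after the send call returns. */
int sketch_refs_after_send(int use_blocking) {
    sketch_comm c = { 1 };  /* the user's own reference */
    sketch_req r = { NULL };
    int seen;
    if (use_blocking) {
        sketch_send(&c);
        seen = c.ref_count;   /* unchanged */
    } else {
        sketch_isend(&c, &r);
        seen = c.ref_count;   /* one extra while the request is active */
        sketch_complete(&r);
    }
    return seen;
}
```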

Bill


>-dg
>
>--
>David Gingold
>Principal Software Engineer
>SiCortex
>One Clock Tower Place, Suite 100
>Maynard MA 01754
>(978) 897-0214 x224
>
>

William Gropp
http://www.mcs.anl.gov/~gropp 



