[MPICH2-dev] reference counting communicators

Tue Dec 6 13:55:48 CST 2005

Bill --

If I understand your response correctly, it would be an error for the 
receiving process to call MPI_Comm_free() when there are outstanding 
unmatched sends to that receiver sent using the communicator.  But it's 
okay for the sending process to call MPI_Comm_free() at that point.

What I'm trying to accomplish is a bit of code-bumming in my MPID 
implementation: When I create a request, if possible I'd like to avoid 
creating a reference to the communicator in the MPID_Request structure. 
For a send request, I believe I may do without the reference as long as 
the sender doesn't need to refer to the communicator in order to 
complete (and possibly cancel) the request.  A receive request would 
need to be careful to reference the communicator until there is a 
match, at least.
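
To make the idea concrete, here is a minimal sketch in C of the reference 
counting I have in mind.  Everything in it is hypothetical: toy_comm, 
toy_request, and the toy_* functions are illustrative stand-ins, not the 
MPICH2 communicator or MPID_Request structures, and the needs_comm flag is 
only my assumption about how a device might decide that a send needs the 
communicator.  Whether a send may really skip the reference is the open 
question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; NOT the MPICH2 structures. */
typedef struct toy_comm {
    int ref_count;   /* 1 for the user's handle, plus 1 per request holding it */
    bool destroyed;  /* set when the last reference is dropped */
} toy_comm;

typedef enum { TOY_SEND, TOY_RECV } toy_kind;

typedef struct toy_request {
    toy_kind kind;
    toy_comm *comm;  /* NULL when the request holds no reference */
} toy_request;

/* A new communicator starts with the single reference owned by the user. */
static void toy_comm_init(toy_comm *c)
{
    c->ref_count = 1;
    c->destroyed = false;
}

/* Drop one reference; the structure is torn down only at zero. */
static void toy_comm_release(toy_comm *c)
{
    assert(c->ref_count > 0);
    if (--c->ref_count == 0)
        c->destroyed = true;
}

/* Models the effect of MPI_Comm_free(): only the user's reference goes away. */
static void toy_comm_free(toy_comm *c)
{
    toy_comm_release(c);
}

/* A receive always takes a reference.  A send takes one only if the
 * (assumed) needs_comm flag says completion or cancel needs the communicator. */
static void toy_request_init(toy_request *r, toy_kind kind, toy_comm *c,
                             bool needs_comm)
{
    r->kind = kind;
    r->comm = NULL;
    if (kind == TOY_RECV || needs_comm) {
        c->ref_count++;
        r->comm = c;
    }
}

/* Called when a receive is matched, or when a send that holds a
 * reference completes.  Safe to call on a request with no reference. */
static void toy_request_release_comm(toy_request *r)
{
    if (r->comm != NULL) {
        toy_comm_release(r->comm);
        r->comm = NULL;
    }
}
```

In this sketch a send created with needs_comm set to false leaves the count 
untouched, while a pending receive keeps the communicator alive across 
toy_comm_free() until toy_request_release_comm() runs at match time.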

Thanks for the help.  (And please correct me if I still misunderstand 
this.)

-dg

On Dec 5, 2005, at 3:45 PM, William Gropp wrote:

> At 03:34 PM 11/29/2005, David Gingold wrote:
>> I've been puzzling over this one this afternoon:
>>
>> Is it a programming error to call MPI_Comm_free() when there are
>> outstanding unmatched sends that reference the communicator?
>
> No, it is not an error if you are talking about sends initiated on the 
> same process as the one that is freeing the communicator.  It is an 
> error if the sends are initiated by another process and targeted at the 
> process that is now freeing the communicator.
>
>
>> The MPI spec seems to say that a communicator is not internally
>> deallocated until these sends are completed at the receive end.  But
>> how is the implementation to know?  The processes could try to check
>> their early send queues (or increment reference counts as early sends
>> arrive), but with an eager message implementation it's possible that
>> the unmatched sends could be in flight, not accounted for by the
>> sender or the receiver.
>
> The communicator may be freed once all of the active requests complete 
> on the process that initiated them.  Thus, if an eager nonblocking 
> send delivers the data to the destination, then that request can now 
> go inactive and once freed (with a completion call or 
> MPI_Request_free), the communicator's reference count can be 
> decremented.  It isn't necessary for the receiver to receive that 
> message before the sender can free the communicator structure.  That's 
> what the MPICH2 code should be doing; if it isn't, that's a bug.
>
>
>
>> (And if MPI deallocates and later re-uses the communicator before the
>> sends are matched, we can construct a scenario where a receive
>> mistakenly matches a send from the old communicator.)
>
> Yes, you do have to be careful about this, particularly when combined 
> with cancel of isends.
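
The mismatch scenario can be sketched with a toy model of the receiver's 
unexpected-message queue.  This is a hedged illustration, not MPICH2 code: 
toy_msg, toy_queue, toy_enqueue, and toy_match are hypothetical names, and 
matching on a (context id, source, tag) triple is an assumption about how 
an implementation tells communicators apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of early (unexpected) sends at the receiver. */
typedef struct toy_msg {
    int context_id;  /* assumed per-communicator matching id */
    int source;
    int tag;
    int payload;
    bool consumed;
} toy_msg;

#define TOY_QUEUE_MAX 8

typedef struct toy_queue {
    toy_msg msgs[TOY_QUEUE_MAX];
    size_t count;
} toy_queue;

/* An eager send arrives before any matching receive has been posted. */
static bool toy_enqueue(toy_queue *q, int context_id, int source, int tag,
                        int payload)
{
    if (q->count == TOY_QUEUE_MAX)
        return false;
    q->msgs[q->count++] = (toy_msg){ context_id, source, tag, payload, false };
    return true;
}

/* A posted receive takes the oldest unconsumed message with the same
 * (context id, source, tag).  Nothing here records which communicator
 * "generation" the message was sent on. */
static toy_msg *toy_match(toy_queue *q, int context_id, int source, int tag)
{
    for (size_t i = 0; i < q->count; i++) {
        toy_msg *m = &q->msgs[i];
        if (!m->consumed && m->context_id == context_id &&
            m->source == source && m->tag == tag) {
            m->consumed = true;
            return m;
        }
    }
    return NULL;
}
```

If a send on the old communicator is still queued when its context id is 
handed to a new communicator, the first receive posted on the new 
communicator with the same source and tag picks up the stale message 
rather than the one intended for it.
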
>
>
>> I believe this has implications for how MPICH2 must reference count
>> communicators in send and receive requests.  If the MPI
>> implementation doesn't need to account for these stranded sends, then
>> we might avoid creating references to the communicator in some cases.
>
> Unfortunately, it does need to account for these, at least in the 
> nonblocking operations.  In the case of blocking sends, it isn't 
> necessary to update the reference count.  MPICH2 should be careful 
> updating these reference counts; if there is a problem, we'll fix it.
>
> Bill
>
>
>> -dg
>>
>> --
>> David Gingold
>> Principal Software Engineer
>> SiCortex
>> One Clock Tower Place, Suite 100
>> Maynard MA 01754
>> (978) 897-0214 x224
>>
>>
>
> William Gropp
> http://www.mcs.anl.gov/~gropp