[MPICH2-dev] reference counting communicators
David Gingold
david.gingold at sicortex.com
Tue Dec 6 13:55:48 CST 2005
Bill --
If I understand your response correctly, it would be an error for the
receiving process to call MPI_Comm_free() when there are outstanding
unmatched sends to that receiver sent using the communicator. But it's
okay for the sending process to call MPI_Comm_free() at that point.
What I'm trying to accomplish is a bit of code-bumming in my MPID
implementation: When I create a request, if possible I'd like to avoid
creating a reference to the communicator in the MPID_Request structure.
For a send request, I believe I may do without the reference as long as
the sender doesn't need to refer to the communicator in order to
complete (and possibly cancel) the request. A receive request would
need to be careful to reference the communicator until there is a
match, at least.
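
A minimal sketch of the scheme described above, offered only as an illustration. Every name here (toy_comm, toy_request, and the toy_* functions) is hypothetical and is not the actual MPID_Comm or MPID_Request definition. It also assumes that a send request never needs the communicator to complete or cancel, which is exactly the condition in question:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; NOT the real MPICH2 MPID structures. */
typedef struct toy_comm {
    int ref_count; /* 1 for the user's handle, +1 per request holding it */
    int freed;     /* set when the last reference is dropped */
} toy_comm;

typedef enum { TOY_SEND, TOY_RECV } toy_kind;

typedef struct toy_request {
    toy_kind kind;
    toy_comm *comm; /* NULL when the request holds no reference */
} toy_request;

static void toy_comm_init(toy_comm *c)
{
    c->ref_count = 1;
    c->freed = 0;
}

/* Drop one reference; models both the user's free call and a request
 * letting go of the communicator. */
static void toy_comm_release(toy_comm *c)
{
    assert(c->ref_count > 0);
    if (--c->ref_count == 0)
        c->freed = 1;
}

/* Send request: takes no reference (assumption: completing or
 * cancelling the send never needs to look at the communicator). */
static void toy_send_init(toy_request *r, toy_comm *c)
{
    (void)c;
    r->kind = TOY_SEND;
    r->comm = NULL;
}

/* Receive request: holds a reference until there is a match. */
static void toy_recv_init(toy_request *r, toy_comm *c)
{
    r->kind = TOY_RECV;
    r->comm = c;
    c->ref_count++;
}

/* Called when a receive is matched, or when any request completes. */
static void toy_request_drop_comm(toy_request *r)
{
    if (r->comm != NULL) {
        toy_comm_release(r->comm);
        r->comm = NULL;
    }
}
```

In this model, releasing the user's handle while a receive is still unmatched leaves the communicator alive until toy_request_drop_comm() runs for that receive. A pending send does not delay the release at all.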
Thanks for the help. (And please correct me if I still misunderstand
this.)
-dg
On Dec 5, 2005, at 3:45 PM, William Gropp wrote:
> At 03:34 PM 11/29/2005, David Gingold wrote:
>> I've been puzzling over this one this afternoon:
>>
>> Is it a programming error to call MPI_Comm_free() when there are
>> outstanding unmatched sends that reference the communicator?
>
> No, it is not an error if you are talking about sends initiated on the
> same process as the one that is freeing the communicator. It is if
> the sends are initiated by another process, targeted at the process
> that is now freeing the communicator.
>
>
>> The MPI spec seems to say that a communicator is not internally
>> deallocated until these sends are completed at the receive end. But
>> how is the implementation to know? The processes could try to check
>> their early send queues (or increment reference counts as early sends
>> arrive), but with an eager message implementation it's possible that
>> the unmatched sends could be in flight, not accounted for by the
>> sender or the receiver.
>
> The communicator may be freed once all of the active requests complete
> on the process that initiated them. Thus, if an eager nonblocking
> send delivers the data to the destination, then that request can now
> go inactive and once freed (with a completion call or
> MPI_Request_free), the communicator's reference count can be
> decremented. It isn't necessary for the receiver to receive that
> message before the sender can free the communicator structure. That's
> what the MPICH2 code should be doing; if it isn't, that's a bug.
>
>
>
>> (And if MPI deallocates and later re-uses the communicator before the
>> sends are matched, we can construe a scenario where a receive
>> mistakenly matches a send from the old communicator.)
>
> Yes, you do have to be careful about this, particularly when combined
> with cancel of isends.
>
>
>> I believe this has implications for how MPICH2 must reference count
>> communicators in send and receive requests. If the MPI
>> implementation doesn't need to account for these stranded sends, then
>> we might avoid creating references to the communicator in some cases.
>
> Unfortunately, it does need to account for these, at least in the
> nonblocking operations. In the case of blocking sends, it isn't
> necessary to update the reference count. MPICH2 should be careful
> updating these reference counts; if there is a problem, we'll fix it.
>
> Bill
>
>
>> -dg
>>
>> --
>> David Gingold
>> Principal Software Engineer
>> SiCortex
>> One Clock Tower Place, Suite 100
>> Maynard MA 01754
>> (978) 897-0214 x224
>>
>>
>
> William Gropp
> http://www.mcs.anl.gov/~gropp