[mpich-discuss] Re: MPI_Brecv vs multiple MPI_Irecv

Darius Buntinas buntinas at mcs.anl.gov
Wed Aug 27 14:01:54 CDT 2008


On 08/27/2008 01:00 PM, Robert Kubrick wrote:
> 
> On Aug 27, 2008, at 1:29 PM, Darius Buntinas wrote:
> 
>>
>> I'm not sure what you mean by a queue on the receiving side.  If 
>> receives have been posted, incoming messages that match the posted 
>> receives will be received directly into the specified location.
> 
> Yes, but you have to post receives and keep track of each request 
> handle. That was the idea of my original question, one recv per message.
> You can receive more than one element/message with each call of course:
> MPI_Irecv(buf, 10, ...)
> but then the Irecv request won't complete until *all* the elements have 
> been received.
> 
> All I am saying is that it would be convenient to specify a receiving 
> buffer where the implementation can store messages without blocking 
> program flow on the send side or stalling message transmission.
>
>> Depending on the particular MPI implementation you're using, progress 
>> on the receives (i.e., taking them off the network, doing the 
>> matching, copying/receiving into the user buffer, etc.) may only 
>> happen while you're inside an MPI function call.  So if you're in a 
>> long compute loop, the MPI library might not be performing the 
>> receives.  But adding a user buffer wouldn't help that situation either.
> 
> The program might be blocked in a recv on a different comm, which could 
> still allow progress on this comm.
> Also, my understanding is that the MPI standard does not restrict the 
> progress of MPI send/recv to happening only inside MPI calls. Some MPI 
> implementations are multi-threaded.
> 
>>
>> Messages that are received which don't have matching posted receives 
>> will be "queued" waiting for the matching receives, and either 
>> buffered internally at the receiver (for small messages) or will 
>> "stall" at the sender (for large messages).  But I believe you're only 
>> concerned with the case where receives have been posted.
> 
> Both cases. By specifying a receiving buffer to handle incoming 
> messages, the application does not need to post a recv to allow 
> transmission (as long as there is room left in the buffer, of course).
> 
> Even for small messages the send might block when all the internal 
> receiver buffer space is gone. And what is the size of the internal 
> buffer anyway?

For mpich2, the internal buffer space is limited by available memory. 
For each unexpected small message (<=128K for ch3:sock) mpich2 does a 
malloc and receives the message into that buffer.  So even unexpected 
small messages shouldn't block program flow...but you'll eventually 
crash if you run out of memory.
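
A quick way to see this, as a sketch: run two ranks, delay the receiver, 
and note that the small send returns immediately.  The 64-byte payload is 
well under the 128K ch3:sock threshold.  (This behavior is 
implementation-dependent, not guaranteed by the standard.)

  #include <mpi.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      int rank;
      char msg[64];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
          strcpy(msg, "small eager message");
          /* 64 bytes << 128K: mpich2 should send eagerly, so this
           * returns even though rank 1 hasn't posted a recv yet */
          MPI_Send(msg, sizeof(msg), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
          printf("rank 0: send returned before the recv was posted\n");
      } else if (rank == 1) {
          sleep(2);  /* make the message "unexpected" at the receiver */
          MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          printf("rank 1: got \"%s\"\n", msg);
      }
      MPI_Finalize();
      return 0;
  }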

Unexpected large (>128K for ch3:sock) messages are another story.  The 
send won't complete on the send side until the receive is matched at the 
receiver (and the receiver makes sufficient progress to fully receive 
the message).  So unexpected large messages can block program flow at 
the sender.  On the other hand you won't run out of memory buffering 
unexpected large messages.
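
The flip side is the same sketch with a payload above the threshold: 
under the rendezvous protocol the send should now stall for roughly the 
two seconds it takes rank 1 to post the matching receive (again 
implementation-dependent; 1 MB is just an arbitrary size over 128K).

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      int rank, n = 1 << 20;   /* 1 MB > 128K: rendezvous protocol */
      char *buf;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      buf = malloc(n);
      if (rank == 0) {
          double t0 = MPI_Wtime();
          MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
          /* expect ~2s: the send stalled until the recv was matched */
          printf("rank 0: send took %.1f s\n", MPI_Wtime() - t0);
      } else if (rank == 1) {
          sleep(2);  /* delay posting the receive */
          MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
      }
      free(buf);
      MPI_Finalize();
      return 0;
  }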

If you are running into the case where unexpected large messages are 
blocking program flow, it should be sufficient to increase the 
small-large message threshold in the library until you have no large 
messages.  This can hurt performance, though, because the library then 
does an extra copy of large amounts of data from the temporary buffers; 
how much this matters probably depends on the specific application.
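
(For what it's worth: I don't believe this threshold is exposed as a 
documented runtime knob in this mpich2 release, so changing it may mean 
rebuilding the library.  Later MPICH releases export it as an 
environment variable, along the lines of

  # assumes MPICH 3.x; this CVAR postdates the thread above
  export MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=1048576   # treat <=1 MB as small
  mpiexec -n 2 ./app

but treat that as an assumption to verify against your version's 
documentation.)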

I think I understand where you're coming from.  I think you're saying 
that if the app knows that it'll receive a lot of messages faster than it 
can post receives, there should be a way to tell the library "just 
buffer X MB of unexpected messages, regardless of how big they are." 
This is a little different from my previous suggestion above, in that 
the previous suggestion will buffer an unlimited number of messages that 
are smaller than the threshold, whereas this suggestion would buffer 
messages of any size until the total size is larger than the threshold X.
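
Nothing in MPI gives you that knob directly, but you can approximate it 
at the application level by pre-posting a pool of wildcard receives into 
memory you own.  A rough sketch (N_BUFS, BUF_SZ, and the helper names 
are made up for illustration; messages larger than BUF_SZ would need 
separate handling):

  #include <mpi.h>
  #include <string.h>

  #define N_BUFS 64
  #define BUF_SZ (128 * 1024)   /* 64 x 128K = 8 MB of user buffering */

  static char        pool[N_BUFS][BUF_SZ];
  static MPI_Request reqs[N_BUFS];

  /* post the whole pool once at startup */
  void pool_init(MPI_Comm comm)
  {
      int i;
      for (i = 0; i < N_BUFS; i++)
          MPI_Irecv(pool[i], BUF_SZ, MPI_BYTE, MPI_ANY_SOURCE,
                    MPI_ANY_TAG, comm, &reqs[i]);
  }

  /* wait for any message, copy it out, then re-post that slot */
  int pool_next(MPI_Comm comm, char *out, MPI_Status *status)
  {
      int idx, count;
      MPI_Waitany(N_BUFS, reqs, &idx, status);
      MPI_Get_count(status, MPI_BYTE, &count);
      memcpy(out, pool[idx], count);
      MPI_Irecv(pool[idx], BUF_SZ, MPI_BYTE, MPI_ANY_SOURCE,
                MPI_ANY_TAG, comm, &reqs[idx]);
      return count;
  }

As long as a slot is free, incoming messages land in the pool without 
the app having to post each recv at exactly the right moment; once all 
slots are full, you fall back to the library's normal behavior.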

Does either suggestion fit your scenario?

-d

>>
>> Does this help?
>>
>> -d
>>
>>
>> On 08/27/2008 11:03 AM, Robert Kubrick wrote:
>>> A buffered receive would allow the implementation to receive and 
>>> store messages when the application is busy doing something else, 
>>> like reading messages on a different comm. I now understand why a 
>>> Brecv is not in the standard, and it makes perfect sense, but the 
>>> result is that on the sending side you can control the size of a 
>>> sending "queue", while on the receiving side you cannot.
>>> On Aug 27, 2008, at 11:23 AM, Darius Buntinas wrote:
>>>>
>>>> Well, what would it mean to do a buffered receive?
>>>>
>>>> This?
>>>>   buf = malloc(BUF_SZ);
>>>>   MPI_Irecv(buf, BUF_SZ, MPI_BYTE, src, tag, comm, &req);
>>>>   MPI_Wait(&req, &status);
>>>>   memcpy(recv_ptr, buf, BUF_SZ);
>>>>
>>>> What would be the benefit?
>>>>
>>>> -d
>>>>
>>>> On 08/27/2008 10:13 AM, Robert Kubrick wrote:
>>>>> I just found out that the standard actually doesn't have an 
>>>>> MPI_Brecv call.
>>>>> Any reason why the recv cannot buffer messages in a user-provided 
>>>>> memory space, as per MPI_Buffer_attach/MPI_Bsend?
>>>>> On Aug 26, 2008, at 4:17 PM, Robert Kubrick wrote:
>>>>>> From a performance point of view, which one is better:
>>>>>>
>>>>>> MPI_Buffer_attach(buf, 10*sizeof(MSG))
>>>>>> MPI_Brecv()
>>>>>>
>>>>>> or
>>>>>>
>>>>>> MPI_Recv_init()
>>>>>> MPI_Recv_init()
>>>>>> MPI_Recv_init()
>>>>>> ... /* 10 recv requests */
>>>>>> MPI_Startall(all recv requests)
>>>>>> MPI_Waitany()
>>>>>>
>>>>>>
>>>>>> I understand MPI_Brecv will require an extra message copy, from 
>>>>>> the attached buffer to the MPI_Brecv() buffer. I'd like to know if 
>>>>>> there are other differences between the two methods.
>>>>>>
>>>>>> Thanks,
>>>>>> Rob
>>>>
>>
> 
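
P.S. For completeness, the MPI_Recv_init alternative from the first 
message looks roughly like this as real code (the 10 requests mirror the 
original sketch; MSG_SZ is invented for the example):

  #include <mpi.h>

  #define NREQ   10
  #define MSG_SZ 4096   /* stand-in for sizeof(MSG) */

  static char        bufs[NREQ][MSG_SZ];
  static MPI_Request reqs[NREQ];

  void recv_loop(MPI_Comm comm)
  {
      MPI_Status status;
      int i, idx;

      for (i = 0; i < NREQ; i++)
          MPI_Recv_init(bufs[i], MSG_SZ, MPI_BYTE, MPI_ANY_SOURCE,
                        MPI_ANY_TAG, comm, &reqs[i]);
      MPI_Startall(NREQ, reqs);

      for (;;) {
          MPI_Waitany(NREQ, reqs, &idx, &status);
          /* ... process bufs[idx] ... */
          MPI_Start(&reqs[idx]);   /* re-arm just that request */
      }
  }

Note there's no extra copy here: each message is received directly into 
bufs[idx], which is the main difference from the hypothetical MPI_Brecv.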



