[mpich-discuss] messages queue

Jarosław Bułat kwant at agh.edu.pl
Fri Jul 11 03:16:38 CDT 2008


On Thu, 10-07-2008 at 14:16 -0500, Darius Buntinas wrote:
> The limits you are seeing (67msgs, 2K-4K msgs and 18 msgs) are due to 
> the available buffers on unreceived sends.  What happens is that in a 
> blocking send, mpich2 tries to send the message, either over a shared 
> memory queue (for nemesis and shm) or a tcp socket (for sock).  If it 
> can't send the message completely, it waits until it can.  A message 
> can't be sent if the shared memory queue is full (for nemesis or shm) or 
> if the tcp buffers are full (for sock).  And the queues or buffers fill 
> up when the receiver is not receiving the messages fast enough.
> 
> It seems like what's happening in your application is that the receiver 
> processes are not calling an mpi call often enough to allow mpich2 to 
> pull incoming messages off the queue or read messages from the socket. 
> So once the socket buffers or queues fill up, the sender will block in 
> mpi_send() until the receiver calls an mpi call and the messages are 
> received.  Note that even if the application doesn't post receives for 
> the messages, MPICH2 will still receive them and buffer them internally 
> as unexpected messages.  The solution is to call an mpi function, like a 
> send or receive function, or probe or iprobe from time to time to allow 
> mpich2 to make progress and drain the incoming messages.  Generally 
> people use iprobe (because it's non-blocking and doesn't send or receive 
> anything) to "poke" the mpi library and allow it to make progress.

I understand this behaviour; what's more, I expected something similar.
However, I thought the internal buffer was much bigger, or at least that it
could be enlarged. My application works with video frames (~1 MB). I thought
it would be possible to send a few messages (frames) into a queue and receive
them all at once (sometimes the sender is much faster than the receiver,
which is very busy and not able to receive all messages in time).

I assume that using non-blocking sends instead of MPI_Send() would resolve
this problem. Is that true?
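
What I have in mind is roughly the following sketch (NFRAMES, FRAME_SIZE and
the frame buffers are placeholders for my application's data):

#include <mpi.h>

#define NFRAMES    8          /* placeholder: frames sent ahead of the receiver */
#define FRAME_SIZE (1 << 20)  /* placeholder: ~1 MB per frame */

void send_frames(char frames[NFRAMES][FRAME_SIZE], int dest)
{
    MPI_Request req[NFRAMES];
    int i;

    /* start all sends without waiting for the receiver... */
    for (i = 0; i < NFRAMES; i++)
        MPI_Isend(frames[i], FRAME_SIZE, MPI_CHAR, dest, 0,
                  MPI_COMM_WORLD, &req[i]);

    /* ...and complete them later, once the receiver has caught up;
       the frame buffers must not be reused before this returns */
    MPI_Waitall(NFRAMES, req, MPI_STATUSES_IGNORE);
}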

Is the internal MPICH2 buffer fixed in size, or can it be enlarged?

> You can try to restructure your application so that either the receiver 
> is calling an mpi function from time to time, or try using non-blocking 
> sends then call wait on all of them at once, or even make all of your 
> sends and receives non-blocking then call wait on everything.  Another 
> option is to create a "progress" thread which makes mpi calls to allow 
> the library to make progress. E.g.:
> 
> The progress thread would do something like this:
> 
> void *prog_thread_func(void *arg) {
>    /* blocks until DONE_TAG arrives; while blocked, the receive
>       keeps the MPI progress engine running */
>    MPI_Recv(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF,
>             MPI_STATUS_IGNORE);
>    return NULL; /* thread exits */
> }
> 
> The main thread would start the progress thread at the beginning, then 
> send a message to it just before joining with the progress thread:
> 
> MPI_Send(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF);
> pthread_join(...);
> 
> Note that the nemesis channel does busy polling, meaning that it is 
> actively checking for incoming messages, even when it's in a blocking 
> mpi call.  This can have a performance impact if you have more threads 
> than processors since this progress thread will be stealing cycles from 
> the other threads doing real work.  The sock channel doesn't have this 
> issue.
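
Putting the two quoted fragments together, a complete progress-thread skeleton
might look roughly like this (a sketch only; it assumes the library is
initialized with MPI_Init_thread() requesting MPI_THREAD_MULTIPLE, since two
threads make MPI calls concurrently, and DONE_TAG is an arbitrary tag value):

#include <mpi.h>
#include <pthread.h>

#define DONE_TAG 99               /* assumed tag for the shutdown message */

void *prog_thread_func(void *arg)
{
    /* blocks until the main thread sends DONE_TAG; while blocked, the
       receive keeps the MPI progress engine running */
    MPI_Recv(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF,
             MPI_STATUS_IGNORE);
    return NULL;                  /* thread exits */
}

int main(int argc, char **argv)
{
    int provided;
    pthread_t prog_thread;

    /* both threads call MPI, so MPI_THREAD_MULTIPLE is required */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    pthread_create(&prog_thread, NULL, prog_thread_func, NULL);

    /* ... the application's real sends and receives go here ... */

    /* tell the progress thread to exit, then join it */
    MPI_Send(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF);
    pthread_join(prog_thread, NULL);

    MPI_Finalize();
    return 0;
}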

I was wondering why MPI_Waitany() with the nemesis channel took 100% of the
CPU... I thought it was a bug; now I see it's by design. That is why I
replaced MPI_Waitany() with MPI_Testany() plus a short usleep().
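
Roughly like this (a sketch; nreq and requests[] stand for my outstanding
MPI_Irecv() requests, and the sleep interval is arbitrary):

#include <mpi.h>
#include <unistd.h>

int wait_for_any(int nreq, MPI_Request requests[], MPI_Status *status)
{
    int index, flag = 0;

    while (!flag) {
        MPI_Testany(nreq, requests, &index, &flag, status);
        if (!flag)
            usleep(10);   /* a few microseconds; avoids spinning at 100% CPU */
    }
    return index;         /* which request completed */
}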

Your explanation helps me a lot. Thanks!


Jarek!

> 
> Hope this helps,
> -d
> 
> 
> On 07/07/2008 02:10 PM, Jarosław Bułat wrote:
> > On Mon, 2008-07-07 at 12:50 -0500, Rajeev Thakur wrote:
> >>>> What is happening is a flow control problem, and the above are ways
> >>>> to get around it.
> >>> Is it a problem of the MPICH library or of my use of this library?
> >> One can blame it on the implementation, but an application can help by
> >> not sending too many messages without receiving them.
> > 
> > I've resolved the problem partially. I've found an error in my code: the
> > MPI_Send and MPI_Testany+MPI_Irecv were placed in different threads, but
> > MPI was initialized by means of MPI_Init. Replacing MPI_Init with
> > MPI_Init_thread resolved the problem with ch3:nemesis crashing. When the
> > queue is "full", the system waits until the receiver processes at least
> > one message; however, the queue isn't long enough for my application.
> > 
> > Is there any chance to increase the memory for the queue (i.e. enlarge
> > the number of messages that can be stored in it)?
> > 
> > 
> > Regards,
> > Jarek!


