[mpich-discuss] messages queue
Darius Buntinas
buntinas at mcs.anl.gov
Thu Jul 10 14:16:10 CDT 2008
The limits you are seeing (67 msgs, 2K-4K msgs and 18 msgs) are due to
the buffer space available for unreceived sends. What happens is that
in a blocking send, MPICH2 tries to send the message either over a
shared-memory queue (for nemesis and shm) or over a TCP socket (for
sock). If it can't send the message completely, it waits until it can.
A message can't be sent if the shared-memory queue is full (for nemesis
or shm) or if the TCP buffers are full (for sock), and the queues or
buffers fill up when the receiver is not receiving messages fast
enough.
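As an illustration of that behavior (the ranks, the message count and
the sleep() are made up for this example), a sender that keeps posting
blocking sends while the receiver makes no MPI calls will eventually
block once those buffers fill:

/* Illustrative sketch only: run with 2 processes. */
#include <mpi.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, i, buf = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Each send returns once the data is buffered or delivered.
         * When the shared-memory queue or tcp buffers fill up, the
         * send blocks until rank 1 finally enters an MPI call. */
        for (i = 0; i < 100000; i++)
            MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        sleep(30);   /* a long stretch with no MPI calls */
        for (i = 0; i < 100000; i++)
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}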
It seems that in your application the receiver processes are not
making MPI calls often enough to allow MPICH2 to pull incoming messages
off the queue or read messages from the socket. So once the socket
buffers or queues fill up, the sender will block in MPI_Send() until
the receiver makes an MPI call and the messages are received. Note
that even if the application doesn't post receives for the messages,
MPICH2 will still receive them and buffer them internally as unexpected
messages. The solution is to call an MPI function, like a send or
receive function, or probe or iprobe, from time to time to allow MPICH2
to make progress and drain the incoming messages. Generally people use
iprobe (because it's non-blocking and doesn't send or receive anything)
to "poke" the MPI library and allow it to make progress.
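For example, the receiver's compute loop could be peppered with
something like this (a sketch only; more_work_to_do and
do_some_computation() are placeholders for your application's logic):

int flag;
while (more_work_to_do) {           /* placeholder loop condition */
    do_some_computation();          /* placeholder for real work */

    /* Non-blocking probe: the result is ignored; the point is simply
     * to enter the MPI library so it can make progress and drain
     * incoming messages into its internal queues. */
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
               &flag, MPI_STATUS_IGNORE);
}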
You can try to restructure your application so that the receiver calls
an MPI function from time to time, or use non-blocking sends and then
call wait on all of them at once (that approach is sketched further
below), or even make all of your sends and receives non-blocking and
then call wait on everything. Another option is to create a "progress"
thread which makes MPI calls to allow the library to make progress.
The progress thread would do something like this:
void *prog_thread_func(void *arg)
{
    /* Block in MPI_Recv until the main thread sends DONE_TAG; while
     * blocked here, MPICH2 can make progress on incoming messages. */
    MPI_Recv(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF,
             MPI_STATUS_IGNORE);
    return NULL;   /* thread exits */
}
The main thread would start the progress thread at the beginning, then
send a message to it just before joining with the progress thread:
MPI_Send(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF);
pthread_join(...);
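Putting the pieces together, the main thread's side might look roughly
like this (a sketch only: the DONE_TAG value and the omitted error
handling are assumptions; MPI_THREAD_MULTIPLE is requested because two
threads make MPI calls concurrently):

#include <mpi.h>
#include <pthread.h>

#define DONE_TAG 1                  /* arbitrary tag chosen for this sketch */

void *prog_thread_func(void *);     /* as defined above */

int main(int argc, char **argv)
{
    pthread_t prog_thread;
    int provided;

    /* Two threads make MPI calls concurrently, so request
     * MPI_THREAD_MULTIPLE. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    pthread_create(&prog_thread, NULL, prog_thread_func, NULL);

    /* ... the application's own sends, receives and computation ... */

    /* Tell the progress thread to exit, then join it. */
    MPI_Send(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF);
    pthread_join(prog_thread, NULL);

    MPI_Finalize();
    return 0;
}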
Note that the nemesis channel does busy polling, meaning that it is
actively checking for incoming messages, even when it's in a blocking
mpi call. This can have a performance impact if you have more threads
than processors since this progress thread will be stealing cycles from
the other threads doing real work. The sock channel doesn't have this
issue.
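And for the non-blocking-send suggestion above, a rough sketch (the
message count, buffers and destination rank are invented for
illustration):

#define NMSGS 64                    /* arbitrary number of messages */

int bufs[NMSGS];
MPI_Request reqs[NMSGS];
int i, dest = 1;                    /* hypothetical destination rank */

for (i = 0; i < NMSGS; i++) {
    bufs[i] = i;
    /* MPI_Isend returns immediately; the send completes later. */
    MPI_Isend(&bufs[i], 1, MPI_INT, dest, 0, MPI_COMM_WORLD, &reqs[i]);
}

/* Waiting on all the requests at once lets MPI make progress on every
 * outstanding send instead of blocking inside one MPI_Send at a time. */
MPI_Waitall(NMSGS, reqs, MPI_STATUSES_IGNORE);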
Hope this helps,
-d
On 07/07/2008 02:10 PM, Jarosław Bułat wrote:
> On Mon, 2008-07-07 at 12:50 -0500, Rajeev Thakur wrote:
>>>> What is happening is a flow control problem, and the above are
>>>> ways to get around it.
>>> Is it a problem of the MPICH library or of my use of the library?
>> One can blame it on the implementation, but an application can help by not
>> sending too many messages without receiving them.
>
> I've resolved the problem partially. I found an error in my code: the
> MPI_Send and MPI_Testany+MPI_Irecv were placed in different threads and
> MPI was initialized by means of MPI_Init. Replacing MPI_Init with
> MPI_Init_thread resolved the problem of ch3:nemesis crashing. When the
> queue is "full" the system waits until the receiver processes at least
> one message; however, the queue isn't long enough for my application.
>
> Is there any chance to increase the memory for the queue (enlarge the
> number of messages that could be stored in the queue)?
>
>
> Regards,
> Jarek!