[mpich-discuss] messages queue
Darius Buntinas
buntinas at mcs.anl.gov
Thu Jul 10 14:16:10 CDT 2008
The limits you are seeing (67 msgs, 2K-4K msgs and 18 msgs) are due to
the buffer space available for unreceived sends. What happens is that
in a blocking send, MPICH2 tries to send the message either over a
shared-memory queue (for nemesis and shm) or over a TCP socket (for
sock). If it can't send the message completely, it waits until it can.
A message can't be sent if the shared-memory queue is full (for nemesis
or shm) or if the TCP buffers are full (for sock), and the queues or
buffers fill up when the receiver is not receiving messages fast
enough.
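As an illustration of that behavior (the ranks, the message count and
the sleep() are made up for this example), a sender that keeps posting
blocking sends while the receiver makes no MPI calls will eventually
block once those buffers fill:

/* Illustrative sketch only: run with 2 processes. */
#include <mpi.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, i, buf = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Each send returns once the data is buffered or delivered.
         * When the shared-memory queue or tcp buffers fill up, the
         * send blocks until rank 1 finally enters an MPI call. */
        for (i = 0; i < 100000; i++)
            MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        sleep(30);   /* a long stretch with no MPI calls */
        for (i = 0; i < 100000; i++)
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}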
It seems that in your application the receiver processes are not
making MPI calls often enough to allow MPICH2 to pull incoming messages
off the queue or read messages from the socket. So once the socket
buffers or queues fill up, the sender will block in MPI_Send() until
the receiver makes an MPI call and the messages are received. Note
that even if the application doesn't post receives for the messages,
MPICH2 will still receive them and buffer them internally as unexpected
messages. The solution is to call an MPI function, like a send or
receive function, or probe or iprobe, from time to time to allow MPICH2
to make progress and drain the incoming messages. Generally people use
iprobe (because it's non-blocking and doesn't send or receive anything)
to "poke" the MPI library and allow it to make progress.
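For example, the receiver's compute loop could be peppered with
something like this (a sketch only; more_work_to_do and
do_some_computation() are placeholders for your application's logic):

int flag;
while (more_work_to_do) {           /* placeholder loop condition */
    do_some_computation();          /* placeholder for real work */

    /* Non-blocking probe: the result is ignored; the point is simply
     * to enter the MPI library so it can make progress and drain
     * incoming messages into its internal queues. */
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
               &flag, MPI_STATUS_IGNORE);
}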
You can try to restructure your application so that the receiver calls
an MPI function from time to time, or use non-blocking sends and then
call wait on all of them at once (that approach is sketched further
below), or even make all of your sends and receives non-blocking and
then call wait on everything. Another option is to create a "progress"
thread which makes MPI calls to allow the library to make progress.
The progress thread would do something like this:
void *prog_thread_func(void *arg)
{
    /* Block in MPI_Recv until the main thread sends DONE_TAG; while
     * blocked here, MPICH2 can make progress on incoming messages. */
    MPI_Recv(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF,
             MPI_STATUS_IGNORE);
    return NULL;   /* thread exits */
}
The main thread would start the progress thread at the beginning, then
send a message to it just before joining with the progress thread:
MPI_Send(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF);
pthread_join(...);
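Putting the pieces together, the main thread's side might look roughly
like this (a sketch only: the DONE_TAG value and the omitted error
handling are assumptions; MPI_THREAD_MULTIPLE is requested because two
threads make MPI calls concurrently):

#include <mpi.h>
#include <pthread.h>

#define DONE_TAG 1                  /* arbitrary tag chosen for this sketch */

void *prog_thread_func(void *);     /* as defined above */

int main(int argc, char **argv)
{
    pthread_t prog_thread;
    int provided;

    /* Two threads make MPI calls concurrently, so request
     * MPI_THREAD_MULTIPLE. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    pthread_create(&prog_thread, NULL, prog_thread_func, NULL);

    /* ... the application's own sends, receives and computation ... */

    /* Tell the progress thread to exit, then join it. */
    MPI_Send(NULL, 0, MPI_INT, 0, DONE_TAG, MPI_COMM_SELF);
    pthread_join(prog_thread, NULL);

    MPI_Finalize();
    return 0;
}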
Note that the nemesis channel does busy polling, meaning that it is
actively checking for incoming messages, even when it's in a blocking
mpi call. This can have a performance impact if you have more threads
than processors since this progress thread will be stealing cycles from
the other threads doing real work. The sock channel doesn't have this
issue.
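And for the non-blocking-send suggestion above, a rough sketch (the
message count, buffers and destination rank are invented for
illustration):

#define NMSGS 64                    /* arbitrary number of messages */

int bufs[NMSGS];
MPI_Request reqs[NMSGS];
int i, dest = 1;                    /* hypothetical destination rank */

for (i = 0; i < NMSGS; i++) {
    bufs[i] = i;
    /* MPI_Isend returns immediately; the send completes later. */
    MPI_Isend(&bufs[i], 1, MPI_INT, dest, 0, MPI_COMM_WORLD, &reqs[i]);
}

/* Waiting on all the requests at once lets MPI make progress on every
 * outstanding send instead of blocking inside one MPI_Send at a time. */
MPI_Waitall(NMSGS, reqs, MPI_STATUSES_IGNORE);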
Hope this helps,
-d
On 07/07/2008 02:10 PM, Jarosław Bułat wrote:
> On Mon, 2008-07-07 at 12:50 -0500, Rajeev Thakur wrote:
>>>> What is happening is a flow control problem, and the above are
>>>> ways to get around it.
>>> Is it a problem of the MPICH library or of my use of the library?
>> One can blame it on the implementation, but an application can help by not
>> sending too many messages without receiving them.
>
> I've resolved the problem partially. I found an error in my code: the
> MPI_Send and MPI_Testany+MPI_Irecv were placed in different threads and
> MPI was initialized by means of MPI_Init. Replacing MPI_Init with
> MPI_Init_thread resolved the problem of ch3:nemesis crashing. When the
> queue is "full" the system waits until the receiver processes at least
> one message; however, the queue isn't long enough for my application.
>
> Is there any chance to increase the memory for the queue (enlarge the
> number of messages that could be stored in the queue)?
>
>
> Regards,
> Jarek!