[mpich-discuss] messages queue

Darius Buntinas buntinas at mcs.anl.gov
Fri Jul 11 11:01:39 CDT 2008




> 
> I understand this behaviour; in fact, I expected something similar.
> However, I thought the internal buffer was much bigger, or at least
> that it could be enlarged. My application works with video frames
> (~1MB). I thought it would be possible to send a few messages
> (frames) into a queue and receive them all at once (sometimes the
> sender is much faster than the receiver, which is very busy and
> cannot receive all messages in time).
> 
> I assume that using non-blocking sends instead of MPI_Send() would
> resolve this problem. Is that true?
> 
> Is the internal MPICH2 buffer fixed in size, or can it be enlarged?

When you're using the sock channel, you need to increase the TCP 
buffers.  To do that, run the following as root:
echo 262142 > /proc/sys/net/core/rmem_max
echo 262142 > /proc/sys/net/core/rmem_default
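
(Equivalently, on systems that have the sysctl utility, you can run, 
as root:

sysctl -w net.core.rmem_max=262142
sysctl -w net.core.rmem_default=262142

though these settings won't persist across reboots unless you also add 
them to /etc/sysctl.conf.)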

In nemesis, increase MPID_NEM_NUM_CELLS in
src/mpid/ch3/channels/nemesis/nemesis/include/mpid_nem_datatypes.h.
Note, however, that cells are 64KB each, so be careful how much you 
increase it by.
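The define looks something like this (the value shown here is only 
illustrative; check your source tree for the actual default):

#define MPID_NEM_NUM_CELLS 64   /* illustrative value; see your tree */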

As I mentioned, I think a better solution would be to use MPI_Isend() 
if you can.  With Isend, you won't be using the TCP buffers or 
shared-memory queues, which are a limited resource.  Instead, the send 
queue will be stored in user memory, which is not limited (well, until 
you run out of memory :-) ).  Of course, with Isend you'll have to 
make sure you call MPI_Test or MPI_Wait to free the requests, so you 
know when the send buffer can be reused.
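
Roughly, the pattern looks like this (a minimal sketch; the frame 
count, frame size, tag, and function names are all illustrative):

#include <mpi.h>

#define NFRAMES  16          /* illustrative: frames queued at once */
#define FRAME_SZ (1 << 20)   /* ~1MB per frame, as in your app */

/* Post nonblocking sends for several frames, then wait for all of
 * them; only after the wait may the frame buffers be reused. */
void send_frames(char *frames[], int dest, MPI_Comm comm)
{
    MPI_Request reqs[NFRAMES];
    int i;

    for (i = 0; i < NFRAMES; i++)
        MPI_Isend(frames[i], FRAME_SZ, MPI_CHAR, dest, 0, comm,
                  &reqs[i]);

    /* ... do other work while the sends progress ... */

    /* Frees the requests; after this the buffers can be reused. */
    MPI_Waitall(NFRAMES, reqs, MPI_STATUSES_IGNORE);
}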

> 
> I was wondering why MPI_Waitany() with the nemesis channel took 100%
> CPU... I thought it was an error; now I see it's by design. That is
> why I replaced MPI_Waitany() with MPI_Testany(); usleep(a few us).
> 
> Your explanation helps me a lot. Thanks!
> 
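
For reference, the MPI_Testany()-plus-usleep() loop you describe would 
look something like this (a minimal sketch; the sleep interval is just 
an example):

#include <mpi.h>
#include <unistd.h>

/* Poll a set of outstanding requests, sleeping briefly between polls
 * so the process doesn't spin at 100% CPU.  Returns the index of the
 * completed request. */
int wait_for_any(int count, MPI_Request reqs[], MPI_Status *status)
{
    int idx, flag = 0;

    while (!flag) {
        MPI_Testany(count, reqs, &idx, &flag, status);
        if (!flag)
            usleep(5);   /* "a few us", as in your workaround */
    }
    return idx;
}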

Nemesis does call sched_yield() periodically to release the processor 
when it's idle, but newer Linux kernels use a new scheduler which 
effectively breaks sched_yield (the process yields the processor, but 
the scheduler immediately reschedules it until it uses its entire 
timeslice... so what's the point of yielding?).  If you have root 
access, you can make the scheduler behave a little better with:

echo 1 > /proc/sys/kernel/sched_compat_yield

If sched_yield is working, top may still show 100% utilization when in 
MPI_Wait, but only if there are no other processes/threads waiting for 
that processor.  If there is something else running on that processor, 
you should see the usage go down to almost 0.  When you have more 
processes/threads than processors, playing with processor binding can 
also help when the scheduler can't guess the correct assignments.
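
For example, with the Linux taskset utility (the core numbers and 
program names here are illustrative):

taskset -c 0 ./receiver    # pin the busy receiver to core 0
taskset -c 1 ./sender      # pin the sender to core 1

You can also rebind an already-running process with 
taskset -cp <core> <pid>.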


-d



