[mpich-discuss] messages queue

Jarosław Bułat kwant at agh.edu.pl
Mon Jul 7 06:52:32 CDT 2008


On Fri, 2008-07-04 at 11:18 -0500, Rajeev Thakur wrote:
> In case you replaced each send by isend + wait, I had meant post all the
> sends as isends and then wait for all of them in a single waitall. Not sure
> if it will help, but worth trying.

OK, I will try this approach; however, it involves a few modifications to the code.

> Another option is after every 20 or 30 sends, insert a barrier. If all
> processes are not participating, you could have the receiver send a small
> message and have the sender wait for it. 

Unfortunately, that is not an option for me. The message flow between my
processes is very complex and not uniform (many types of messages), so it
would be very difficult to implement barriers.

> What is happening is a flow control problem, and the above are ways to get
> around it.
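
The acknowledgement idea quoted above could look roughly like the sketch
below on the sender side. It is only an illustration under assumptions: the
names send_throttle, throttle_init, throttle_must_wait, throttle_on_send and
throttle_on_ack are hypothetical, and the receiver is assumed to send one
small acknowledgement message after it has processed a full window of
messages. The MPI calls themselves appear only in the comment.

```c
/* Hypothetical sender-side throttle: allow at most `window` messages to be
 * sent without an acknowledgement from the receiver. */
typedef struct {
    int window;      /* messages allowed in flight before waiting for an ack */
    int outstanding; /* messages sent since the last ack was received */
} send_throttle;

static void throttle_init(send_throttle *t, int window)
{
    t->window = window;
    t->outstanding = 0;
}

/* Returns 1 when the sender should block on an ack before sending again. */
static int throttle_must_wait(const send_throttle *t)
{
    return t->outstanding >= t->window;
}

/* Call after every successful send. */
static void throttle_on_send(send_throttle *t)
{
    t->outstanding++;
}

/* Call after the receiver's ack has arrived; a new window starts. */
static void throttle_on_ack(send_throttle *t)
{
    t->outstanding = 0;
}

/* Intended use around the real sends (not compiled here):
 *
 *   if (throttle_must_wait(&t)) {
 *       MPI_Recv(&ack, 1, MPI_INT, dest, ACK_TAG, comm, MPI_STATUS_IGNORE);
 *       throttle_on_ack(&t);
 *   }
 *   MPI_Send(buf, count, MPI_BYTE, dest, DATA_TAG, comm);
 *   throttle_on_send(&t);
 *
 * ACK_TAG, DATA_TAG, ack, buf, count, dest and comm are placeholders.
 */
```

With a window of 20 or 30, as suggested above, the sender never gets more
than that many messages ahead of the receiver, at the cost of one extra
small message per window.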

Is this a problem in the MPICH library, or in the way I use the library?

Below is an examination of the core dump:

MPICH2 Version:        1.0.7
MPICH2 Release date:    Unknown, built on Mon Jul  7 12:25:36 CEST 2008
MPICH2 Device:        ch3:nemesis
MPICH2 configure:     --enable-sharedlibs=gcc -prefix=/usr/ --enable-cxx --enable-g=dbg --enable-fast=none --with-device=ch3:nemesis
MPICH2 CC:     gcc  -g
MPICH2 CXX:     g++  -g
MPICH2 F77:     g77  -g
MPICH2 F90:        -g


gdb -c core.10450 debug_GUI 
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/lib/libQtGui.so.4...done.
Loaded symbols for /usr/lib/libQtGui.so.4
Reading symbols from /usr/lib/liblog4cpp.so.5...done.
Loaded symbols for /usr/lib/liblog4cpp.so.5
Reading symbols from /usr/lib/libmpich.so.1.1...done.
Loaded symbols for /usr/lib/libmpich.so.1.1
Reading symbols from /usr/lib/libssl.so.0.9.8...done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libQtCore.so.4...done.
Loaded symbols for /usr/lib/libQtCore.so.4
Reading symbols from /usr/lib/libaudio.so.2...done.
Loaded symbols for /usr/lib/libaudio.so.2
Reading symbols from /usr/lib/libpng12.so.0...done.
Loaded symbols for /usr/lib/libpng12.so.0
Reading symbols from /usr/lib/libSM.so.6...done.
Loaded symbols for /usr/lib/libSM.so.6
Reading symbols from /usr/lib/libICE.so.6...done.
Loaded symbols for /usr/lib/libICE.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libglib-2.0.so.0...done.
Loaded symbols for /usr/lib/libglib-2.0.so.0
Reading symbols from /usr/lib/libXi.so.6...done.
Loaded symbols for /usr/lib/libXi.so.6
Reading symbols from /usr/lib/libXrender.so.1...done.
Loaded symbols for /usr/lib/libXrender.so.1
Reading symbols from /usr/lib/libXrandr.so.2...done.
Loaded symbols for /usr/lib/libXrandr.so.2
Reading symbols from /usr/lib/libfreetype.so.6...done.
Loaded symbols for /usr/lib/libfreetype.so.6
Reading symbols from /usr/lib/libfontconfig.so.1...done.
Loaded symbols for /usr/lib/libfontconfig.so.1
Reading symbols from /usr/lib/libXext.so.6...done.
Loaded symbols for /usr/lib/libXext.so.6
Reading symbols from /usr/lib/libX11.so.6...done.
Loaded symbols for /usr/lib/libX11.so.6
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /usr/lib/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/libgthread-2.0.so.0...done.
Loaded symbols for /usr/lib/libgthread-2.0.so.0
Reading symbols from /usr/lib/libXt.so.6...done.
Loaded symbols for /usr/lib/libXt.so.6
Reading symbols from /usr/lib/libpcre.so.3...done.
Loaded symbols for /usr/lib/libpcre.so.3
Reading symbols from /lib/libselinux.so.1...done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /usr/lib/libexpat.so.1...done.
Loaded symbols for /usr/lib/libexpat.so.1
Reading symbols from /usr/lib/libXau.so.6...done.
Loaded symbols for /usr/lib/libXau.so.6
Reading symbols from /usr/lib/libxcb-xlib.so.0...done.
Loaded symbols for /usr/lib/libxcb-xlib.so.0
Reading symbols from /usr/lib/libxcb.so.1...done.
Loaded symbols for /usr/lib/libxcb.so.1
Reading symbols from /usr/lib/libXdmcp.so.6...done.
Loaded symbols for /usr/lib/libXdmcp.so.6
Reading symbols from /usr/lib/gconv/UTF-16.so...done.
Loaded symbols for /usr/lib/gconv/UTF-16.so
Reading symbols from /usr/lib/libXfixes.so.3...done.
Loaded symbols for /usr/lib/libXfixes.so.3
Reading symbols from /usr/lib/libXcursor.so.1...done.
Loaded symbols for /usr/lib/libXcursor.so.1
Reading symbols from /usr/lib/libXinerama.so.1...done.
Loaded symbols for /usr/lib/libXinerama.so.1
Reading symbols from /usr/lib/qt4/plugins/imageformats/libqsvg.so...done.
Loaded symbols for /usr/lib/qt4/plugins/imageformats/libqsvg.so
Reading symbols from /usr/lib/libQtSvg.so.4...done.
Loaded symbols for /usr/lib/libQtSvg.so.4
Reading symbols from /usr/lib/qt4/plugins/imageformats/libqgif.so...done.
Loaded symbols for /usr/lib/qt4/plugins/imageformats/libqgif.so
Reading symbols from /usr/lib/qt4/plugins/imageformats/libqico.so...done.
Loaded symbols for /usr/lib/qt4/plugins/imageformats/libqico.so
Reading symbols from /usr/lib/qt4/plugins/imageformats/libqjpeg.so...done.
Loaded symbols for /usr/lib/qt4/plugins/imageformats/libqjpeg.so
Reading symbols from /usr/lib/libjpeg.so.62...done.
Loaded symbols for /usr/lib/libjpeg.so.62
Reading symbols from /usr/lib/qt4/plugins/imageformats/libqmng.so...done.
Loaded symbols for /usr/lib/qt4/plugins/imageformats/libqmng.so
Reading symbols from /usr/lib/libmng.so.1...done.
Loaded symbols for /usr/lib/libmng.so.1
Reading symbols from /usr/lib/liblcms.so.1...done.
Loaded symbols for /usr/lib/liblcms.so.1
Reading symbols from /usr/lib/qt4/plugins/imageformats/libqtiff.so...done.
Loaded symbols for /usr/lib/qt4/plugins/imageformats/libqtiff.so
Reading symbols from /usr/lib/libtiff.so.4...done.
Loaded symbols for /usr/lib/libtiff.so.4
Reading symbols from /usr/lib/qt4/plugins/iconengines/libqsvgicon.so...done.
Loaded symbols for /usr/lib/qt4/plugins/iconengines/libqsvgicon.so
Reading symbols from /lib/libnss_compat.so.2...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `./debug_GUI'.
Program terminated with signal 11, Segmentation fault.
[New process 10450]
[New process 10459]
#0  0x00007f6206644351 in MPID_nem_mpich2_test_recv (cell=0x7fff0f8fa1c0, in_fbox=0x7fff0f8fa1ec)
    at ../include/mpid_nem_inline.h:923
923        poll_fboxes (cell, goto fbox_l);
(gdb) 


Any ideas?
I found a way to debug the processes before they crash. The crash is very
repeatable, so I am able to trace the moment it happens with gdb.


Regards,
Jarek! 


> Rajeev
> 
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov 
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jaroslaw Bulat
> > Sent: Friday, July 04, 2008 10:23 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: RE: [mpich-discuss] messages queue
> > 
> > Changing MPI_Send to MPI_Isend + MPI_Wait does not change
> > anything. The system is too complex to send as a small test
> > program; however, I'm working on it and trying to extract a
> > small fragment of it.
> > 
> > I've tested a few configurations with the following results:
> > 
> > mpich2-1.0.7:
> > 1) ch3:nemesis
> > 2) ch3:sock
> > 3) ch3:shm
> > 
> > The first configuration crashes after 67 unprocessed messages
> > (every time), the second freezes after 2000-4000 unprocessed
> > messages, and the third after 18 unprocessed messages (every
> > time).
> > "Freeze" means I was not able to send another message;
> > however, I was able to receive and process queued messages,
> > thus shortening the queue, and then send another message,
> > which is the expected behaviour.
> > 
> > Each of the messages described above is 28 bytes long;
> > however, before this test (second phase), several other,
> > longer messages (up to 1.2 MBytes) were successfully sent,
> > received and processed (first phase).
> > 
> > configuration of MPI:
> > mpich2version
> > MPICH2 Version:        1.0.7
> > MPICH2 Release date:    Unknown, built on Fri Jul  4 15:06:00 CEST 2008
> > MPICH2 Device:        ch3:shm
> > MPICH2 configure:     --enable-sharedlibs=gcc -prefix=/usr/ --enable-cxx --with-device=ch3:shm
> > MPICH2 CC:     gcc  -O2
> > MPICH2 CXX:     g++  -O2
> > MPICH2 F77:     g77  -O2
> > 
> > Everything runs on Ubuntu 8.04 with a Core Duo processor
> > (2 cores).
> > Both MPICH and the program were compiled with gcc 4.2.3
> > (Ubuntu 4.2.3-2-ubuntu7).
> > 
> > Any ideas? How can I test it more precisely?
> > 
> > 
> > Regards,
> > Jarek !
> > 
> > 
> > On Thu, 2008-07-03 at 14:51 -0500, Rajeev Thakur wrote:
> > > A queue of 100 messages of 100 bytes is not too big. What happens if
> > > you replace MPI_Send with MPI_Isend? Can you send us a small test program?
> > > 
> > > Rajeev
> > >  
> > > 
> > > > -----Original Message-----
> > > > From: owner-mpich-discuss at mcs.anl.gov 
> > > > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jaroslaw Bulat
> > > > Sent: Thursday, July 03, 2008 7:57 AM
> > > > To: mpich2
> > > > Subject: [mpich-discuss] messages queue
> > > > 
> > > > Hi All!
> > > > 
> > > > I found the following problem using MPICH2 (1.0.6 and 1.0.7 with
> > > > the sock and nemesis channels). There are 5 distinct processes which
> > > > exchange messages by means of MPI_Send() and MPI_Irecv() +
> > > > MPI_Waitany() or MPI_Testany(). Since MPI_Send() doesn't wait
> > > > until the receiver receives and processes the message, a queue of
> > > > messages waiting to be processed can build up at the receiver. In
> > > > such a situation my sender process crashes while calling the
> > > > MPI_Send() function. The queue of unprocessed messages is ~100
> > > > messages long, 100 bytes each.
> > > > I cannot use MPI_Ssend(), which resolves this problem, because it
> > > > makes my system less responsive.
> > > > 
> > > > How can I control the length of the queue?
> > > > Is it possible to allocate more memory for the internal MPI buffer?
> > > > 
> > > > 
> > > > Regards,
> > > > Jarek!
> > > > 
> > > > 
> > > > 
> > > > 
> > 
> > 



