[MPICH2-dev] MPICH2 bugs in configure and maybe MPI_Cancel

Philippe Combes Philippe.Combes at ens-lyon.fr
Fri Nov 17 15:36:44 CST 2006


Hi,

Because of severe errors occurring inside the library, I had to configure 
MPICH2 1.0.4 with the following options, in order to try and find out 
where such errors were coming from.

./configure --with-device=ch3:ssm --without-mpe --disable-f77 
--disable-f90 --enable-cxx --enable-fast --enable-sharedlibs=gcc 
--enable-strict --enable-error-checking=all --enable-error-messages=all 
--enable-g=all --enable-debuginfo -prefix=/usr/local/mpich_dbg

NB: Before that, I changed all -g -O2 I found in all configure scripts 
to -g only, because it is impossible to debug step by step with -O2.

I am running it on Debian GNU/Linux unstable (x86), with gcc/g++ 4.1 and 
kernel 2.6.17.

First, after configure has run, I still have a "#undef 
HAVE_ERROR_CHECKING" in mpichconf.h, and MPICH_ERROR_MSG_LEVEL is set 
to MPICH_ERROR_MSG_NONE. Strange, isn't it?

Second, the shared libraries are not generated at all, although this is a 
simple hardware/software combination that was supported by MPICH 1.2.7.



Now let's talk about the initial errors.
Running my 3-node program under gdb leads to three SIGSEGVs. On two 
processes, I get:

   Program received signal SIGSEGV, Segmentation fault.
   0x08119466 in MPIDI_CH3I_Progress (is_blocking=0, state=0x0)
       at ch3_progress.c:121
   121             if (spin_count >= MPIDI_Process.my_pg->ch.nShmWaitSpinCount)
   Current language:  auto; currently c

(because MPIDI_Process.my_pg == 0)

On the third process, I get:

   Program received signal SIGSEGV, Segmentation fault.
   0x080f8bbf in PMPI_Cancel (request=0x8203354) at cancel.c:92
   92          switch (request_ptr->kind)
   Current language:  auto; currently c

(because request_ptr == 0, because *request is "invalid handle" -- 
0x2c000000)

The request I am trying to cancel is an ordinary send request.



If I edit mpichconf.h by hand so that error checking and error messages 
are _really_ enabled, then I get on all processes:

   Error encountered before initializing MPICH

   Program exited with code 01.

Poor me, I now have no clue at all about why my send request is invalid... 
Furthermore, it says "before initializing" although these errors occur AT 
THE END of the execution and all communications were OK before.
Could you please help me?

Thanks in advance,

Philippe





