[MPICH2-dev] MPICH2 bugs in configure and maybe MPI_Cancel
Philippe Combes
Philippe.Combes at ens-lyon.fr
Fri Nov 17 15:36:44 CST 2006
Hi,
Because of severe errors occurring inside the library, I had to configure
MPICH2 1.0.4 with the following options, in order to try to find out
where the errors were coming from.
./configure --with-device=ch3:ssm --without-mpe --disable-f77 \
    --disable-f90 --enable-cxx --enable-fast --enable-sharedlibs=gcc \
    --enable-strict --enable-error-checking=all --enable-error-messages=all \
    --enable-g=all --enable-debuginfo -prefix=/usr/local/mpich_dbg
NB: Before that, I changed every "-g -O2" I found in the configure scripts
to "-g" only, because it is impossible to step through the code in a
debugger with -O2.
I run it on Debian GNU/Linux unstable (x86), with gcc/g++ 4.1 and kernel
2.6.17.
First, after configure has run, I still have a "#undef
HAVE_ERROR_CHECKING" in mpichconf.h, and MPICH_ERROR_MSG_LEVEL is set
to MPICH_ERROR_MSG_NONE. Strange, isn't it?
Then, shared libraries are not generated at all, although this is a simple
hardware/software combination that was supported by MPICH 1.2.7.
Now let's talk about the initial errors.
Running my 3-node program under gdb leads to three SIGSEGVs. On two
processes, I get:
Program received signal SIGSEGV, Segmentation fault.
0x08119466 in MPIDI_CH3I_Progress (is_blocking=0, state=0x0)
at ch3_progress.c:121
121         if (spin_count >= MPIDI_Process.my_pg->ch.nShmWaitSpinCount)
Current language: auto; currently c
(because MPIDI_Process.my_pg == 0)
On the third process, I get:
Program received signal SIGSEGV, Segmentation fault.
0x080f8bbf in PMPI_Cancel (request=0x8203354) at cancel.c:92
92 switch (request_ptr->kind)
Current language: auto; currently c
(because request_ptr == 0, because *request is "invalid handle" --
0x2c000000)
The request I try to cancel is a classical send request.
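
For readers who want to decode the handle value quoted above, here is a minimal sketch in C. It assumes the handle layout used inside MPICH2 (top two bits give the handle kind, the next four bits give the object kind, with 0xb meaning "request"). The masks, shifts, and helper names below are illustrative assumptions written from memory, not copied from the 1.0.4 sources, and should be checked against mpiimpl.h. Under that assumption, 0x2c000000 decodes to a request handle of kind "invalid", which appears to be the value mpi.h gives to MPI_REQUEST_NULL. That would suggest the request had already completed or been freed before MPI_Cancel was called, but this is a hypothesis to verify, not a diagnosis.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: bit layout recalled from MPICH2's internal headers;
 * verify against the 1.0.4 sources before relying on it. */
#define HANDLE_KIND_MASK   0xc0000000u  /* top two bits: handle kind     */
#define HANDLE_KIND_SHIFT  30
#define HANDLE_OBJ_MASK    0x3c000000u  /* next four bits: object kind   */
#define HANDLE_OBJ_SHIFT   26

/* Hypothetical helper names; these are not MPICH functions. */
static unsigned handle_kind(uint32_t h)
{
    return (h & HANDLE_KIND_MASK) >> HANDLE_KIND_SHIFT;
}

static unsigned handle_object(uint32_t h)
{
    return (h & HANDLE_OBJ_MASK) >> HANDLE_OBJ_SHIFT;
}

/* Map the two-bit handle kind to a readable name (assumed ordering). */
static const char *handle_kind_name(unsigned kind)
{
    static const char *names[] = { "invalid", "builtin", "direct", "indirect" };
    return names[kind & 3u];
}

/* Write a one-line description of handle h into buf (at most len bytes). */
static int describe_handle(uint32_t h, char *buf, size_t len)
{
    return snprintf(buf, len, "handle 0x%08x: kind=%s object=0x%x",
                    (unsigned)h, handle_kind_name(handle_kind(h)),
                    handle_object(h));
}
```

Calling describe_handle(0x2c000000u, buf, sizeof buf) fills buf with a summary that can be compared with the request value gdb prints just before the crash. If the handle really is MPI_REQUEST_NULL, the place to look is whatever call (for example a wait or test on the same request) completed it before the cancel.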
If I edit mpichconf.h by hand so that error checking and error messages
are _really_ enabled, then I get on all processes:
Error encountered before initializing MPICH
Program exited with code 01.
This leaves me with no clue as to why my send request is invalid...
Furthermore, the message says "before initializing" although these errors
occur AT THE END of the execution, and all communications were OK before
that.
Could you please help me?
Thanks in advance,
Philippe