[MPICH] collective abort of all ranks

Kamaraju Kusumanchi kamaraju at gmail.com
Sun Jun 10 18:35:44 CDT 2007


> >
> > $mpif90 -show
> > f90 -I/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
> > -p/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
> > -L/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/lib
> > -lmpichf90 -lmpichf90 -lmpich -lpthread -lrt
> >
> > Here f90 is absoft 8.0 fortran 90 compiler. I am using mpich2 1.0.5p4,
> > compiled with absoft fortran compiler 8.0, gcc 4.3.0.
>
> If possible, you may want to use gcc 4.2 instead of the experimental gcc
> 4.3, just in case 4.3 is buggy...
>


Just an update.

The problem is still reproducible when mpich2 is compiled with gcc 4.2.0
and absoft 8.0, so gcc 4.3 does not appear to be the cause.

$mpif90 -show
f90 -I/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.2.0_absoft_8.0/include
-p/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.2.0_absoft_8.0/include
-L/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.2.0_absoft_8.0/lib
-lmpichf90 -lmpichf90 -lmpich -lpthread -lrt

$gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /home/raju/software/unZipped/gcc-4.2.0/configure
--prefix=/home/raju/software/compiledSoftware/gcc_4.2.0_20070514
Thread model: posix
gcc version 4.2.0

$mpif90 test.f90

$mpiexec -l -n 4 ./a.out
rank 3 in job 7  node1.jit.mae.cornell.edu_38952   caused collective
abort of all ranks
  exit status of rank 3: killed by signal 11
rank 1 in job 7  node1.jit.mae.cornell.edu_38952   caused collective
abort of all ranks
  exit status of rank 1: killed by signal 11
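
For reference, signal 11 on Linux is SIGSEGV, so ranks 1 and 3 were
both killed by a segmentation fault. A quick way to check the mapping
from a shell is sketched below; it assumes a POSIX-style shell whose
kill builtin accepts a signal number after -l, and the variable names
are only illustrative.

```shell
# Translate the signal number reported by mpiexec into its name.
# Assumes a POSIX-style shell on Linux; "sig" and "name" are illustrative names.
sig=11
name=$(kill -l "$sig")
echo "signal $sig is SIG$name"
```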

When I ran the program via gdb using

mpiexec -gdb -l -n 4 ./a.out

the code seems to segfault at line 19, where it calls func1. AFAIK there
is no memory violation in that part of the code.

I will try to recompile the mpich2 libs with --enable-g=meminit,dbg
and see if there is anything new to report.
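
Roughly, the rebuild I have in mind looks like the sketch below. The
source directory and install prefix are hypothetical placeholders (not
the actual paths on my machine), and passing CC and F90 on the
configure line is my assumption about how the compilers get selected.
The script only prints the command unless the source tree is present.

```shell
# Sketch of rebuilding mpich2 with debugging support enabled.
# SRC_DIR and PREFIX are hypothetical placeholders; adjust to the local setup.
SRC_DIR=/tmp/mpich2-1.0.5p4
PREFIX=/tmp/mpich2_1.0.5p4_dbg

# --enable-g=meminit,dbg is the option mentioned above.
# CC and F90 choose the C and Fortran 90 compilers (assumed here: gcc and absoft f90).
CONFIGURE_ARGS="--prefix=$PREFIX --enable-g=meminit,dbg CC=gcc F90=f90"

if [ -x "$SRC_DIR/configure" ]; then
    # Source tree found: configure, build, and install.
    (cd "$SRC_DIR" && ./configure $CONFIGURE_ARGS && make && make install)
else
    # No source tree here: just show what would be run.
    echo "would run: ./configure $CONFIGURE_ARGS"
fi
```

After that, test.f90 would have to be recompiled with the mpif90 from
the new prefix before rerunning under mpiexec -gdb.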

raju



