[MPICH] collective abort of all ranks

Anthony Chan chan at mcs.anl.gov
Sat Jun 9 23:15:28 CDT 2007



On Sat, 9 Jun 2007, Kamaraju Kusumanchi wrote:

> Hi all,
>
> I would appreciate it if someone could help me with the following
> code. I am having trouble understanding why the parallel version
> gives errors while the serial version works fine. Also, what does
> "collective abort of all ranks" mean?

It means all the processes in your MPI job were aborted together: one
rank exited abnormally (here, killed by signal 11), and the process
manager then killed the remaining ranks.

>
> Consider the test.f90 attached to this email.
>
> $mpif90 test.f90
>
> compiles fine. However, when I run it, I get the following error.
>
> $mpiexec -l -n 4 ./a.out
> rank 3 in job 282  node1.jit.mae.cornell.edu_33436   caused collective
> abort of all ranks
>  exit status of rank 3: killed by signal 11
> rank 1 in job 282  node1.jit.mae.cornell.edu_33436   caused collective
> abort of all ranks
>  exit status of rank 1: killed by signal 11

Signal 11 is a segmentation fault.  The most likely reason is that your
program is accessing invalid memory.  Recompile your code with -g
(and, if possible, rebuild your mpich2-1.0.5p4 with --enable-g=meminit,dbg),
then rerun your code under gdb, ddd, or valgrind.
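
For example, something along these lines (adjust the paths and flags to
your installation; the second step assumes valgrind is installed on the
nodes where the job runs):

$ mpif90 -g test.f90
$ mpiexec -l -n 4 valgrind ./a.out

valgrind reports the source line of the first invalid read or write in
each rank, which usually points at the bad argument.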

> Do you think this is a memory issue? The arrays involved are pretty
> small, though: 600x600 double precision real arrays. Moreover, the
> code does not give any errors if I comment out all the MPI statements
> and just build and run it with f90 alone.

It is possible that some of the arguments you pass to the MPI calls are
causing them to access invalid memory...
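
For instance, two Fortran-specific mistakes that produce exactly this
kind of segfault are passing a scalar integer where MPI expects a status
array of size MPI_STATUS_SIZE, and passing a count larger than the
buffer that was actually declared. The sketch below is hypothetical (I
don't have your test.f90 in front of me), but it shows the pattern to
check for:

program recv_example
  ! A hypothetical sketch (not your test.f90): it shows two details that
  ! often cause signal 11 in Fortran MPI code -- the status argument must
  ! be an integer array of size MPI_STATUS_SIZE, and the count passed to
  ! MPI_Send/MPI_Recv must not exceed the declared size of the buffer.
  implicit none
  include 'mpif.h'
  integer, parameter :: n = 600
  double precision :: a(n,n)
  integer :: status(MPI_STATUS_SIZE)   ! a scalar integer here corrupts memory
  integer :: ierr, rank, nprocs

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  if (rank == 0) then
     a = 1.0d0
     call MPI_Send(a, n*n, MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
     ! the count n*n matches exactly the number of elements declared in a
     call MPI_Recv(a, n*n, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, &
                   status, ierr)
  end if

  call MPI_Finalize(ierr)
end program recv_example

(Run it with at least two processes, e.g. mpiexec -n 4 ./a.out.) Compare
the count and datatype in each of your MPI calls against the actual
declaration of the buffer you pass in.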

>
> $mpif90 -show
> f90 -I/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
> -p/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
> -L/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/lib
> -lmpichf90 -lmpichf90 -lmpich -lpthread -lrt
>
> Here f90 is the Absoft 8.0 Fortran 90 compiler. I am using mpich2
> 1.0.5p4, compiled with the Absoft Fortran compiler 8.0 and gcc 4.3.0.

If possible, you may want to use gcc 4.2 instead of the experimental gcc
4.3, just in case 4.3 is buggy...

A.Chan

