[MPICH] collective abort of all ranks

Kamaraju Kusumanchi kamaraju at gmail.com
Sat Jun 9 22:11:50 CDT 2007


Hi all,

I would appreciate it if someone could help me with the following code.
I am having trouble understanding why the parallel version gives errors
while the serial version works fine. Also, what does "collective abort
of all ranks" mean?

Consider the test.f90 attached to this email.

$mpif90 test.f90

It compiles fine. However, when I run it, I get the following error.

$mpiexec -l -n 4 ./a.out
rank 3 in job 282  node1.jit.mae.cornell.edu_33436   caused collective
abort of all ranks
 exit status of rank 3: killed by signal 11
rank 1 in job 282  node1.jit.mae.cornell.edu_33436   caused collective
abort of all ranks
 exit status of rank 1: killed by signal 11
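
For reference, signal 11 on Linux is normally SIGSEGV (segmentation
fault), so the ranks above most likely died from an invalid memory
access, and the process manager then presumably terminated the rest of
the job. A minimal sketch for checking the signal name on a given
machine (the number-to-name mapping is platform-dependent, and the
variable name sig_name is only illustrative):

```shell
# Look up the name of signal number 11.
# On Linux this is expected to be SEGV (i.e. SIGSEGV).
sig_name=$(kill -l 11)
echo "signal 11 = SIG${sig_name}"
```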

Do you think this is a memory issue? The arrays involved are fairly
small, though: 600x600 double precision real arrays. Moreover, the code
does not give any errors if I comment out all the MPI statements and
just compile and run it using f90.
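
A rough size check, as a sketch: one 600x600 double precision array
takes 600*600*8 bytes, about 2.75 MiB. That is small for the heap, but
if the compiler places such arrays on the stack (an assumption, since
the contents of test.f90 are not shown here), a few of them could
approach a typical default stack limit, which is one possible cause of
signal 11. The variable names below are only illustrative.

```shell
# Size of one 600x600 double precision array (8 bytes per element).
n=600
bytes_per_elem=8
array_bytes=$((n * n * bytes_per_elem))
array_kib=$((array_bytes / 1024))   # integer division, rounds down
echo "one array: ${array_bytes} bytes (~${array_kib} KiB)"

# Current soft stack limit in KiB ("unlimited" if no limit is set).
echo "stack limit: $(ulimit -s) KiB"
```

Comparing the two numbers on each node gives a first indication of
whether stack size is worth investigating.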

$mpif90 -show
f90 -I/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
-p/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
-L/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/lib
-lmpichf90 -lmpichf90 -lmpich -lpthread -lrt

Here f90 is the Absoft 8.0 Fortran 90 compiler. I am using MPICH2
1.0.5p4, compiled with the Absoft Fortran compiler 8.0 and gcc 4.3.0.

$mpdtrace -l
node1.jit.mae.cornell.edu_33436 (192.168.1.1)
node3.jit.mae.cornell.edu_55240 (192.168.1.3)
node2.jit.mae.cornell.edu_54930 (192.168.1.2)
node4.jit.mae.cornell.edu_46424 (192.168.1.4)

$gcc -v
Using built-in specs.
Target: i386-pc-linux-gnu
Configured with: /home/fxcoudert/gfortran_nightbuild/trunk/configure
--prefix=/home/fxcoudert/gfortran_nightbuild/irun-20070123
--enable-languages=c,fortran --host=i386-pc-linux-gnu
--with-gmp=/home/fxcoudert/gfortran_nightbuild/software
Thread model: posix
gcc version 4.3.0 20070123 (experimental)


Can someone please clarify what I am doing wrong? Any suggestions or
comments on how to overcome this problem are most welcome.

thanks
raju
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.f90
Type: application/octet-stream
Size: 1244 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20070609/5d1d32a7/attachment.obj>

