[MPICH] collective abort of all ranks

Anthony Chan chan at mcs.anl.gov
Mon Jun 11 14:57:05 CDT 2007



On Sun, 10 Jun 2007, Kamaraju Kusumanchi wrote:

> On 6/10/07, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> > I am able to run your program with pgf90 and gcc 3.3.5 without any problem.
> >
> > Rajeev
> >
>
> Is pgf90 a free compiler?
>

I don't think pgf90, the Portland Group compiler, is free.

As for your problem: I didn't know that you had attached your f90 code
to the original email.  I just looked at it and found that the f90
program is pretty simple.  We don't have the Absoft 8.0 compiler here, so
I tried other compilers to reproduce your problem.  I was able to get a
signal 11 MPI abort with the intel-8.1 Fortran compiler.

> .../install_linux_105p4_intel81/bin/mpif90 -g -traceback test.f90
> .../install_linux_105p4_intel81/bin/mpiexec -n 2 a.out
rank 1 in job 19  schwinn.mcs.anl.gov_50482   caused collective abort of all ranks
  exit status of rank 1: killed by signal 11
rank 0 in job 19  schwinn.mcs.anl.gov_50482   caused collective abort of all ranks
  exit status of rank 0: killed by signal 11

However, if I change your array size from 600 to, say, 300, the program
runs fine.  If I comment out "call func2()" in "subroutine func3()", the
program runs fine as well.  It seems there is a memory problem when
initializing array1, due to the statement:

array1(0:elem_xr-1,0:elem_xr-1) = func1()+func1()+func1()+func1()

The assignment statement seems to be competing with MPI for memory, and
that is related to how the Fortran compiler uses memory.  I am not
familiar enough with Fortran 90 compilers to say much more here, but both
the intel-8.1 and Absoft 8.0 compilers are pretty old, so you may want to
use a newer Fortran compiler if possible.  If gfortran is working fine,
why not use gfortran?
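
One way to check whether stack space is a plausible culprit (an
unverified guess, not something established above) is to compare the
rough size of one temporary for the array expression with the shell's
stack limit.  The sketch below assumes 8-byte array elements and that
func1() returns a full 600x600 array; neither is stated in this thread,
and the variable names are only illustrative.

```shell
#!/bin/sh
# Rough size of one temporary for an n x n array expression.
# ASSUMPTION: 8-byte elements; the element type is not given in this thread.
n=600
bytes_per_elem=8
temp_kb=$(( n * n * bytes_per_elem / 1024 ))
echo "one ${n}x${n} temporary: about ${temp_kb} kB"

# Current soft stack limit of this shell, in kB (or "unlimited").
stack_limit=$(ulimit -s)
echo "stack limit: ${stack_limit}"
```

If a handful of such temporaries would come close to the reported limit,
raising the limit before starting the MPI job is a cheap experiment.
Whether the new limit actually reaches the ranks depends on how the
processes are launched.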

A.Chan




