[MPICH] collective abort of all ranks

Rajeev Thakur thakur at mcs.anl.gov
Sun Jun 10 19:17:21 CDT 2007


Can you try with some other f90 compiler?

Rajeev 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Kamaraju Kusumanchi
> Sent: Sunday, June 10, 2007 6:36 PM
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] collective abort of all ranks
> 
> > >
> > > $mpif90 -show
> > > f90 -I/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
> > > -p/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/include
> > > -L/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.3.0_absoft_8.0/lib
> > > -lmpichf90 -lmpichf90 -lmpich -lpthread -lrt
> > >
> > > Here f90 is absoft 8.0 fortran 90 compiler. I am using mpich2 1.0.5p4,
> > > compiled with absoft fortran compiler 8.0, gcc 4.3.0.
> >
> > If possible, you may want to use gcc 4.2 instead of the experimental gcc
> > 4.3, just in case 4.3 is buggy...
> >
> 
> 
> Just an update.
> 
> The problem is reproducible even when mpich2 is compiled with gcc 4.2,
> absoft 8.0.
> 
> $mpif90 -show
> f90 -I/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.2.0_absoft_8.0/include
> -p/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.2.0_absoft_8.0/include
> -L/home/raju/software/compiledLibs/mpich2_1.0.5p4_gcc_4.2.0_absoft_8.0/lib
> -lmpichf90 -lmpichf90 -lmpich -lpthread -lrt
> 
> $gcc -v
> Using built-in specs.
> Target: i686-pc-linux-gnu
> Configured with: /home/raju/software/unZipped/gcc-4.2.0/configure
> --prefix=/home/raju/software/compiledSoftware/gcc_4.2.0_20070514
> Thread model: posix
> gcc version 4.2.0
> 
> $mpif90 test.f90
> 
> $mpiexec -l -n 4 ./a.out
> rank 3 in job 7  node1.jit.mae.cornell.edu_38952   caused collective abort of all ranks
>   exit status of rank 3: killed by signal 11
> rank 1 in job 7  node1.jit.mae.cornell.edu_38952   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 11
> 
> When I ran the program via gdb using
> 
> mpiexec -gdb -l -n 4 ./a.out
> 
> the code seems to segfault at line 19 when it calls func1. AFAIK there
> is no memory violation in that part.
> 
> I will try to recompile the mpich2 libs with --enable-g=meminit,dbg
> and see if there is anything new to report.
> 
> raju
> 
> 



