[MPICH] caused collective abort of all ranks using mpich2-1.0.3

Steve Kargl sgk at troutmask.apl.washington.edu
Fri Dec 22 15:23:11 CST 2006


On Sat, Dec 23, 2006 at 04:36:33AM +0800, Duan Sai wrote:
> 
> I have a problem running mpirun on my Linux server.  The
> server's OS is x86_64 (Redhat EL4 U8) and the MPICH version is
> mpich2-1.0.3.  My job performs scientific numerical integration.
> If I use a large mesh in my integration, the job runs fine.
> However, when I use a small mesh to obtain a more accurate value,
> the following error occurs:
>  
> rank 1 in job 1  Machine   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 11
> 
> How can I solve this problem?

What compiler are you using?  Signal 11 is a segmentation fault,
so my first guess is that an array index is going out of bounds.
See if your compiler has a bounds-checking option.
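
To illustrate the kind of bug I have in mind, here is a minimal
sketch in C.  It is not your code: the fixed work-array size
MAX_POINTS, the names integrate and square_fn, and the use of the
trapezoidal rule are all assumptions made up for the example.  It
also assumes that a smaller mesh means more sample points, so a
work array that is large enough for a coarse mesh can overflow on
a finer one.  The explicit size test is what a bounds-checking
option would otherwise catch for you at run time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POINTS 1001  /* hypothetical fixed work-array size */

/* Example integrand (hypothetical): f(x) = x^2 */
double square_fn(double x) { return x * x; }

/* Trapezoidal rule on [a, b] with n intervals (n + 1 sample points).
   Returns 0 on success, or -1 if the mesh would not fit in the
   fixed work array.  Without that test, a large n would write past
   the end of samples[], which can end in a segmentation fault. */
int integrate(double (*f)(double), double a, double b,
              size_t n, double *result)
{
    static double samples[MAX_POINTS];

    assert(result != NULL);
    if (n == 0 || n + 1 > MAX_POINTS)
        return -1;  /* mesh too fine for the work array */

    double h = (b - a) / (double)n;
    for (size_t i = 0; i <= n; i++)
        samples[i] = f(a + (double)i * h);

    /* End points carry weight 1/2, interior points weight 1. */
    double sum = 0.5 * (samples[0] + samples[n]);
    for (size_t i = 1; i < n; i++)
        sum += samples[i];

    *result = sum * h;
    return 0;
}
```

With this sketch, integrate(square_fn, 0.0, 1.0, 1000, &r) succeeds,
while a request for 5000 intervals is rejected instead of
overrunning the array.  If your code is Fortran and you build with
gfortran, -fbounds-check (spelled -fcheck=bounds in newer releases)
adds this kind of test automatically; other compilers have similar
options, so check the documentation for yours.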

-- 
Steve



