[MPICH] caused collective abort of all ranks using mpich2-1.0.3

Duan Sai duansai at gmail.com
Fri Dec 22 22:59:43 CST 2006


Dear Steve Kargl,                        2006-12-23

     Thanks for your suggestion; your guess was right on the point. My compiler is ifort, version 9.1.036. I recompiled my code with the ifort option -check bounds. When I ran the recompiled program, I got the error message below:

forrtl: severe (408): fort: (2): Subscript #1 of the array INDEX has value 48 which is greater than the upper bound of 47

Image              PC                Routine            Line        Source
vasp               0000000000CC27D7  Unknown               Unknown  Unknown
vasp               0000000000CC0CB6  Unknown               Unknown  Unknown
vasp               0000000000C8E95E  Unknown               Unknown  Unknown
vasp               0000000000C5A4DA  Unknown               Unknown  Unknown
vasp               0000000000C5A71E  Unknown               Unknown  Unknown
vasp               000000000053429A  Unknown               Unknown  Unknown
vasp               000000000041CFF6  Unknown               Unknown  Unknown
vasp               0000000000407976  Unknown               Unknown  Unknown
libc.so.6          0000002A95BA51D7  Unknown               Unknown  Unknown
vasp               00000000004078AA  Unknown               Unknown  Unknown
rank 1 in job 9  Dell439-5_32770   caused collective abort of all ranks
  exit status of rank 1: return code 152

How can I increase the upper bound of the array INDEX on my computer?

Regards, 

DUAN Sai
Dept. of Chemistry, Xiamen University, Xiamen, P. R. China.
E-mail   duansai at gmail.com

======= 2006-12-23 05:23:11 Original Message:  =======

>On Sat, Dec 23, 2006 at 04:36:33AM +0800, Duan Sai wrote:
>> 
>> I have a problem running mpirun on my Linux server.  The
>> server's OS is x86_64 (Redhat EL4 U8) and the mpich version is
>> mpich2-1.0.3.  My job is a scientific numerical integration.
>> If I use a large mesh in the integration, the job runs very well.
>> However, when I use a small mesh to obtain a more accurate value,
>> the error below occurs:
>>  
>> rank 1 in job 1  Machine   caused collective abort of all ranks
>>   exit status of rank 1: killed by signal 11
>> 
>> How can I solve this problem?
>
>What compiler are you using?  My first guess is an array index
>is going out of bounds.  See if your compiler has a bounds checking
>option.
>
>-- 
>Steve
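
As a rough illustration of the out-of-bounds guess above, here is a minimal sketch in Python. It is not code from VASP or MPICH, and MAX_POINTS, fill_fixed and fill_sized are hypothetical names chosen for the example. A table with a fixed capacity of 47 entries works for a coarse mesh but overflows once a finer mesh needs a 48th entry, whereas a table sized from the mesh does not. In Fortran the array bound likewise comes from the program's own declaration or allocation, not from the machine, so the usual remedy is a larger declared dimension or an allocatable array.

```python
# Hypothetical illustration only; none of these names come from VASP or MPICH.
MAX_POINTS = 47  # assumed fixed capacity, chosen when the program was written


def fill_fixed(n_points):
    """Fill a fixed-capacity table; fails once n_points exceeds MAX_POINTS."""
    index = [0] * MAX_POINTS
    for i in range(n_points):
        # Python checks bounds on every list assignment, much as
        # ifort does when compiled with -check bounds.
        index[i] = i + 1  # raises IndexError when i reaches MAX_POINTS
    return index


def fill_sized(n_points):
    """Size the table from the mesh itself, so a finer mesh still fits."""
    index = [0] * n_points
    for i in range(n_points):
        index[i] = i + 1
    return index
```

Calling fill_fixed(48) raises an IndexError, the checked counterpart of the signal 11 crash seen without bounds checking, while fill_sized(48) completes.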
