[mpich-discuss] Problems running mpi

Dave Goodell goodell at mcs.anl.gov
Tue Oct 6 14:29:42 CDT 2009


Also, re-reading your error message: one of your processes is being  
killed by signal 11 (SIGSEGV), which means you have a segfault.  It  
might be in MPICH2 itself (1.0.6 is very old), but there's a good  
chance it's in your user code.  Upgrade your MPICH2 version, and if  
you are still having trouble, turn on core files and see where the  
code is segfaulting.
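
In case it helps, here is a minimal sketch of one way to turn on core
files from inside the program itself, assuming a Linux/POSIX system.
The helper name enable_core_dumps is made up for illustration; the
calls it uses (getrlimit/setrlimit with RLIMIT_CORE) are standard
POSIX.  Running "ulimit -c unlimited" in the shell is the usual
alternative, but it may not reach processes started by a launcher
daemon, which is why doing it in the code can be more reliable.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the
 * hard limit so a segfaulting process is allowed to write a core file.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    /* Read the current soft and hard limits for core file size. */
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* An unprivileged process may raise the soft limit only up to
     * the hard limit, so use the hard limit as the new soft limit. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

You would call this at the top of main(), before MPI_Init.  If the
hard limit is itself 0, no core file can be written and the limit has
to be raised by the administrator.  Once a core file exists, build
with -g and load it with something like "gdb ./your_program core"
(the program name is a placeholder), then use "bt" to get a backtrace
of the crashing rank.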

-Dave

On Oct 6, 2009, at 11:29 AM, Dave Goodell wrote:

> What version of MPICH2 are you using?  This error message looks like  
> one from an older version of MPICH2.  The current version is 1.1.1p1  
> and in general should be preferred over all previous versions of  
> MPICH2.
>
> -Dave
>
> On Oct 6, 2009, at 11:01 AM, Fernando Saez wrote:
>
>> Dear MPICH discussion group
>>
>> I am trying to run an MPI program, but it fails with the following  
>> error:
>>
>> 1: Fatal error in MPI_Recv: Other MPI error, error stack:
>> 1: MPI_Recv(186)................: MPI_Recv(buf=0xbfdf70e8,  
>> count=52, MPI_DOUBLE, src=0, tag=0, MPI_COMM_WORLD,  
>> status=0xbfdf6f34) failed
>> 1: MPIDI_CH3i_Progress_wait(207): sock_wait failed
>> 1: MPIDU_Sock_wait(202).........: unexpected operating system error  
>> (errno=22:(strerror() not found))
>> rank 0 in job 71  lidic01.unsl.edu.ar_39689   caused collective  
>> abort of all ranks
>>  exit status of rank 0: killed by signal 11
>>
>> The program runs very well with smaller input sizes, but when I  
>> grow the size it crashes.
>>
>> Let me know if this error sounds familiar to you and if you have  
>> any suggestions for what to do here.
>>
>> Thanks,
>>
>> Fernando
>


