[mpich-discuss] MPI_Barrier(MPI_COMM_WORLD) failed

Xiao Bo Lu xiao.lu at auckland.ac.nz
Mon Apr 20 18:27:24 CDT 2009


Hi,

Yes. It is a Fortran 90 code. I did re-compile all the source code and 
libraries with the new MPICH2. The configure options I used were:

./configure -prefix=/hpc/xlu012 CC=gcc F90=gfortran

and I compiled all the files with mpif90. I also ran a few simple MPI 
tests with mpiexec and they seem to work fine. I'm starting to wonder 
whether this has something to do with memory allocation, or with some 
other communication variables that block some of the messages from a 
large array(??).
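
The two checks discussed in this thread (no stale mpif.h left over from 
another MPI implementation, and mpif90/mpiexec both coming from the same 
MPICH2 install) could be sketched in shell roughly as below. This is only 
a minimal sketch: the function names find_mpif_headers and show_mpi_tools 
are hypothetical helpers, not part of MPICH2, and the directories to 
search are assumptions that depend on the system.

```shell
#!/bin/sh
# Hypothetical helper: list every mpif.h found under the given directories.
# Directories that do not exist are skipped silently.
find_mpif_headers() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -name mpif.h -print 2>/dev/null
        fi
    done
    return 0
}

# Hypothetical helper: show where mpif90 and mpiexec resolve on PATH,
# so it is easy to see whether both come from the same install prefix.
show_mpi_tools() {
    for tool in mpif90 mpiexec; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%s -> %s\n' "$tool" "$path"
        else
            printf '%s -> not found\n' "$tool"
        fi
    done
}

# Example usage (search paths are assumptions; /hpc/xlu012 is the
# -prefix from the configure line above):
#   find_mpif_headers /hpc/xlu012 /usr/include /usr/local/include "$PWD"
#   show_mpi_tools
```

Any mpif.h reported outside the MPICH2 prefix (or inside the source tree) 
would be a candidate for a leftover header, and both tools would be 
expected to resolve under the same prefix.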

Regards
Xiao

Rajeev Thakur wrote:
> If it's Fortran code, just make sure no mpif.h files are left around from
> the old implementation. Also make sure that the entire code (all files) has
> been recompiled with MPICH2.
>
> Rajeev
>
>
>
>   
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Xiao Bo Lu
>> Sent: Monday, April 20, 2009 5:57 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] MPI_Barrier(MPI_COMM_WORLD) failed
>>
>> Hi Rajeev,
>>
>> Yes. The code was working but on a different platform (IBM-aix system 
>> with POE). I have to move the code to the new system since the lease on 
>> the old one just expired.
>>
>> Regards
>> Xiao
>>
>> Rajeev Thakur wrote:
>>     
>>> Was this code that worked earlier?
>>>
>>> Rajeev 
>>>
>>>   
>>>       
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Xiao Bo Lu
>>>> Sent: Monday, April 20, 2009 12:51 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [mpich-discuss] MPI_Barrier(MPI_COMM_WORLD) failed
>>>>
>>>> Hi all,
>>>>
>>>> I've recently installed MPICH2-1.0.8 on my local machine (x86_64 Linux, 
>>>> gfortran 4.1.2) and I am now experiencing errors with my MPI code. The 
>>>> error messages are:
>>>>
>>>> Fatal error in MPI_Barrier: Other MPI error, error stack:
>>>> MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD) failed
>>>> MPIR_Barrier(77)..........................:
>>>> MPIC_Sendrecv(126)........................:
>>>> MPIC_Wait(270)............................:
>>>> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
>>>> MPIDI_CH3I_Progress_handle_sock_event(420):
>>>> MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
>>>> [cli_0]: aborting job:
>>>> Fatal error in MPI_Barrier: Other MPI error, error stack:
>>>> MPI_Barrier(406)..........................: MPI_Barrier(MPI_COMM_WORLD) failed
>>>> MPIR_Barrier(77)..........................:
>>>> MPIC_Sendrecv(126)........................:
>>>> MPIC_Wait(270)............................:
>>>> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
>>>> MPIDI_CH3I_Progress_handle_sock_event(420):
>>>> MPIDU_Socki_handle_read size of processor is:                    4
>>>>
>>>> I searched the net and found quite a few links about this error, but 
>>>> none of the posts gave a definitive fix. Does anyone know what could 
>>>> cause this error (e.g. an incorrect installation or environment 
>>>> settings) and how to fix it?
>>>>
>>>> Regards
>>>> Xiao
>>>>


