[mpich-discuss] MPI_Barrier(MPI_COMM_WORLD) failed

Xiao Bo Lu xiao.lu at auckland.ac.nz
Tue Apr 21 00:49:21 CDT 2009


Hi Gauri,

I think we are using an NFS file system on Linux. I also noticed 
that when I ran "make testing", a few tests failed, mostly 
I/O-related ones like:

Looking in ./f77/io/testlist
Unexpected output in iwriteatf: mpiexec_hpc2 (handle_sig_occurred 1144): 
job ending due to env var MPIEXEC_TIMEOUT=180
Program iwriteatf exited without No Errors
Unexpected output in iwritef: mpiexec_hpc2 (handle_sig_occurred 1144): 
job ending due to env var MPIEXEC_TIMEOUT=180
Program iwritef exited without No Errors


Looking in ./cxx/io/testlist
Unexpected output in iwriteatx: mpiexec_hpc2 (handle_sig_occurred 1144): 
job ending due to env var MPIEXEC_TIMEOUT=180
Program iwriteatx exited without No Errors
Unexpected output in iwritex: mpiexec_hpc2 (handle_sig_occurred 1144): 
job ending due to env var MPIEXEC_TIMEOUT=180
Program iwritex exited without No Errors
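
All of these look like timeouts rather than crashes: mpiexec killed each 
test after the 180 seconds set by MPIEXEC_TIMEOUT. Since we are on NFS, 
the nonblocking-write tests may just be slow, so I may retry with a 
larger limit (assuming a Bourne-style shell; 600 is an arbitrary choice):

  MPIEXEC_TIMEOUT=600 make testing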

Another odd thing is that my mpiexec seems to work fine with small 
problems but not with large ones. I installed MUMPS (a parallel 
numerical solver) and its MPI examples all passed.
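
To narrow down the small-vs-large difference, a minimal ring test that 
exercises every connection and then hits MPI_Barrier(MPI_COMM_WORLD) may 
help; this is only a sketch (the file name test_barrier.c and the token 
value are mine):

  /* test_barrier.c: pass a token around a ring, then barrier.
   * build: mpicc test_barrier.c -o test_barrier
   * run:   mpiexec -np 4 ./test_barrier                        */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size, token = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* ring: rank 0 starts the token, everyone else relays it,
       * so every pairwise connection along the ring gets used    */
      if (rank == 0) {
          token = 42;
          MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
          MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
      } else {
          MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
      }

      MPI_Barrier(MPI_COMM_WORLD);  /* the call that fails in the real code */
      printf("rank %d of %d: token %d, barrier passed\n", rank, size, token);

      MPI_Finalize();
      return 0;
  }

If this works at -np 2 but fails at higher process counts or across 
nodes, that would match the pattern above.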

Regards
Xiao

Gauri Kulkarni wrote:
> Hi,
>
> I have no experience with MPICH, but just want to chip in. I recently 
> got errors like these as well (they may not be related, but still). The 
> resolution, rather than solution, that I found from the discussion here 
> is that my version of MPICH2 is configured to be used with SLURM, 
> meaning the process manager is slurm. If I start an mpd and run a 
> program compiled with my version of MPICH2, then I get these errors. 
> What process manager are you using?
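>
> (One way to check, assuming a standard MPICH2 install: the 
> mpich2version script in MPICH2's bin directory prints the build's 
> configure options, including which process manager it was set up for. 
> If the build expects slurm but the job is launched through mpd, a 
> PMI_KVS_Get failure like the one below is exactly what you would see.)
>
>     mpich2version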
>
> mpiexec -np 2 ./helloworld.mympi
>
>
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(255)...................: MPI_Finalize failed
> MPI_Finalize(154)...................:
> MPID_Finalize(94)...................:
> MPI_Barrier(406)....................: MPI_Barrier(comm=0x44000002) failed
> MPIR_Barrier(77)....................:
> MPIC_Sendrecv(120)..................:
> MPID_Isend(103).....................: failure occurred while 
> attempting to send an eager message
> MPIDI_CH3_iSend(172)................:
> MPIDI_CH3I_VC_post_sockconnect(1090):
> MPIDI_PG_SetConnInfo(615)...........: PMI_KVS_Get failedStatus of 
> MPI_Init = 0 Status of MPI_Comm_Rank = 0 Status of MPI_Comm_Size = 0 
> Hello world! I'm 1 of 2 on n53
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(255)...................: MPI_Finalize failed
> MPI_Finalize(154)...................:
> MPID_Finalize(94)...................:
> MPI_Barrier(406)....................: MPI_Barrier(comm=0x44000002) failed
> MPIR_Barrier(77)....................:
> MPIC_Sendrecv(120)..................:
> MPID_Isend(103).....................: failure occurred while 
> attempting to send an eager message
> MPIDI_CH3_iSend(172)................:
> MPIDI_CH3I_VC_post_sockconnect(1090):
> MPIDI_PG_SetConnInfo(615)...........: PMI_KVS_Get failedStatus of 
> MPI_Init = 0 Status of MPI_Comm_Rank = 0 Status of MPI_Comm_Size = 0 
> Hello world! I'm 0 of 2 on n53
>
> Gauri.
> ---------
>
>
> On Mon, Apr 20, 2009 at 11:20 AM, Xiao Bo Lu 
> <xiao.lu at auckland.ac.nz> wrote:
>
>     Hi all,
>
>     I've recently installed MPICH2-1.0.8 on my local machine (x86_64
>     Linux, gfortran 4.1.2) and I am now experiencing errors with my
>     MPI code. The error messages are:
>
>     Fatal error in MPI_Barrier: Other MPI error, error stack:
>     MPI_Barrier(406)..........................:
>     MPI_Barrier(MPI_COMM_WORLD) failed
>     MPIR_Barrier(77)..........................:
>     MPIC_Sendrecv(126)........................:
>     MPIC_Wait(270)............................:
>     MPIDI_CH3i_Progress_wait(215).............: an error occurred
>     while handling an event returned by MPIDU_Sock_Wait()
>     MPIDI_CH3I_Progress_handle_sock_event(420):
>     MPIDU_Socki_handle_read(637)..............: connection failure
>     (set=0,sock=1,errno=104:Connection reset by peer)[cli_0]: aborting
>     job:
>     Fatal error in MPI_Barrier: Other MPI error, error stack:
>     MPI_Barrier(406)..........................:
>     MPI_Barrier(MPI_COMM_WORLD) failed
>     MPIR_Barrier(77)..........................:
>     MPIC_Sendrecv(126)........................:
>     MPIC_Wait(270)............................:
>     MPIDI_CH3i_Progress_wait(215).............: an error occurred
>     while handling an event returned by MPIDU_Sock_Wait()
>     MPIDI_CH3I_Progress_handle_sock_event(420):
>     MPIDU_Socki_handle_read size of processor is:                    4
>
>     I searched the net and found quite a few links about this kind of
>     error, but none of the posts gave a definitive fix. Does anyone
>     know what could cause this error (e.g. an incorrect installation
>     or an environment setting) and how to fix it?
>
>     Regards
>     Xiao
>
>


