[mpich-discuss] Fatal error in MPI_Barrier
Antonio José Gallardo Díaz
ajcampa at hotmail.com
Tue Feb 3 04:12:10 CST 2009
When I run the command "mpd &" on the master node, I get this:
[1] 8198
mpi@wireless:~$ wireless_57956 (mpd_sockpair 226): connect -5 No address associated with hostname
wireless_57956 (mpd_sockpair 233): connect error with -5 No address associated with hostname
... and it just keeps waiting.
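The "No address associated with hostname" text suggests that the node's own hostname ("wireless" here) may not resolve to an address. A minimal sketch for checking this on each node, assuming Python is available there (mpd itself is written in Python). The names check_hostname and resolver are hypothetical helpers for illustration and are not part of MPICH2:

```python
import socket

def check_hostname(name=None, resolver=socket.gethostbyname_ex):
    """Return (ok, detail) for resolving `name` (default: this machine's hostname)."""
    if name is None:
        name = socket.gethostname()
    try:
        canonical, aliases, addresses = resolver(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return False, f"{name}: {exc}"
    return True, f"{name} -> {canonical} {addresses}"

if __name__ == "__main__":
    ok, detail = check_hostname()
    print("OK" if ok else "FAILED", detail)
```

If this reports a failure on a node, the name lookup (for example the entries in /etc/hosts) is the first thing to look at before starting mpd again.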
From: ajcampa at hotmail.com
To: mpich-discuss at mcs.anl.gov
Date: Tue, 3 Feb 2009 10:49:29 +0100
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
First of all, thanks for your help.
These are the contents of my files.
File /etc/mpd.conf:
/*************************************************************************************/
#! /bin/sh
#
# This file contains configuration information for mpicc. This is
# essentially just the variable-initialization part of mpicc.
# --------------------------------------------------------------------------
# Set the default values of all variables.
#
# Directory locations: Fixed for any MPI implementation.
# Set from the directory arguments to configure (e.g., --prefix=/usr/local)
prefix=/usr/local
exec_prefix=${prefix}
sysconfdir=${prefix}/etc
includedir=${prefix}/include
libdir=${exec_prefix}/lib
#
# Default settings for compiler, flags, and libraries.
# Determined by a combination of environment variables and tests within
# configure (e.g., determining whehter -lsocket is needee)
CC="gcc"
WRAPPER_CFLAGS=""
WRAPPER_LDFLAGS=" "
MPILIBNAME="mpich"
PMPILIBNAME="pmpich"
MPI_OTHERLIBS="-lpthread -lrt "
NEEDSPLIB="no"
# MPIVERSION is the version of the MPICH2 library that mpicc is intended for
MPIVERSION="1.0.8"
/*************************************************************************************/
File /etc/mpd.hosts:
/*************************************************************************************/
master ifhn=192.168.1.1
slave ifhn=192.168.1.2
/*************************************************************************************/
File .mpd.conf:
/*************************************************************************************/
MPD_SECRETWORD=hola
/*************************************************************************************/
File .mpd.hosts:
/*************************************************************************************/
master ifhn=192.168.1.1
slave ifhn=192.168.1.2
/*************************************************************************************/
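The hosts files above name the nodes "master" and "slave", while elsewhere in this thread the machines report their hostnames as "wireless" and "wireless2". Whether that mismatch matters is not established here. As a rough sketch, assuming only the simple "host ifhn=address" layout shown above, the entries can be listed so that each name can be compared with what the hostname command prints on that node. The function parse_mpd_hosts is a hypothetical helper, not an MPICH2 tool:

```python
def parse_mpd_hosts(text):
    """Parse lines like 'master ifhn=192.168.1.1' into (host, ifhn) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        ifhn = None  # stays None when the line has no ifhn= field
        for field in fields[1:]:
            if field.startswith("ifhn="):
                ifhn = field[len("ifhn="):]
        entries.append((host, ifhn))
    return entries

if __name__ == "__main__":
    sample = "master ifhn=192.168.1.1\nslave ifhn=192.168.1.2\n"
    for host, ifhn in parse_mpd_hosts(sample):
        print(host, ifhn)
```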
I use the following command to start the cluster:
mpdboot --totalnum=2 --ifhn=192.168.1.1 -f .mpd.hosts
and to run my job I use:
mpiexec -recvtimeout 30 -n 2 ./Proyecto/debug/src/proyecto2
Do you need anything else? Thanks.
From: thakur at mcs.anl.gov
To: mpich-discuss at mcs.anl.gov
Date: Mon, 2 Feb 2009 17:30:36 -0600
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
What parameters did you pass to "configure" when you built MPICH2?
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Antonio José Gallardo Díaz
Sent: Monday, February 02, 2009 5:29 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
I only have two nodes.
Node 1 --> name: master --> hostname: wireless
Node 2 --> name: slave --> hostname: wireless2
To start the cluster I use the command "mpdboot".
For example, I can see the IDs of the two nodes. In my job, if I call for example MPI_Comm_rank(...), I receive the numbers of the nodes. However, if I use MPI_Send(...) or MPI_Recv(...), my job exits and shows me an error.
If I use "mpiexec -l -n 2 hostname", I receive:
0: wireless
1: wireless2
I don't know whether this answers your question.
Thanks.
From: thakur at mcs.anl.gov
To: mpich-discuss at mcs.anl.gov
Date: Mon, 2 Feb 2009 15:52:52 -0600
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
The error message "unable to find the
process group structure with id <>" is odd. How exactly did you
configure MPICH2? Were you able to set up an MPD ring on the two nodes
successfully?
Rajeev
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Antonio José Gallardo Díaz
Sent: Monday, February 02, 2009 12:39 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
Hello. I have tried running the command:
mpiexec -recvtimeout 30 -n 2 /home/mpi/mpich2-1.0.8/examples/cpi
and this is the result.
/********************************************************************************************************************************************************/
Process 0 of 2 is on wireless
Process 1 of 2 is on wireless2
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x7ffff732586c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(230)...........................:
MPIC_Send(39).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
[cli_0]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x7ffff732586c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(230)...........................:
MPIC_Send(39).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)...............................: MPI_Bcast(buf=0xbf82bec8, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)..............................:
MPIC_Recv(81)................................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <>
[cli_1]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)...............................: MPI_Bcast(buf=0xbf82bec8, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)..............................:
MPIC_Recv(81)................................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <>
rank 1 in job 21 wireless_47695 caused collective abort of all ranks
exit status of rank 1: return code 1
rank 0 in job 21 wireless_47695 caused collective abort of all ranks
exit status of rank 0: return code 1
/********************************************************************************************************************************************************/
mpdcheck reported a problem with the first IP address, but that is solved now.
I tested:
mpdcheck -s
and on the other node
mpdcheck -c "name" "number" --> OK.
mpiexec -n 1 /bin/hostname --> OK.
mpiexec -l -n 4 /bin/hostname --> OK.
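The mpdcheck -s and mpdcheck -c pair tests whether one node can open a TCP connection to the other. The same idea can be sketched as a rough stand-in in Python. This is not mpdcheck's actual implementation, and the function names start_echo_server, serve_once and try_connect are hypothetical:

```python
import socket
import threading

def start_echo_server(host="", port=0):
    """Listen on (host, port); port 0 lets the OS pick a free port.
    Returns (server_socket, actual_port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]

def serve_once(srv):
    """Accept one connection and echo back whatever the client sends."""
    conn, _addr = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))

def try_connect(host, port, timeout=5.0):
    """Connect to host:port, send a probe, return True if it is echoed back."""
    probe = b"hello"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(probe)
            return sock.recv(1024) == probe
    except OSError:  # refused, reset, timed out, or name not resolvable
        return False

if __name__ == "__main__":
    # Loopback demo only. On a real cluster, run the server part on one node
    # and call try_connect(<other node's name>, <printed port>) on the other.
    srv, port = start_echo_server("127.0.0.1")
    worker = threading.Thread(target=serve_once, args=(srv,), daemon=True)
    worker.start()
    print("connected:", try_connect("127.0.0.1", port))
    worker.join(timeout=5)
    srv.close()
```

Running the check in both directions, and by hostname rather than by IP address, exercises the same name lookup and connection path that the MPI processes need when they talk to each other directly.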
I should add that I have to pass the option -recvtimeout 30 with every command, because otherwise I have problems. Without this option, it tells me:
mpiexec_wireless (mpiexec 392): no msg recvd from mpd when expecting ack of request
What can I do? Please help, and sorry for my poor English.
From: ajcampa at hotmail.com
To: mpich-discuss at mcs.anl.gov
Date: Mon, 2 Feb 2009 18:17:39 +0100
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
Well, thanks for your answer. Actually, the name of my PC is "Wireless" and the other PC is "Wireless2". I use the same user, "mpi", on both PCs.
I will try the mpdcheck utility and then report back.
Thanks for everything.
Greetings from Spain.
From: thakur at mcs.anl.gov
To: mpich-discuss at mcs.anl.gov
Date: Mon, 2 Feb 2009 10:55:03 -0600
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
Are you really trying to use the wireless network?
Looks like that's what is getting used.
You can use the mpdcheck utility to diagnose
network configuration problems. See Appendix A.2 of the
installation guide.
Rajeev
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Antonio José Gallardo Díaz
Sent: Monday, February 02, 2009 9:49 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Fatal error in MPI_Barrier
Hello, I get this error when I run my jobs that use MPI.
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406).............................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77).............................:
MPIC_Sendrecv(123)...........................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <��oz�>
[cli_1]: aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406).............................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77).............................:
MPIC_Sendrecv(123)...........................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <��oz�>
rank 1 in job 15 wireless_43226 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
I have two PCs running Linux (Kubuntu 8.10), and I built a cluster from these machines.
When I run, for example, the command "mpiexec -l -n 2 hostname", I can see that everything is all right, but when I try to send or receive something I always get the same error. I don't know why. Please, I need a hand. Thanks for everything.