What parameters did you pass to "configure" when you built MPICH2?
----------------------------------------------------------------------
From: mpich-discuss-bounces@mcs.anl.gov [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of Antonio José Gallardo Díaz
Sent: Monday, February 02, 2009 5:29 PM
To: mpich-discuss@mcs.anl.gov
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
I only have two nodes.

Node 1 --> name: master --> hostname: wireless
Node 2 --> name: slave --> hostname: wireless2

To bring up the cluster I use the command "mpdboot".

For example, I can see the ranks of the two nodes: in my job I call MPI_Comm_rank(...) and it returns the rank of each process. However, if I use MPI_Send(...) or MPI_Recv(...), my job exits the application and shows me an error. If I run "mpiexec -l -n 2 hostname", I get:

0: wireless
1: wireless2

I don't know whether that answers your question.

Thanks.
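
P.S. The pattern my job uses boils down to something like this minimal sketch (illustrative only, not my real code); even a test this small fails for me as soon as the two ranks try to communicate:

/* Minimal MPI send/receive test: rank 0 sends one int to rank 1.
 * Sketch only. Compile with mpicc; run with: mpiexec -n 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* dest 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* src 0, tag 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}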
----------------------------------------------------------------------
From: thakur@mcs.anl.gov
To: mpich-discuss@mcs.anl.gov
Date: Mon, 2 Feb 2009 15:52:52 -0600
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
The error message "unable to find the process group structure with id <>" is odd. How exactly did you configure MPICH2? Were you able to set up an MPD ring on the two nodes successfully?

Rajeev
----------------------------------------------------------------------
From: mpich-discuss-bounces@mcs.anl.gov [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of Antonio José Gallardo Díaz
Sent: Monday, February 02, 2009 12:39 PM
To: mpich-discuss@mcs.anl.gov
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
Hello. I have tried running the command:

mpiexec -recvtimeout 30 -n 2 /home/mpi/mpich2-1.0.8/examples/cpi

and this is the result (the output of the two ranks was interleaved; each rank's error stack is shown here in full):

/**********************************************************************/
Process 0 of 2 is on wireless
Process 1 of 2 is on wireless2
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x7ffff732586c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(230)...........................:
MPIC_Send(39).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
[cli_0]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)............................: MPI_Bcast(buf=0x7ffff732586c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(230)...........................:
MPIC_Send(39).............................:
MPIC_Wait(270)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)...............................: MPI_Bcast(buf=0xbf82bec8, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)..............................:
MPIC_Recv(81)................................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <>
[cli_1]: aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(786)...............................: MPI_Bcast(buf=0xbf82bec8, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(198)..............................:
MPIC_Recv(81)................................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <>
rank 1 in job 21 wireless_47695 caused collective abort of all ranks
  exit status of rank 1: return code 1
rank 0 in job 21 wireless_47695 caused collective abort of all ranks
  exit status of rank 0: return code 1
/**********************************************************************/

mpdcheck reported a problem with the first IP, but that is solved now. I tested:

mpdcheck -s (on one node) and mpdcheck -c "name" "number" (on the other node) --> OK
mpiexec -n 1 /bin/hostname --> OK
mpiexec -l -n 4 /bin/hostname --> OK

I have to say that with every command I have to pass the option "-recvtimeout 30", because otherwise I have problems. Without this option it tells me:

mpiexec_wireless (mpiexec 392): no msg recvd from mpd when expecting ack of request

What can I do? Please help, and sorry for my poor English.
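
P.S. The failing call can be isolated from the rest of cpi with a broadcast test as small as this (a sketch mirroring cpi's first MPI_Bcast, not the cpi source itself):

/* Minimal MPI_Bcast test, same call signature as the one failing in cpi.
 * Sketch only. Run with: mpiexec -n 2 ./bcast_test */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        n = 100;                                    /* root supplies the value */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* count=1, root=0 */
    printf("rank %d has n = %d\n", rank, n);

    MPI_Finalize();
    return 0;
}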
----------------------------------------------------------------------
From: ajcampa@hotmail.com
To: mpich-discuss@mcs.anl.gov
Date: Mon, 2 Feb 2009 18:17:39 +0100
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
Well, thanks for your answer. Actually, the name of my PC is "Wireless" and the other PC is "Wireless2"; I use the same user, "mpi", on both PCs.

I will try the mpdcheck utility and then write back with the results.

Thanks for everything.

Greetings from Spain.
----------------------------------------------------------------------
From: thakur@mcs.anl.gov
To: mpich-discuss@mcs.anl.gov
Date: Mon, 2 Feb 2009 10:55:03 -0600
Subject: Re: [mpich-discuss] Fatal error in MPI_Barrier
Are you really trying to use the wireless network? Looks like that's what is getting used.

You can use the mpdcheck utility to diagnose network configuration problems. See Appendix A.2 of the installation guide.

Rajeev
----------------------------------------------------------------------
From: mpich-discuss-bounces@mcs.anl.gov [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of Antonio José Gallardo Díaz
Sent: Monday, February 02, 2009 9:49 AM
To: mpich-discuss@mcs.anl.gov
Subject: [mpich-discuss] Fatal error in MPI_Barrier
Hello, I get this error when I run my jobs that use MPI:

Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406).............................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77).............................:
MPIC_Sendrecv(123)...........................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <...>
[cli_1]: aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406).............................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(77).............................:
MPIC_Sendrecv(123)...........................:
MPIC_Wait(270)...............................:
MPIDI_CH3i_Progress_wait(215)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(640)...:
MPIDI_CH3_Sockconn_handle_connopen_event(887): unable to find the process group structure with id <...>
rank 1 in job 15 wireless_43226 caused collective abort of all ranks
  exit status of rank 1: killed by signal 9

I have two PCs running Linux (Kubuntu 8.10) and have built a cluster from these machines. When I use, for example, the command "mpiexec -l -n 2 hostname", everything looks all right, but whenever I try to send or receive anything I get the same error. I don't know why. Please lend me a hand. Thanks for everything.
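
P.S. Even a bare barrier test like this sketch (illustrative only, not my actual job) exercises the same MPI_Barrier call that fails above:

/* Bare MPI_Barrier test.
 * Sketch only. Run with: mpiexec -n 2 ./barrier_test */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);   /* first real communication between nodes */
    printf("rank %d passed the barrier\n", rank);

    MPI_Finalize();
    return 0;
}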