<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
"urn:schemas-microsoft-com:vml" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.5730.11" name=GENERATOR><!--[if !mso]>
<STYLE>v\:* {
        BEHAVIOR: url(#default#VML)
}
o\:* {
        BEHAVIOR: url(#default#VML)
}
w\:* {
        BEHAVIOR: url(#default#VML)
}
.shape {
        BEHAVIOR: url(#default#VML)
}
</STYLE>
<![endif]-->
<STYLE>@font-face {
        font-family: Tahoma;
}
@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.25in 1.0in 1.25in; }
P.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
LI.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
DIV.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
A:link {
        COLOR: blue; TEXT-DECORATION: underline
}
SPAN.MsoHyperlink {
        COLOR: blue; TEXT-DECORATION: underline
}
A:visited {
        COLOR: purple; TEXT-DECORATION: underline
}
SPAN.MsoHyperlinkFollowed {
        COLOR: purple; TEXT-DECORATION: underline
}
SPAN.EmailStyle17 {
        COLOR: navy; FONT-FAMILY: Arial; mso-style-type: personal-reply
}
DIV.Section1 {
        page: Section1
}
</STYLE>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></HEAD>
<BODY lang=EN-US vLink=purple link=blue bgColor=white>
<DIV dir=ltr align=left><SPAN class=174520422-29112006><FONT face=Arial
color=#0000ff size=2>It should work if the byte ordering and type sizes are the
same, unless there is something funky going on because of the different OSes.
</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=174520422-29112006><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=174520422-29112006><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV>
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Matthew
Chambers<BR><B>Sent:</B> Wednesday, November 29, 2006 2:08 PM<BR><B>To:</B>
mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> RE: [MPICH] Communication problem
on a small heterogeneous ring involving Red Hat Linux - Enterprise
Edition<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=Section1>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">Is he actually
talking about a heterogeneous system? All I see are different operating
systems. Unless the Xeon is 64 bit, it seems like byte ordering and type
sizes should all be equal, which as far as I know is the only kind of
heterogeneity that would affect MPI. Am I
wrong?<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">Matt
Chambers<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">Vanderbilt
Bioinformatics<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P>
<DIV
style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: medium none; PADDING-LEFT: 4pt; PADDING-BOTTOM: 0in; BORDER-LEFT: blue 1.5pt solid; PADDING-TOP: 0in; BORDER-BOTTOM: medium none">
<DIV>
<DIV class=MsoNormal style="TEXT-ALIGN: center" align=center><FONT
face="Times New Roman" size=3><SPAN style="FONT-SIZE: 12pt">
<HR tabIndex=-1 align=center width="100%" SIZE=2>
</SPAN></FONT></DIV>
<P class=MsoNormal><B><FONT face=Tahoma size=2><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">From:</SPAN></FONT></B><FONT
face=Tahoma size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">
owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov]
<B><SPAN style="FONT-WEIGHT: bold">On Behalf Of </SPAN></B>Rajeev
Thakur<BR><B><SPAN style="FONT-WEIGHT: bold">Sent:</SPAN></B> Wednesday,
November 29, 2006 1:11 PM<BR><B><SPAN style="FONT-WEIGHT: bold">To:</SPAN></B>
'Salvatore Sorce'; mpich-discuss@mcs.anl.gov<BR><B><SPAN
style="FONT-WEIGHT: bold">Subject:</SPAN></B> RE: [MPICH] Communication
problem on a small heterogeneous ring involving Red Hat Linux - Enterprise
Edition</SPAN></FONT><o:p></o:p></P></DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=blue size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: Arial">MPICH2 does not work
on heterogeneous systems yet, although we plan to support it in the
future.</SPAN></FONT><o:p></o:p></P>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial color=blue size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: Arial">Rajeev</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<BLOCKQUOTE
style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: medium none; PADDING-LEFT: 4pt; PADDING-BOTTOM: 0in; MARGIN: 5pt 0in 5pt 3.75pt; BORDER-LEFT: blue 1.5pt solid; PADDING-TOP: 0in; BORDER-BOTTOM: medium none">
<DIV class=MsoNormal style="TEXT-ALIGN: center" align=center><FONT
face="Times New Roman" size=3><SPAN style="FONT-SIZE: 12pt">
<HR tabIndex=-1 align=center width="100%" SIZE=2>
</SPAN></FONT></DIV>
<P class=MsoNormal style="MARGIN-BOTTOM: 12pt"><B><FONT face=Tahoma
size=2><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">From:</SPAN></FONT></B><FONT
face=Tahoma size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">
owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov]
<B><SPAN style="FONT-WEIGHT: bold">On Behalf Of </SPAN></B>Salvatore
Sorce<BR><B><SPAN style="FONT-WEIGHT: bold">Sent:</SPAN></B> Wednesday,
November 29, 2006 8:51 AM<BR><B><SPAN
style="FONT-WEIGHT: bold">To:</SPAN></B>
mpich-discuss@mcs.anl.gov<BR><B><SPAN
style="FONT-WEIGHT: bold">Subject:</SPAN></B> [MPICH] Communication problem
on a small heterogeneous ring involving Red Hat Linux - Enterprise
Edition</SPAN></FONT><o:p></o:p></P>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Dear
all,</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I have a small cluster composed
by four machines: two double-PIII running Scientific Linux CERN release
3.0.8 (kernel 2.4.21-37.EL), one double-Xeon running Red Hat Enterprise
Linux WS release 3 (Taroon update 4, kernel 2.4.21-27 EL), and one
single-PIV running Red Hat Linux Release 9 (Shrike, kernel 2.4.20-8). All
machines have MPICH2 1.0.4p1 installed.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Tests on whatever kind of ring I
set up are OK, and processes are correctly spawned and started on right
hosts.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I experienced problems when
processes need to communicate each other, and one side of the communication
is the machine with Red Hat Enterprise Linux OS. If I do not involve the
Enterprise Linux machine in communications, all runs
right.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I am using a simple
send-and-receive Fortran test program, where process #1 sends an array of
real to process #0. Both processes use blocking communication functions
(mpi_send and mpi_recv).</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">When process #0 (the receiving
one) runs on the Red Hat Enterprise Linux machine, all hangs up at the
mpi_send (maybe because on the Enterprise Linux side the mpi_recv do not
accomplish its task).</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">When process #1 (the sending
one) runs on the Red Hat Enterprise Linux machine, I obtain the following
output:</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">[cli_0]: aborting job:<BR>Fatal
error in MPI_Recv: Other MPI error, error
stack:<BR>MPI_Recv(186)................................:
MPI_Recv(buf=0xbfff2368, count=2, MPI_REAL, src=1, tag=17, MPI_COMM_WORLD,
status=0xbfff2010) failed<BR>MPIDI_CH3_Progress_wait(217).................:
an error occurred while handling an event returned by
MPIDU_Sock_Wait()<BR>MPIDI_CH3I_Progress_handle_sock_event(590)...:
<BR>MPIDI_CH3_Sockconn_handle_connopen_event(791): unable to find the
process group structure with id <><BR>[cli_1]: aborting job:<BR>Fatal
error in MPI_Send: Other MPI error, error
stack:<BR>MPI_Send(173).............................:
MPI_Send(buf=0x7fbffebdf0, count=2, MPI_REAL, dest=0, tag=17,
MPI_COMM_WORLD) failed<BR>MPIDI_CH3_Progress_wait(217)..............: an
error occurred while handling an event returned by
MPIDU_Sock_Wait()<BR>MPIDI_CH3I_Progress_handle_sock_event(415):
<BR>MPIDU_Socki_handle_read(670)..............: connection failure
(set=0,sock=1,errno=104:Connection reset by peer)<BR>rank 0 in job 1
mpitemp_32877 caused collective abort of all ranks<BR>
exit status of rank 0: return code 1</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I understand that in both cases
mpi_recv causes an error, what is the
problem?</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Thank you in advance for your
attention.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Regards,</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Salvatore.</SPAN></FONT><o:p></o:p></P></DIV></BLOCKQUOTE></DIV></DIV></BLOCKQUOTE></BODY></HTML>