<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
"urn:schemas-microsoft-com:vml" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2900.2995" name=GENERATOR><!--[if !mso]>
<STYLE>v\:* {
        BEHAVIOR: url(#default#VML)
}
o\:* {
        BEHAVIOR: url(#default#VML)
}
w\:* {
        BEHAVIOR: url(#default#VML)
}
.shape {
        BEHAVIOR: url(#default#VML)
}
</STYLE>
<![endif]-->
<STYLE>@font-face {
        font-family: Tahoma;
}
@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.25in 1.0in 1.25in; }
P.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
LI.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
DIV.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
A:link {
        COLOR: blue; TEXT-DECORATION: underline
}
SPAN.MsoHyperlink {
        COLOR: blue; TEXT-DECORATION: underline
}
A:visited {
        COLOR: purple; TEXT-DECORATION: underline
}
SPAN.MsoHyperlinkFollowed {
        COLOR: purple; TEXT-DECORATION: underline
}
SPAN.EmailStyle17 {
        COLOR: navy; FONT-FAMILY: Arial; mso-style-type: personal-reply
}
DIV.Section1 {
        page: Section1
}
</STYLE>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></HEAD>
<BODY lang=EN-US vLink=purple link=blue bgColor=white>
<DIV><FONT face=Arial size=2>Thank you all for your prompt replies.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I ran some further trials, adding to the ring
another dual-Xeon machine running Scientific Linux CERN release 3.0.6
(kernel 2.4.21-37.EL). This machine does not introduce any new communication
errors. In fact, no ring that excludes the Red Hat Enterprise machine
generates communication errors.</FONT></DIV>
<DIV><FONT face=Arial size=2>All machines have the same byte ordering
(little-endian).</FONT></DIV>
<DIV><FONT face=Arial size=2>I compared the "config.log" files of all the
systems involved and found that the RH Enterprise machine has different
sizes for the &lt;long&gt; and &lt;long double&gt; types. In more detail, the
other machines report sizes of 4 and 12 respectively, while the RH Enterprise
machine reports 8 and 16. This may be the cause of the problems I
described.</FONT></DIV>
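<DIV><FONT face=Arial size=2>A minimal sketch for cross-checking these sizes
on each node without reading "config.log". It assumes a Python 3 interpreter
built for the node's native ABI is available (an assumption; a 32-bit
interpreter on a 64-bit OS would report 32-bit sizes). The function name
native_type_sizes is illustrative only and is not part of
MPICH2:</FONT></DIV>
<PRE>
```python
import ctypes
import sys


def native_type_sizes():
    """Return the byte order and the sizes (in bytes) of some native C types.

    The sizes are those of the C ABI the Python interpreter was built for,
    which is assumed here to match the ABI used to build MPICH2 on the node.
    """
    return {
        "byteorder": sys.byteorder,
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "long double": ctypes.sizeof(ctypes.c_longdouble),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }


if __name__ == "__main__":
    # Print one "name: value" line per entry so that the output of
    # different machines can be compared line by line.
    for name, value in native_type_sizes().items():
        print(f"{name}: {value}")
```
</PRE>
<DIV><FONT face=Arial size=2>Running the script on every machine in the ring
and comparing the output line by line should show whether a node differs in
type sizes or byte order.</FONT></DIV>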
<DIV><FONT face=Arial size=2>For completeness, I get the same error
messages even when I use a type that has the same size on all machines (such
as MPI_INTEGER).</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Do you know of a solution, or do I simply
have to exclude the RH Enterprise machine from my cluster (or perhaps install
another OS on that machine)?</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Thank you again.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Regards,</FONT></DIV>
<DIV><FONT face=Arial size=2>Salvatore.</FONT></DIV>
<BLOCKQUOTE dir=ltr
style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
<DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>
<DIV
style="BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: black"><B>From:</B>
<A title=thakur@mcs.anl.gov href="mailto:thakur@mcs.anl.gov">Rajeev Thakur</A>
</DIV>
<DIV style="FONT: 10pt arial"><B>To:</B> <A
title=matthew.chambers@vanderbilt.edu
href="mailto:matthew.chambers@vanderbilt.edu">'Matthew Chambers'</A> ; <A
title=mpich-discuss@mcs.anl.gov
href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A> </DIV>
<DIV style="FONT: 10pt arial"><B>Sent:</B> Wednesday, November 29, 2006 11:06
PM</DIV>
<DIV style="FONT: 10pt arial"><B>Subject:</B> RE: [MPICH] Communication
problem on a small heterogeneous ring involving Red Hat Linux - Enterprise
Edition</DIV>
<DIV><BR></DIV>
<DIV dir=ltr align=left><SPAN class=174520422-29112006><FONT face=Arial
color=#0000ff size=2>It should work if the byte ordering and type sizes are
the same, unless there is something funky going on because of the different
OSes. </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=174520422-29112006><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=174520422-29112006><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV>
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> <A
href="mailto:owner-mpich-discuss@mcs.anl.gov">owner-mpich-discuss@mcs.anl.gov</A>
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Matthew
Chambers<BR><B>Sent:</B> Wednesday, November 29, 2006 2:08 PM<BR><B>To:</B>
<A
href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</A><BR><B>Subject:</B>
RE: [MPICH] Communication problem on a small heterogeneous ring involving
Red Hat Linux - Enterprise Edition<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=Section1>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">Is he actually
talking about a heterogeneous system? All I see are different
operating systems. Unless the Xeon is 64 bit, it seems like byte
ordering and type sizes should all be equal, which as far as I know is the
only kind of heterogeneity that would affect MPI. Am I
wrong?<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">Matt
Chambers<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">Vanderbilt
Bioinformatics<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P>
<DIV
style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: medium none; PADDING-LEFT: 4pt; PADDING-BOTTOM: 0in; BORDER-LEFT: blue 1.5pt solid; PADDING-TOP: 0in; BORDER-BOTTOM: medium none">
<DIV>
<DIV class=MsoNormal style="TEXT-ALIGN: center" align=center><FONT
face="Times New Roman" size=3><SPAN style="FONT-SIZE: 12pt">
<HR tabIndex=-1 align=center width="100%" SIZE=2>
</SPAN></FONT></DIV>
<P class=MsoNormal><B><FONT face=Tahoma size=2><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">From:</SPAN></FONT></B><FONT
face=Tahoma size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">
owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov]
<B><SPAN style="FONT-WEIGHT: bold">On Behalf Of </SPAN></B>Rajeev
Thakur<BR><B><SPAN style="FONT-WEIGHT: bold">Sent:</SPAN></B> Wednesday,
November 29, 2006 1:11 PM<BR><B><SPAN
style="FONT-WEIGHT: bold">To:</SPAN></B> 'Salvatore Sorce';
mpich-discuss@mcs.anl.gov<BR><B><SPAN
style="FONT-WEIGHT: bold">Subject:</SPAN></B> RE: [MPICH] Communication
problem on a small heterogeneous ring involving Red Hat Linux - Enterprise
Edition</SPAN></FONT><o:p></o:p></P></DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=blue size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: Arial">MPICH2 does not
work on heterogeneous systems yet, although we plan to support it in the
future.</SPAN></FONT><o:p></o:p></P>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial color=blue size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: Arial">Rajeev</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<BLOCKQUOTE
style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: medium none; PADDING-LEFT: 4pt; PADDING-BOTTOM: 0in; MARGIN: 5pt 0in 5pt 3.75pt; BORDER-LEFT: blue 1.5pt solid; PADDING-TOP: 0in; BORDER-BOTTOM: medium none">
<DIV class=MsoNormal style="TEXT-ALIGN: center" align=center><FONT
face="Times New Roman" size=3><SPAN style="FONT-SIZE: 12pt">
<HR tabIndex=-1 align=center width="100%" SIZE=2>
</SPAN></FONT></DIV>
<P class=MsoNormal style="MARGIN-BOTTOM: 12pt"><B><FONT face=Tahoma
size=2><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">From:</SPAN></FONT></B><FONT
face=Tahoma size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">
owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov]
<B><SPAN style="FONT-WEIGHT: bold">On Behalf Of </SPAN></B>Salvatore
Sorce<BR><B><SPAN style="FONT-WEIGHT: bold">Sent:</SPAN></B> Wednesday,
November 29, 2006 8:51 AM<BR><B><SPAN
style="FONT-WEIGHT: bold">To:</SPAN></B>
mpich-discuss@mcs.anl.gov<BR><B><SPAN
style="FONT-WEIGHT: bold">Subject:</SPAN></B> [MPICH] Communication
problem on a small heterogeneous ring involving Red Hat Linux - Enterprise
Edition</SPAN></FONT><o:p></o:p></P>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Dear
all,</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I have a small cluster
composed of four machines: two dual-PIII machines running Scientific Linux
CERN release 3.0.8 (kernel 2.4.21-37.EL), one dual-Xeon running Red Hat
Enterprise Linux WS release 3 (Taroon update 4, kernel 2.4.21-27 EL), and
one single-PIV running Red Hat Linux Release 9 (Shrike, kernel 2.4.20-8).
All machines have MPICH2 1.0.4p1
installed.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Tests on every ring I set up
succeed, and processes are correctly spawned and started on the right
hosts.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I run into problems when
processes need to communicate with each other and one side of the
communication is the machine running Red Hat Enterprise Linux. If I do not
involve the Enterprise Linux machine in any communication, everything runs
correctly.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I am using a simple
send-and-receive Fortran test program, in which process #1 sends an array of
reals to process #0. Both processes use blocking communication functions
(mpi_send and mpi_recv).</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">When process #0 (the receiving
one) runs on the Red Hat Enterprise Linux machine, everything hangs at
the mpi_send (perhaps because the mpi_recv on the Enterprise Linux side does
not complete).</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">When process #1 (the sending
one) runs on the Red Hat Enterprise Linux machine, I obtain the following
output:</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">[cli_0]: aborting
job:<BR>Fatal error in MPI_Recv: Other MPI error, error
stack:<BR>MPI_Recv(186)................................:
MPI_Recv(buf=0xbfff2368, count=2, MPI_REAL, src=1, tag=17, MPI_COMM_WORLD,
status=0xbfff2010)
failed<BR>MPIDI_CH3_Progress_wait(217).................: an error occurred
while handling an event returned by
MPIDU_Sock_Wait()<BR>MPIDI_CH3I_Progress_handle_sock_event(590)...:
<BR>MPIDI_CH3_Sockconn_handle_connopen_event(791): unable to find the
process group structure with id <><BR>[cli_1]: aborting
job:<BR>Fatal error in MPI_Send: Other MPI error, error
stack:<BR>MPI_Send(173).............................:
MPI_Send(buf=0x7fbffebdf0, count=2, MPI_REAL, dest=0, tag=17,
MPI_COMM_WORLD) failed<BR>MPIDI_CH3_Progress_wait(217)..............: an
error occurred while handling an event returned by
MPIDU_Sock_Wait()<BR>MPIDI_CH3I_Progress_handle_sock_event(415):
<BR>MPIDU_Socki_handle_read(670)..............: connection failure
(set=0,sock=1,errno=104:Connection reset by peer)<BR>rank 0 in job 1
mpitemp_32877 caused collective abort of all ranks<BR>
exit status of rank 0: return code 1</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I understand that in both
cases mpi_recv causes an error. What is the
problem?</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Thank you in advance for your
attention.</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Regards,</SPAN></FONT><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Salvatore.</SPAN></FONT><o:p></o:p></P></DIV></BLOCKQUOTE></DIV></DIV></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>