<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<STYLE>.hmmessage P {
        PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; PADDING-TOP: 0px
}
BODY.hmmessage {
        FONT-SIZE: 10pt; FONT-FAMILY: Tahoma
}
</STYLE>
<META content="MSHTML 6.00.6000.16525" name=GENERATOR></HEAD>
<BODY class=hmmessage>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=053195316-10092007>Hi,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=053195316-10092007>&nbsp;Also try to ping the hosts by specifying the
hostname (ping &lt;hostname&gt;).</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=053195316-10092007></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=053195316-10092007>Regards,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=053195316-10092007>Jayesh</SPAN></FONT></DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Jayesh
Krishna<BR><B>Sent:</B> Monday, September 10, 2007 9:39 AM<BR><B>To:</B>
'Richard Li'<BR><B>Cc:</B> mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> RE:
[MPICH] MPI_Bcast hangs in Windows XP<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV dir=ltr align=left><SPAN class=744333714-10092007><FONT face=Arial
color=#0000ff>Hi,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=744333714-10092007><FONT face=Arial
color=#0000ff>&nbsp;This looks like a name-resolution problem with the
hostnames. Can you try one run that specifies only the IP addresses of the hosts
in config.txt?</FONT></SPAN></DIV>
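<DIV>A quick way to check name resolution from each machine is sketched below. This is only an illustration, not part of MPICH: the helper name resolve_hosts is hypothetical, and it uses Python's standard socket.getaddrinfo rather than the gethostbyname call that MPICH itself reports on.</DIV>
<PRE>
```python
import socket


def resolve_hosts(hostnames, resolver=socket.getaddrinfo):
    """Map each hostname to a sorted list of addresses, or None if lookup fails.

    resolve_hosts is a hypothetical helper for this thread, not an MPICH tool.
    The resolver argument exists so the lookup can be swapped out in tests.
    """
    results = {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```
</PRE>
<DIV>Calling resolve_hosts(["host1name", "host2name"]) on every machine in the machine file shows which names fail to resolve (value None) and from where.</DIV>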
<DIV><FONT face=Arial color=#0000ff></FONT> </DIV>
<DIV><SPAN class=744333714-10092007><FONT face=Arial
color=#0000ff>Regards,</FONT></SPAN></DIV>
<DIV><SPAN class=744333714-10092007><FONT face=Arial
color=#0000ff>Jayesh</FONT></SPAN></DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma><B>From:</B> Richard Li [mailto:xs_li@hotmail.com]
<BR><B>Sent:</B> Friday, September 07, 2007 9:00 PM<BR><B>To:</B> Jayesh
Krishna<BR><B>Cc:</B> mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> RE: [MPICH]
MPI_Bcast hangs in Windows XP<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV style="TEXT-ALIGN: left">Jayesh,<BR><BR>Thanks a lot for your
info.<BR><BR>After many tries, I was finally able to run cpi.exe across
multiple hosts. The problem seems to have something to do with the settings in my
machine file. Details follow.<BR><BR>I use the following command to run
cpi.exe:<BR><BR>mpiexec -n 2 -machinefile config.txt -channel ssm (or another channel)
cpi.exe<BR><BR>a) If config.txt looks like
this:<BR>&nbsp;&nbsp;host1name:1 -ifhn
host1_ipaddress<BR>&nbsp;&nbsp;host2name:2 -ifhn
host2_ipaddress<BR>everything works fine (for all channels).<BR>b)
If config.txt looks like this:<BR>
&nbsp;&nbsp;host1name:1<BR>&nbsp;&nbsp;host2name:2<BR>then with the sock channel it
hangs in MPI_Bcast, and with auto and ssm I get the following error message:<BR>
<BR>
<DIV><FONT face=Arial>C:\public\bin>mpiexec -n 2 -machinefile Config.txt
-channel ssm <SPAN class=EC_588480822-07092007> or auto
</SPAN>C:\public\bin\cpi.exe<BR>Enter the number of intervals: (0 quits)
100</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial>job aborted:<BR>rank: node: exit code[: error
message]<BR>0: B0016350B383E: 1: Fatal error in MPI_Bcast: Other MPI error,
error stack:<BR>MPI_Bcast(784).................: MPI_Bcast(buf=0012FE88,
count=1, MPI_INT, root=0, MPI_COMM_WORLD)
failed<BR>MPIR_Bcast(230)................:<BR>MPIC_Send(36)..................:<BR>MPIDI_EagerContigSend(146).....:
failure occurred while attempting to send an eager
message<BR>MPIDI_CH3_iStartMsgv(224)......:<BR>MPIDI_CH3I_VC_post_connect(555):
[ch3:sock] rank 0 unable to connect to rank 1 using business card
&lt;port=3872 description=B001279FD7C60.corp.bankofamerica.com
ifname=171.188.32.154 shm_host=B001279FD7C60.corp.bankofamerica.com
shm_queue=39E4F281-FCC0-4f4a-B540-EDC8D517F065 shm_pid=2484
&gt;<BR>MPIDU_Sock_post_connect(1228)..: unable to connect to
B001279FD7C60.corp.bankofamerica.com on port 3872, exhausted all endpoints
(errno -1)<BR>MPIDU_Sock_post_connect(1244)..: gethostbyname failed, The
requested name is valid and was found in the database, but it does not have
the correct associated data being resolved for. (errno 11004)<BR>1:
B001279FD7C60: 1</FONT></DIV><BR>I know this has something to do with my network
settings, but I just can't figure out why.<BR><BR>Any
ideas?<BR><BR>Thanks<BR><BR>Richard<BR><BR></DIV><BR><BR><BR>
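<DIV>For reference, the machine-file layout that worked in case (a) above can be generated mechanically. The sketch below only reproduces the "name:nprocs -ifhn address" format quoted in the message; the helper name machinefile_lines is hypothetical, and the 192.0.2.x addresses in the usage note are documentation placeholders, not real hosts.</DIV>
<PRE>
```python
def machinefile_lines(hosts):
    """Build machine-file lines of the form 'name:nprocs -ifhn address'.

    hosts is a list of (hostname, nprocs, ip_address) tuples. The format is
    copied from case (a) in the message above; this helper is hypothetical.
    """
    lines = []
    for hostname, nprocs, ip_address in hosts:
        lines.append(f"{hostname}:{nprocs} -ifhn {ip_address}")
    return lines
```
</PRE>
<DIV>For example, machinefile_lines([("host1name", 1, "192.0.2.10"), ("host2name", 2, "192.0.2.11")]) returns the two lines to write into config.txt.</DIV>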
<BLOCKQUOTE>
<HR id=EC_stopSpelling>
From: jayesh@mcs.anl.gov<BR>To: xs_li@hotmail.com<BR>CC:
mpich-discuss@mcs.anl.gov<BR>Subject: RE: [MPICH] MPI_Bcast hangs in Windows
XP<BR>Date: Thu, 6 Sep 2007 09:25:36 -0500<BR><BR>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=EC_837321314-06092007>Hi,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=EC_837321314-06092007>&nbsp;The process manager (smpd) is responsible
for launching the MPI processes on the various machines and for giving each MPI
process the information it needs to communicate with the other MPI
processes.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff><SPAN
class=EC_837321314-06092007>&nbsp;By default, the SMPD process manager listens
on port 8676 and then asks the client PM to connect to a new port.
So you should allow the SMPD process manager (smpd.exe, installed as a service
on Windows) to communicate on all ports. This is the easiest way; however, you
can also restrict the port range used by SMPD. Refer to the Windows developer's
guide available at <A href="http://www-unix.mcs.anl.gov/mpi/mpich/"
target=_blank>http://www-unix.mcs.anl.gov/mpi/mpich/</A> for
details.</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff><SPAN
class=EC_837321314-06092007>&nbsp;Make sure that no firewall, whether running on
the individual machines or on the network and filtering the
traffic between the machines, is preventing the process managers and the MPI
processes on the individual machines from contacting each
other.</SPAN></FONT></DIV>
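<DIV>One rough way to test whether a firewall blocks the SMPD listener is to attempt a plain TCP connection to it from another machine. The sketch below is an assumption-laden illustration: the helper name can_connect is hypothetical, and it only probes the default listener port 8676 mentioned above, so a successful result says nothing about the additional ports SMPD and the MPI processes open afterwards.</DIV>
<PRE>
```python
import socket


def can_connect(host, port=8676, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    can_connect is a hypothetical helper; 8676 is the default SMPD port
    quoted in this thread.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusals, timeouts, and name-resolution failures
        return False
```
</PRE>
<DIV>Running can_connect("host2name") from host1, and the reverse from host2, gives a first hint about whether traffic to the process manager gets through in both directions.</DIV>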
<DIV><FONT face=Arial color=#0000ff><SPAN
class=EC_837321314-06092007></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff><SPAN class=EC_837321314-06092007>(Note:
Since you do not know what changed in your network, it might help to
analyze the network packets sent between the machines using a packet sniffer
such as Ethereal.)</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff><SPAN
class=EC_837321314-06092007></SPAN></FONT> </DIV>
<DIV><SPAN class=EC_837321314-06092007><FONT face=Arial
color=#0000ff>Regards,</FONT></SPAN></DIV>
<DIV><SPAN class=EC_837321314-06092007><FONT face=Arial
color=#0000ff>Jayesh</FONT></SPAN></DIV><BR>
<DIV class=EC_OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR>
<FONT face=Tahoma><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Richard
Li<BR><B>Sent:</B> Wednesday, September 05, 2007 8:21 PM<BR><B>To:</B>
mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> [MPICH] MPI_Bcast hangs in
Windows XP<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV style="TEXT-ALIGN: left"><FONT face=Arial>
<DIV><SPAN class=EC_EC_618441420-17082007><FONT face=Arial>Hi
there,</FONT></SPAN></DIV>
<DIV><SPAN class=EC_EC_618441420-17082007></SPAN> </DIV>
<DIV><SPAN class=EC_EC_618441420-17082007><FONT face=Arial>I am writing an
application on Windows XP/VC8 and am having a problem with MPI_Bcast(). I am
working in a corporate environment and suspect it may have something to do with
our security policies; however, I don't know exactly which low-level operations
failed.</FONT></SPAN></DIV>
<DIV><SPAN class=EC_EC_618441420-17082007></SPAN> </DIV>
<DIV><SPAN class=EC_EC_618441420-17082007><FONT face=Arial>Here is the
symptom: my application (as well as the cpi.exe example) works fine as long as
there is only one machine in the machine file, whether it is the local machine or
a remote one. It hangs at MPI_Bcast() when I have more than one
machine in MPI_COMM_WORLD. </FONT></SPAN><FONT face=Arial><SPAN
class=EC_EC_618441420-17082007><FONT face=Arial>I am using </FONT></SPAN><FONT
face=Arial>mpich2-1.0.5p2-win32-ia32.msi.</FONT></FONT></DIV>
<DIV><SPAN class=EC_EC_618441420-17082007></SPAN> </DIV>
<DIV><SPAN class=EC_EC_618441420-17082007><FONT face=Arial>The same
application worked perfectly a year ago, and there have been many security
policy changes since then (as usual, all policies reduce our freedom). My
question is: what communication mechanism is used for inter-node
communication? I tried the default (no channel specified), auto, sock, and ssm as
communication channels and had no luck.</FONT></SPAN></DIV>
<DIV><SPAN class=EC_EC_618441420-17082007></SPAN> </DIV>
<DIV><SPAN class=EC_EC_618441420-17082007><FONT face=Arial>Thanks for your
help.<BR><BR>Richard<BR></FONT></SPAN></DIV></FONT></DIV><BR>
</BLOCKQUOTE><BR>
</BODY></HTML>