[MPICH] MPI_Bcast hangs in Windows XP
Richard Li
xs_li at hotmail.com
Fri Sep 7 21:00:19 CDT 2007
Jayesh,
Thanks a lot for your info.
After many tries, I was finally able to run cpi.exe across multiple hosts. The problem seems to have something to do with the settings in my machine file. The details follow.
I use the following command to run cpi.exe (ssm can be replaced by any of the other channels):
mpiexec -n 2 -machinefile config.txt -channel ssm cpi.exe
a) If I have a config.txt file like the following:
host1name:1 -ifhn host1_ipaddress
host2name:2 -ifhn host2_ipaddress
then everything works fine (for all channels).
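Case (a) pairs each host name with an explicit address through -ifhn. A minimal Python sketch of generating lines in that format is shown here. The function name, the host names and the 192.0.2.x addresses are hypothetical placeholders, and the lookup is passed in as a parameter so that a real resolver such as socket.gethostbyname can be substituted.

```python
import socket


def machinefile_lines(hosts, resolve=socket.gethostbyname):
    """Return machine-file lines of the form 'host:nprocs -ifhn address'.

    hosts   -- iterable of (hostname, process_count) pairs
    resolve -- callable mapping a host name to an IP address string
    """
    lines = []
    for name, nprocs in hosts:
        address = resolve(name)
        lines.append(f"{name}:{nprocs} -ifhn {address}")
    return lines


if __name__ == "__main__":
    # Hypothetical addresses taken from the documentation range 192.0.2.0/24.
    known = {"host1name": "192.0.2.10", "host2name": "192.0.2.11"}
    hosts = [("host1name", 1), ("host2name", 2)]
    for line in machinefile_lines(hosts, known.__getitem__):
        print(line)
```

With the default resolver, a host whose name cannot be resolved raises an exception instead of silently producing a machine file that lacks an address.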
b) If I have a config.txt like the following:
host1name:1
host2name:2
then, with the sock channel, it hangs in MPI_Bcast. With auto and ssm, I get the following error message:
C:\public\bin>mpiexec -n 2 -machinefile Config.txt -channel ssm or auto C:\public\bin\cpi.exe
Enter the number of intervals: (0 quits)
100
job aborted:
rank: node: exit code[: error message]
0: B0016350B383E: 1: Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(784).................: MPI_Bcast(buf=0012FE88, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(230)................:
MPIC_Send(36)..................:
MPIDI_EagerContigSend(146).....: failure occurred while attempting to send an eager message
MPIDI_CH3_iStartMsgv(224)......:
MPIDI_CH3I_VC_post_connect(555): [ch3:sock] rank 0 unable to connect to rank 1 using business card <port=3872 description=B001279FD7C60.corp.bankofamerica.com ifname=171.188.32.154 shm_host=B001279FD7C60.corp.bankofamerica.com shm_queue=39E4F281-FCC0-4f4a-B540-EDC8D517F065 shm_pid=2484 >
MPIDU_Sock_post_connect(1228)..: unable to connect to B001279FD7C60.corp.bankofamerica.com on port 3872, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244)..: gethostbyname failed, The requested name is valid and was found in the database, but it does not have the correct associated data being resolved for. (errno 11004)
1: B001279FD7C60: 1
I know this has something to do with my network settings, but I just can't figure out what.
Any ideas?
Thanks
Richard
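The error stack above ends in a gethostbyname failure, so one way to narrow the problem down is to test name resolution for each host outside of MPI. The following is a rough Python sketch under that assumption; check_hosts is a hypothetical helper, not part of MPICH, and it only reports whether each name maps to an IPv4 address on the machine where it is run.

```python
import socket


def check_hosts(names, resolve=socket.gethostbyname):
    """Map each host name to its IPv4 address, or to None if the lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            results[name] = None
    return results


if __name__ == "__main__":
    # Placeholder list; substitute the host names from the machine file.
    names = ["localhost", socket.gethostname()]
    for name, address in check_hosts(names).items():
        print(f"{name}: {address if address else 'NOT RESOLVED'}")
```

Running it on every machine in the machine file, with the names exactly as they appear in the business card, would show whether the lookup fails everywhere or only from certain hosts.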
From: jayesh at mcs.anl.gov
To: xs_li at hotmail.com
CC: mpich-discuss at mcs.anl.gov
Subject: RE: [MPICH] MPI_Bcast hangs in Windows XP
Date: Thu, 6 Sep 2007 09:25:36 -0500
Hi,
The process manager (smpd) is responsible for launching the MPI processes on the various machines and for giving each MPI process the information it needs to communicate with the other MPI processes.
By default, the SMPD process manager listens on port 8676 and then asks the client PM to connect to a new port. So you should allow the SMPD process manager (smpd.exe, installed as a service on Windows) to communicate on all ports. (This is the easiest way. However, you can also restrict the port range used by SMPD. Refer to the Windows developer's guide available at http://www-unix.mcs.anl.gov/mpi/mpich/ for details.)
Make sure that no firewall, whether running on the individual machines or on the network and filtering the traffic between the machines, is preventing the process managers and the MPI processes on the individual machines from contacting each other.
(Note: Since you do not know what changed in your network, it might help to analyze the network packets sent between the machines using a packet sniffer like Ethereal.)
Regards,
Jayesh
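A quick way to test the firewall advice above is to attempt a plain TCP connection to the smpd port from another machine. The Python sketch below assumes the default port 8676 mentioned in the message; can_connect is a hypothetical helper, and "localhost" is only a placeholder for the hosts in the machine file.

```python
import socket

SMPD_DEFAULT_PORT = 8676  # default smpd listening port mentioned above


def can_connect(host, port=SMPD_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the hosts from the machine file.
    for host in ["localhost"]:
        state = "reachable" if can_connect(host) else "blocked or not listening"
        print(f"{host}:{SMPD_DEFAULT_PORT} {state}")
```

A successful connection only shows that the initial listener is reachable. Because smpd then moves the client to a new port, and the MPI processes open their own connections, a firewall could still block the later traffic.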
From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Richard Li
Sent: Wednesday, September 05, 2007 8:21 PM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] MPI_Bcast hangs in Windows XP
Hi there,
I am writing an application in Windows XP/VC8 and am having a problem with MPI_Bcast(). I am working in a corporate environment and suspect it may have something to do with our security policies; however, I don't know exactly which low-level operations failed.
Here is the symptom: my application (as well as the cpi.exe example) works fine as long as there is only one machine in the machine file, whether it is the local machine or a remote one. It hangs at MPI_Bcast() when I have more than one machine in MPI_COMM_WORLD. I am using mpich2-1.0.5p2-win32-ia32.msi.
The same application worked perfectly a year ago, and there have been many security policy changes since then (as usual, all policies reduce our freedom). My question is: what communication mechanism is used for inter-node communication? I tried no channel option, auto, sock, and ssm as communication channels and had no luck.
Thanks for your help.
Richard