[MPICH2-dev] Running cpi on Windows and Linux machines
Yu-Cheng Chou
cycchou at ucdavis.edu
Fri Oct 7 18:31:47 CDT 2005
Hi,
I launched smpd on 4 machines: 2 Windows and 2 Linux.
I ran the following command on the local Linux host:
mpiexec -pwdfile password -path "/home/ycchou/tmp;C:\tmp" -n 4 -machinefile machines cpi
(1) The file "machines" listed the machines in this order: Linux A (local host), Linux B, Windows A, Windows B. With this order, the program hung.
(2) I changed the order to: Linux A (local host), Windows A, Linux B, Windows B. This time the program exited after I entered the number of intervals, with the following error message:
job aborted:
rank: node: exit code[: error message]
0: iel2: -1
1: bird2: 13
2: Mouse2: -1: Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(827): MPI_Bcast(buf=0xbffff748, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(229):
MPIC_Send(48):
MPIC_Wait(321):
MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(1120): [ch3:sock] failed to connnect to remote process 234207FE36DCB2F24D062016BBEF915:3
MPIDU_Sock_post_connect(114): unable to resolve host name to an address (set=1,sock=16777218,host=phoenix)
3: phoenix: 13
(3) When I put the Windows machines in the first two positions of the file "machines", the program worked.
(4) With the order Linux B, Linux A (local host), Windows A, Windows B, the program exited with the following error message:
op_read error on left context: socket connection closed, error stack:
MPIDU_Socki_handle_read(632): connection closed by peer (set=1,sock=16777216)
unable to read the cmd header on the left context, socket connection closed, error stack:
MPIDU_Socki_handle_read(632): connection closed by peer (set=1,sock=16777216).
(5) With the order Linux B, Linux A (local host), Windows B, Windows A, the program hung with the following error message:
op_read error on left context: socket connection failed, error stack:
MPIDU_Socki_handle_read(658): connection failure (set=1,sock=16777216,errno=104:Connection reset by peer)
unable to read the cmd header on the left context, socket connection failed, error stack:
MPIDU_Socki_handle_read(658): connection failure (set=1,sock=16777216,errno=104:Connection reset by peer).
Could anyone tell me why it works only when the Windows machines are placed before the Linux machines in the machinefile?
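The rank 2 error stack in case (2) ends with "unable to resolve host name to an address (... host=phoenix)", so one thing worth ruling out is whether every name in the machinefile resolves on every node. This alone would not explain the ordering dependence, but it can confirm or rule out a name-resolution problem. Below is a minimal sketch, assuming the machinefile holds one host name per line with an optional ":count" suffix and "#" comments; read_machine_names and unresolvable are hypothetical helpers written for this check, not part of MPICH2 or smpd.

```python
import socket
import sys
from typing import Iterable, List


def read_machine_names(text: str) -> List[str]:
    """Extract host names from machinefile text.

    Assumed format (not taken from the MPICH2 docs): one host per line,
    optional ':count' suffix, '#' starts a comment, blank lines ignored.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        # Keep only the host part of "host" or "host:count".
        names.append(line.split()[0].split(":", 1)[0])
    return names


def unresolvable(names: Iterable[str]) -> List[str]:
    """Return the names this machine cannot resolve to any address."""
    bad = []
    for name in names:
        try:
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            bad.append(name)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for name in unresolvable(read_machine_names(f.read())):
            print(f"cannot resolve: {name}")
```

Running the script on each of the four machines against the same "machines" file (for example as python3 check_hosts.py machines, where check_hosts.py is just a hypothetical file name) would print any name that machine cannot resolve, which is what the rank 2 stack reports for phoenix as seen from Mouse2.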