[MPICH] run cpi on windows and linux machines

David Ashton ashton at mcs.anl.gov
Sun Oct 9 16:07:49 CDT 2005


1) Can you try adding -plaintext to the mpiexec command?  When you mix Linux
and Windows machines you have to turn off encryption.

2) You need to make sure your hosts can resolve each other's IP addresses
from their host names.  Can mouse2 ping phoenix?  You may need to modify
your hosts file or point your machines to a DNS server that knows all the
hosts' names.
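
A quick way to check name resolution on each machine is a small script like
the one below. It is only a sketch: the function name unresolved_hosts and its
resolve parameter are made up for illustration, and it only tells you whether
the local resolver (hosts file or DNS) can turn each name into an address.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return {host: error text} for every name that fails to resolve.

    `resolve` is a hypothetical hook so the lookup can be swapped out;
    by default it uses the system resolver via socket.gethostbyname.
    """
    failures = {}
    for host in hosts:
        try:
            resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures[host] = str(exc)
    return failures
```

Run it on every machine with the names from your machinefile, for example
unresolved_hosts(["iel2", "bird2", "Mouse2", "phoenix"]). An empty result on
every machine means each host can resolve all the others; any name that shows
up in the result needs a hosts file entry or a DNS record.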

Also make sure you don't have a firewall running that prevents connections
between the machines.
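
To see whether a firewall is in the way, you can try opening a TCP connection
from one machine to another. The sketch below is an illustration only: the
helper name can_connect is made up, and the port you test is up to you. I
believe smpd listens on port 8676 by default, but treat that as an assumption
and check your own installation.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and lookup failures
        return False
```

For example, can_connect("phoenix", 8676) run from mouse2 tells you whether
mouse2 can reach a listener on that port on phoenix. A False result can mean a
firewall, a name that does not resolve, or nothing listening on that port. The
MPI processes also open their own connections on other ports, which this check
does not cover.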

-David Ashton

-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Yu-Cheng Chou
Sent: Friday, October 07, 2005 5:41 PM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] run cpi on windows and linux machines


Hi,

I launched smpd on 4 machines which include 2 windows and 2 linux 
platforms.

I executed the command shown below on the linux local host.

mpiexec -pwdfile password -path "/home/ycchou/tmp;C:\tmp" -n 4 -machinefile machines cpi

(1)It hung there. 
   The file "machines" contains the machine names which are placed in 
   this order - linux A (local host), linux B, windows A, and windows B.

(2)I changed the order of the machine names like this - linux A (local 
   host), windows A, linux B, and windows B.
   The program exited after I entered the number of intervals.
   The error message is shown below.
 
    job aborted:
    rank: node: exit code[: error message]
    0: iel2: -1
    1: bird2: 13
    2: Mouse2: -1: Fatal error in MPI_Bcast: Other MPI error, error stack:
    MPI_Bcast(827): MPI_Bcast(buf=0xbffff748, count=1, MPI_INT, root=0,
    MPI_COMM_WORLD) failed
    MPIR_Bcast(229): 
    MPIC_Send(48): 
    MPIC_Wait(321): 
    MPIDI_CH3_Progress_wait(209): an error occurred while handling an
    event returned by MPIDU_Sock_Wait()
    MPIDI_CH3I_Progress_handle_sock_event(1120): [ch3:sock] failed to 
    connnect to remote process 234207FE36DCB2F24D062016BBEF915:3
    MPIDU_Sock_post_connect(114): unable to resolve host name to an
    address (set=1,sock=16777218,host=phoenix)
    3: phoenix: 13

(3)When I put the windows machines in the first two positions of the 
   file "machines", the program worked.

(4)When I changed the order of the machine names like this - linux B,
   linux A (local host), windows A, and windows B, the program exited 
   with the error message shown below.

   op_read error on left context: socket connection closed, error stack:
   MPIDU_Socki_handle_read(632): connection closed by peer  
   (set=1,sock=16777216)
   unable to read the cmd header on the left context, socket connection
   closed, error stack:
   MPIDU_Socki_handle_read(632): connection closed by peer   
   (set=1,sock=16777216).

(5)When I changed the order of the machine names like this - linux B,
   linux A (local host), windows B, and windows A, the program hung
   with the error message shown below.

   op_read error on left context: socket connection failed, error stack:
   MPIDU_Socki_handle_read(658): connection failure  
   (set=1,sock=16777216,errno=104:Connection reset by peer)
   unable to read the cmd header on the left context, socket connection
   failed, error stack:
   MPIDU_Socki_handle_read(658): connection failure
   (set=1,sock=16777216,errno=104:Connection reset by peer).

Could anyone tell me why it works only when windows machines are placed 
before linux machines in the machinefile?





