[MPICH2-dev] run cpi on windows and linux machines
    Yu-Cheng Chou 
    cycchou at ucdavis.edu
       
    Fri Oct  7 18:31:47 CDT 2005
    
    
  
Hi,
I launched smpd on four machines: two Windows and two Linux.
I ran the following command on the local Linux host:
mpiexec -pwdfile password -path "/home/ycchou/tmp;C:\tmp" -n 4 -machinefile machines cpi
(1) The job hung.
   The file "machines" listed the machine names in this order: 
   Linux A (local host), Linux B, Windows A, Windows B.
(2) I changed the order of the machine names to: Linux A (local 
   host), Windows A, Linux B, Windows B.
   The program exited after I entered the number of intervals,
   with the error message shown below.
 
    job aborted:
    rank: node: exit code[: error message]
    0: iel2: -1
    1: bird2: 13
    2: Mouse2: -1: Fatal error in MPI_Bcast: Other MPI error, error stack:
    MPI_Bcast(827): MPI_Bcast(buf=0xbffff748, count=1, MPI_INT, root=0,
    MPI_COMM_WORLD) failed
    MPIR_Bcast(229): 
    MPIC_Send(48): 
    MPIC_Wait(321): 
    MPIDI_CH3_Progress_wait(209): an error occurred while handling an
    event returned by MPIDU_Sock_Wait()
    MPIDI_CH3I_Progress_handle_sock_event(1120): [ch3:sock] failed to 
    connnect to remote process 234207FE36DCB2F24D062016BBEF915:3
    MPIDU_Sock_post_connect(114): unable to resolve host name to an
    address (set=1,sock=16777218,host=phoenix)
    3: phoenix: 13
(3) When I put the Windows machines in the first two positions of the 
   file "machines", the program worked.
(4) When I changed the order of the machine names to: Linux B,
   Linux A (local host), Windows A, Windows B, the program exited 
   with the error message shown below.
   op_read error on left context: socket connection closed, error stack:
   MPIDU_Socki_handle_read(632): connection closed by peer  
   (set=1,sock=16777216)
   unable to read the cmd header on the left context, socket connection
   closed, error stack:
   MPIDU_Socki_handle_read(632): connection closed by peer   
   (set=1,sock=16777216).
(5) When I changed the order of the machine names to: Linux B,
   Linux A (local host), Windows B, Windows A, the program hung 
   with the error message shown below.
   op_read error on left context: socket connection failed, error stack:
   MPIDU_Socki_handle_read(658): connection failure  
   (set=1,sock=16777216,errno=104:Connection reset by peer)
   unable to read the cmd header on the left context, socket connection
   failed, error stack:
   MPIDU_Socki_handle_read(658): connection failure
   (set=1,sock=16777216,errno=104:Connection reset by peer).
Could anyone tell me why it works only when the Windows machines are 
placed before the Linux machines in the machinefile?
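
The "unable to resolve host name to an address" message in case (2) 
suggests one thing worth checking: whether every name in the machinefile 
resolves on each of the four machines. Below is a minimal sketch of such 
a check in Python. The helper names (parse_machinefile, resolve_hosts, 
report) are hypothetical, and the sketch assumes the machinefile holds 
one host name per line, with '#' starting a comment and an optional ':N' 
process count after the name. It only tests name lookup and says nothing 
about smpd itself.

```python
import socket


def parse_machinefile(text):
    """Return the host names found in machinefile text.

    Assumption: one host per line, '#' starts a comment, and an
    optional ':N' suffix (process count) follows the host name.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        # Keep the first token and strip any ':N' suffix.
        hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def resolve_hosts(hosts, resolver=socket.gethostbyname):
    """Map each host name to its IPv4 address, or None if lookup fails."""
    result = {}
    for host in hosts:
        try:
            result[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[host] = None
    return result


def report(path):
    """Print one line per host in the machinefile at 'path'."""
    with open(path) as f:
        hosts = parse_machinefile(f.read())
    for host, addr in resolve_hosts(hosts).items():
        print(f"{host}: {addr if addr else 'does not resolve'}")
```

Calling report("machines") on each machine in turn would show whether a 
name such as phoenix fails to resolve from the Linux side only, which 
would be consistent with the error stack in case (2).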
    
    