[mpich-discuss] MPICH2 Between Linux/Windows

jayesh at mcs.anl.gov jayesh at mcs.anl.gov
Tue Aug 24 10:22:06 CDT 2010


Hi,

>> the ip adress 192.168.1.122 ist correct, but the same error occurs.
 Do you mean the IP address IS correct or ISN'T correct?

 Can you ping from one machine to the other using the IP addresses in the machinefile (Windows machine to Linux machine, and Linux machine to Windows machine)?
 Do you have a DNS server in your network?
 What did you add to /etc/hosts?
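
 One way to answer the /etc/hosts question is to list which addresses the file assigns to a given name. The following is a minimal Python sketch and not part of MPICH2. The helper names `addresses_for` and `loopback_entries` and the sample file layout are assumptions made for illustration; only the hostname and the two addresses come from this thread.

```python
import ipaddress

# Hypothetical hosts-file contents. Only the hostname and the two
# addresses appear in this thread; the layout is an assumption.
SAMPLE_HOSTS = """\
127.0.0.1      localhost
127.0.1.1      stephan-desktop
192.168.1.122  stephan-desktop   # LAN address from the machinefile
"""


def addresses_for(hosts_text, hostname):
    """Return every address that the hosts-file text assigns to hostname."""
    wanted = hostname.lower()
    found = []
    for line in hosts_text.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if wanted in (name.lower() for name in names):
            found.append(address)
    return found


def loopback_entries(hosts_text, hostname):
    """Return the addresses for hostname that are loopback (127.0.0.0/8 or ::1)."""
    result = []
    for address in addresses_for(hosts_text, hostname):
        try:
            if ipaddress.ip_address(address).is_loopback:
                result.append(address)
        except ValueError:
            # Not a valid IP literal; skip the malformed entry.
            continue
    return result


print(addresses_for(SAMPLE_HOSTS, "stephan-desktop"))
print(loopback_entries(SAMPLE_HOSTS, "stephan-desktop"))
```

 If the second list is not empty, the hostname can resolve to an address that no other machine is able to reach.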

Regards,
Jayesh
----- Original Message -----
From: "Stephan Hackstedt" <stephan.hackstedt at googlemail.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Sent: Tuesday, August 24, 2010 1:35:34 AM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] MPICH2 Between Linux/Windows

I edited my /etc/hosts, but it does not seem to work.

############################ 
job aborted: 
rank: node: exit code[: error message] 
0: 192.168.1.33 : 1: Fatal error in PMPI_Barrier: Other MPI error, error stack: 
PMPI_Barrier(476)............: MPI_Barrier(MPI_COMM_WORLD) failed 
MPIR_Barrier(82).............: 
MPIC_Sendrecv(158)...........: 
MPID_Isend(116)..............: failure occurred while attempting to send an eager message 
MPIDI_CH3_iSend(175).........: 
MPIDI_CH3I_Sock_connect(1212): [ch3:sock] rank 0 unable to connect to rank 1 using business card <port=57547 description=stephan-desktop ifname= 192.168.1.122 >
MPIDU_Sock_post_connect(1231): unable to connect to stephan-desktop on port 57547, exhausted all endpoints (errno -1) 
MPIDU_Sock_post_connect(1247): gethostbyname failed, Der angegebene Host ist unbekannt. [English: The specified host is unknown.] (errno 11001)
1: 192.168.1.122 : -2 
############################ 

the ip adress 192.168.1.122 ist correct, but the same error occurs. 



2010/8/24 Stephan Hackstedt < stephan.hackstedt at googlemail.com > 


I replaced MPI_Bcast with MPI_Barrier for testing purposes; MPI_Barrier produces the following error:

############################ 
job aborted: 
rank: node: exit code[: error message] 
0: 192.168.1.33 : 1: Fatal error in PMPI_Barrier: Other MPI error, error stack: 
PMPI_Barrier(476)............: MPI_Barrier(MPI_COMM_WORLD) failed 
MPIR_Barrier(82).............: 
MPIC_Sendrecv(158)...........: 
MPID_Isend(116)..............: failure occurred while attempting to send an eager message 
MPIDI_CH3_iSend(175).........: 
MPIDI_CH3I_Sock_connect(1212): [ch3:sock] rank 0 unable to connect to rank 1 using business card <port=39459 description=stephan-desktop ifname=127.0.1.1 >
MPIDU_Sock_post_connect(1231): unable to connect to stephan-desktop on port 39459, exhausted all endpoints (errno -1) 
MPIDU_Sock_post_connect(1247): gethostbyname failed, Der angegebene Host ist unbekannt. [English: The specified host is unknown.] (errno 11001)
1: 192.168.1.122 : -2 
############################ 

Maybe the problem is that the Windows machine (rank 0) tries to connect to "port=39459 description=stephan-desktop ifname=127.0.1.1", but that should not be the IP address used for the Unix machine, right?
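
This suspicion can be checked mechanically: any address in 127.0.0.0/8 is a loopback address and cannot be reached from another machine. Below is a minimal Python sketch, assuming the business card is a whitespace-separated list of key=value pairs as printed in the error stack. `parse_business_card` and `advertises_loopback` are hypothetical helper names, not MPICH2 functions.

```python
import ipaddress
import re


def parse_business_card(card):
    """Split a 'key=value key=value ...' string into a dict.

    Whitespace after '=' is tolerated, because one of the error stacks
    in this thread prints 'ifname= 192.168.1.122'. As a consequence, a
    key with a truly empty value would be misread; this is a sketch.
    """
    return dict(re.findall(r"(\w+)=\s*(\S+)", card))


def advertises_loopback(card):
    """Return True if the card's ifname field is a loopback address."""
    ifname = parse_business_card(card).get("ifname")
    if ifname is None:
        return False
    try:
        return ipaddress.ip_address(ifname).is_loopback
    except ValueError:
        # ifname is not an IP literal (for example a hostname).
        return False


# The two business cards quoted in this thread.
print(advertises_loopback("port=39459 description=stephan-desktop ifname=127.0.1.1"))
print(advertises_loopback("port=57547 description=stephan-desktop ifname= 192.168.1.122"))
```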

regards, 
stephan 


2010/8/24 Stephan Hackstedt < stephan.hackstedt at googlemail.com > 





Hi, 


# Did you compile the same test case on both machines (there is icpi.c, the interactive version, and cpi.c, the non-interactive version)?
Yes

# Did you specify the IP addresses of the machines in the machinefile (if you are specifying hostnames in the machinefile, try specifying IP addresses instead)?
Yes, tried that.

So if the communication does not work, how is it possible that the output is forwarded to the Windows machine, so that I can see the output from the Linux machine on the Windows command line?

regards, 
stephan 


2010/8/23 Jayesh Krishna < jayesh at mcs.anl.gov > 





Hi, 
From the results, it looks like the Linux machine is unable to communicate with the Windows machine using the sock channel (which connects via TCP).

# Did you compile the same test case on both machines (there is icpi.c, the interactive version, and cpi.c, the non-interactive version)?
# Did you specify the IP addresses of the machines in the machinefile (if you are specifying hostnames in the machinefile, try specifying IP addresses instead)?
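
Because ping only exercises ICMP, a plain TCP connection attempt is a closer approximation of what the sock channel needs. The following is a minimal Python sketch using only the standard library. `can_connect` is a hypothetical helper, and the address and port in the usage comment are example values from the error output elsewhere in this thread (the port changes on every run).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (values are illustrative):
#   can_connect("192.168.1.122", 57547)
#   can_connect("stephan-desktop", 57547)   # also exercises name resolution
```

Running the check once with the IP address and once with the hostname separates a blocked port from a name that cannot be resolved.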


Regards, 
Jayesh 

----- Original Message ----- 
From: "Stephan Hackstedt" < stephan.hackstedt at googlemail.com > 

To: jayesh at mcs.anl.gov 
Sent: Monday, August 23, 2010 4:11:17 PM GMT -06:00 US/Canada Central 
Subject: Re: [mpich-discuss] MPICH2 Between Linux/Windows 


Hi, 

thanks for your advice.

# Turn off any firewalls (Windows firewall, Linux firewall) running on the machines
->done
# Configure MPICH2 on Linux with the sock channel and the SMPD process manager (./configure ... --with-pm=smpd --with-device=ch3:sock ...) and install it.
->done -> ./configure --prefix=.../project/mpich2_121 --with-pm=smpd --with-device=ch3:sock
# Install the same version of MPICH2 (same architecture and MPICH2 version) on the Windows system.
->done, I used the mpich2-1.2.1p1-win-ia32.msi installer
# Make sure that you have the same user on both machines (either the same username/password, or use a domain user). Register that user on the Windows machine (mpiexec -register).
->done
# Try running MPI jobs locally on the Windows machine and the Linux machine
->works!
# Try pinging each machine from the other machine
->works!
# Now try running a non-MPI job from the Windows machine (mpiexec -n 2 -machinefile mf.txt -channel sock hostname, where mf.txt contains the IP addresses of both machines on separate lines)
->works, the output of both processes is printed in the Windows terminal and the programs end successfully.
# If the above step is successful, try running cpi ( https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/examples/icpi.c ) on both machines. E.g., if you have icpi.exe in c:\test on the Windows machine and icpi.exe (Unix executable) in /home/stephan/test, you should specify the paths as below,

->the job hangs directly after I enter the number of intervals, even if I choose 0 to quit. I also do not get an error message.
>Enter the number of intervals: (0 quits) 1 
>_ 
>Enter the number of intervals: (0 quits) 0 
>_ 

I also tried the cpi example from the source, cpi.c. I debugged it and, as far as I can tell, the problem occurs exactly at the call to MPI_Bcast: the program blocks and waits until I cancel it manually. Maybe the data transmission does not work correctly in my case? I don't know what could cause this.

The output is: 

>Process 0 of 2 is on stephanxp <- WinXP Machine 
>Process 1 of 2 is on stephan-desktop <- Linux Machine 
>_ 


regards, 
stephan 


2010/8/23 < jayesh at mcs.anl.gov > 


Hi, 
Did you try explicitly specifying the channel on Windows (mpiexec -n 2 -channel sock -machinefile mf.txt hostname)?
Please try the following steps (even if you have tried them before, please follow the steps as specified below, since this helps us debug your problem) and let us know the results:

# Turn off any firewalls (Windows firewall, Linux firewall) running on the machines
# Configure MPICH2 on Linux with the sock channel and the SMPD process manager (./configure ... --with-pm=smpd --with-device=ch3:sock ...) and install it.
# Install the same version of MPICH2 (same architecture and MPICH2 version) on the Windows system.
# Make sure that you have the same user on both machines (either the same username/password, or use a domain user). Register that user on the Windows machine (mpiexec -register).
# Try running MPI jobs locally on the Windows machine and the Linux machine
# Try pinging each machine from the other machine
# Now try running a non-MPI job from the Windows machine (mpiexec -n 2 -machinefile mf.txt -channel sock hostname, where mf.txt contains the IP addresses of both machines on separate lines)
# If the above step is successful, try running cpi ( https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/examples/icpi.c ) on both machines. E.g., if you have icpi.exe in c:\test on the Windows machine and icpi.exe (Unix executable) in /home/stephan/test, you should specify the paths as below,

mpiexec -n 2 -machinefile mf.txt -channel sock -path "c:\test;/home/stephan/test" icpi.exe 

Let us know the results (please provide as much detail as possible, including the commands typed and the stdout/stderr output).

Regards, 
Jayesh 




----- Original Message ----- 
From: "Stephan Hackstedt" < stephan.hackstedt at googlemail.com > 
To: mpich-discuss at mcs.anl.gov 
Sent: Monday, August 23, 2010 1:19:24 PM GMT -06:00 US/Canada Central 
Subject: Re: [mpich-discuss] MPICH2 Between Linux/Windows 


I also observed that when I use the ch3:sock build, the MPI_comm_open always reports the internal IP address, like:

tag=0 port=46264 description=stephan-desktop ifname=127.0.1.1 


regards, 
stephan 


2010/8/23 Stephan Hackstedt < stephan.hackstedt at googlemail.com > 


Hi, 

I'm trying to establish a connection between a process on Linux (Ubuntu 10.04) and a process on a WinXP machine, with MPICH2 1.2.1 on both machines: installed with the installer on WinXP, built from sources on Ubuntu.
I configured the Linux side with

./configure --with-pm=smpd --with-device=ch3:nemesis 

and another build with 

./configure --with-pm=smpd --with-device=ch3:sock , 

Neither of them works. I also disabled all firewalls. I am puzzled, because when I start a virtual machine on the WinXP machine I can establish a connection to the Linux machine. Has anybody dealt with such problems before?
One other question: do I have to rebuild an MPI application when using another MPICH2 build with a different process manager, like above?

regards, 

stephan 




_______________________________________________ 
mpich-discuss mailing list 
mpich-discuss at mcs.anl.gov 
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss 





