[mpich-discuss] mpiexec / Windows networking

Jayesh Krishna jayesh at mcs.anl.gov
Wed Jun 20 09:51:59 CDT 2012


Hi,
Do you have any other MPI libraries installed on your system (in particular, Microsoft MPI)? If so, check your PATH to make sure that you are running the correct executables (smpd and mpiexec from MPICH2, not MSMPI).
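
One way to run that check is to list every mpiexec and smpd visible on the search path, in the order the shell would consider them. The Python sketch below is only an illustration and is not part of MPICH2; the helper name find_all_on_path is hypothetical. It assumes the usual Windows lookup rules (PATH directories in order, extensions from PATHEXT) and does not account for the current directory, which cmd.exe also searches.

```python
import os


def find_all_on_path(name, path=None, pathext=None):
    """Return every file matching `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        # PATHEXT only exists on Windows; elsewhere match the bare name only.
        pathext = os.environ.get("PATHEXT", "") if os.name == "nt" else ""
    suffixes = [""] + [ext for ext in pathext.split(os.pathsep) if ext]
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate) and candidate not in matches:
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    # The first hit for each tool is the one a bare command name would run.
    for tool in ("mpiexec", "smpd"):
        print(tool)
        for hit in find_all_on_path(tool):
            print("   ", hit)
```

If the first hit for either tool is outside the MPICH2 bin directory, a bare "mpiexec" on the command line is running a different installation. On Windows, "where mpiexec" at the command prompt should give a similar listing.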

Regards,
Jayesh

----- Original Message -----
From: "Michael Colonno" <mcolonno at stanford.edu>
To: mpich-discuss at mcs.anl.gov
Cc: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Sent: Tuesday, June 19, 2012 11:22:04 PM
Subject: RE: mpiexec / Windows networking




Update on this issue: it seems to have to do with the paths of the remote processes. I have the bin dir in my path. Without any absolute paths, I get this:

>mpiexec -n 4 cxxpi.exe
Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Connect on sock (host=mike-studio17, port=8678) failed, exhausted all end points
Unable to connect to 'mike-studio17:8678',
sock error: Error = -1
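
Error 10061 means that nothing accepted the TCP connection on that port. This can be confirmed independently of mpiexec with a plain socket test. The Python sketch below is a generic connectivity check, not an MPICH2 tool; the function name is hypothetical, and the host name and port are simply the ones shown in the error message.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (10061 on Windows), timeouts and
        # name-resolution failures.
        return False


if __name__ == "__main__":
    # Host name and port are taken from the error message above.
    print(port_accepts_connections("mike-studio17", 8678))
```

A False result here would suggest that the mpiexec being run is looking for a process manager on a port where none is listening, which fits the possibility of a mismatched mpiexec/smpd pair.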



Adding the absolute path on mpiexec.exe fixes this issue:

>C:\Users\mcolonno\Desktop\Software\MPICH2\bin\mpiexec -n 4 cxxpi.exe
Process 2 of 4 is on mike-studio17
Process 3 of 4 is on mike-studio17
Process 0 of 4 is on mike-studio17
Process 1 of 4 is on mike-studio17
pi is approximately 3.14159 Error is 8.33331e-010
wall clock time = 0.000230972



Absolute path on cxxpi.exe has no effect:

>mpiexec -n 4 C:\Users\mcolonno\Desktop\Software\MPICH2\bin\cxxpi.exe
Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Connect on sock (host=mike-studio17, port=8678) failed, exhausted all end points
Unable to connect to 'mike-studio17:8678',
sock error: Error = -1



Note that what smpd -status reports doesn't seem to affect whether jobs run:

>smpd -status
no smpd running on mike-studio17.bsc.corp



Interestingly, running hostname works fine:

>C:\Users\mcolonno\Desktop\Software\MPICH2\bin\mpiexec -n 4 hostname
mike-studio17
mike-studio17
mike-studio17
mike-studio17



Other basic system commands throw an error:

>C:\Users\mcolonno\Desktop\Software\MPICH2\bin\mpiexec -n 4 chdir
launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.

Error posting writev, An established connection was aborted by the software in your host machine.(10053)
unable to post a write for the next command,
sock error: Error = 10053

unable to post a write of the close command to tear down the job tree as part of the abort process.
unable to post an abort command.
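
A likely explanation for the difference between hostname and chdir: on Windows, hostname is a standalone program (hostname.exe), whereas chdir is a built-in of cmd.exe with no executable file behind it, so CreateProcess has nothing to launch. The Python sketch below shows one way to tell the two cases apart. The helper name is hypothetical, and the check only looks for an executable on the search path.

```python
import shutil


def is_standalone_program(name, path=None):
    """Return True if `name` resolves to an executable file on the search path."""
    return shutil.which(name, path=path) is not None


if __name__ == "__main__":
    for command in ("hostname", "chdir"):
        if is_standalone_program(command):
            print(f"{command}: executable found on PATH")
        else:
            print(f"{command}: no executable found (possibly a shell built-in)")
```

The usual way to run a shell built-in as a process is to start it through the shell, for example "cmd /c chdir". Whether that works under mpiexec in this setup is an assumption that has not been tested here.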



I'm used to the Linux setup in which one's config file would take care of any path issues; here I'm not sure what to do beyond putting the binaries in my path. Any advice on Windows operation is appreciated. 



Thanks, 

~Mike C. 



-----Original Message----- 
From: Michael Colonno [mailto:mcolonno at stanford.edu] 
Sent: Tuesday, June 19, 2012 2:47 PM 
To: 'mpich-discuss at mcs.anl.gov' 
Cc: 'Jayesh Krishna' 
Subject: RE: mpiexec / Windows networking 



Looks like I spoke too soon: I'm back in the mode where I can run anything through the wmpiexec GUI, but on the command line SMPD does not show as running and I can't launch jobs. Windows services shows the service as running normally. What are the differences between general command line usage and this GUI tool? Why would one recognize the running service and the other not? I should have known better than to see it work without knowing why the status changed...



Thanks, 

~Mike C. 



-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, June 19, 2012 8:19 AM
To: Michael Colonno
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: mpiexec / Windows networking

Hi,

Great, let us know if you have any further issues.

(PS: Yeah, I meant ignore the message from wmpiconfig.)

Regards,
Jayesh



----- Original Message -----
From: "Michael Colonno" < mcolonno at stanford.edu >
To: "Jayesh Krishna" < jayesh at mcs.anl.gov >
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, June 19, 2012 9:54:11 AM
Subject: RE: mpiexec / Windows networking



Hi Jayesh ~ 



Sorry - perhaps I wasn't clear: everything was working through wmpiexec but not working on the command line (behaved like MPICH2 was installed but services were not running). (You may have meant ignore the error message from wmpiconfig below.) However, after closing my command prompts and reopening them (after the successful test through wmpiexec) they now seem to behave perfectly. I can't say I have a good explanation for this but I am glad everything is operational. I will chime in again if I have any more difficulty. 



Thanks for all the help, 

~Mike C. 



-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, June 19, 2012 7:41 AM
To: Michael Colonno
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: mpiexec / Windows networking



Hi, 

Ignore the error message from wmpiexec. There is a known bug that causes it to report inaccurate status (does not work as expected). 

Can you run an MPI job from the command line? 



Regards, 

Jayesh 



----- Original Message -----
From: "Michael Colonno" < mcolonno at stanford.edu >
To: mpich-discuss at mcs.anl.gov
Cc: "Jayesh Krishna" < jayesh at mcs.anl.gov >
Sent: Tuesday, June 19, 2012 9:33:26 AM
Subject: RE: mpiexec / Windows networking



The instructions referenced below were followed to install MPICH2 on the Windows 7 system (uninstalled / reinstalled twice with the same results). Is there any reason the wmpiexec and command line behavior would be different? Perhaps some system-wide post-install setting? 



Thanks, 

~Mike C. 



-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, June 19, 2012 7:22 AM
To: mpich-discuss at mcs.anl.gov
Cc: Michael Colonno
Subject: Re: mpiexec / Windows networking



Hi, 

The earlier error message (smpd error message) indicates that MPICH2 was not installed correctly on your system. I would recommend the following, 



# Uninstall MPICH2 from the system 

# Follow instructions in Section 9.4 (NOT 9.1) of the MPICH2 installer's guide (available at http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs ) to install MPICH2. 



Regards, 

Jayesh 



----- Original Message -----
From: "Michael Colonno" < mcolonno at stanford.edu >
To: mpich-discuss at mcs.anl.gov
Cc: "Jayesh Krishna" < jayesh at mcs.anl.gov >
Sent: Monday, June 18, 2012 6:15:07 PM
Subject: RE: mpiexec / Windows networking

Follow up: I can run the example successfully through the GUI wrapper (wmpiexec.exe) but not from the command line through a console, which seems odd. So the installation works and the relevant service must be running but it doesn’t seem the command line environment can communicate with it. Besides setting the path, is there anything else I can do on this front?

Thanks,

~Mike C.

From: Michael Colonno [mailto:mcolonno at stanford.edu]
Sent: Monday, June 18, 2012 4:06 PM
To: 'mpich-discuss at mcs.anl.gov'
Cc: 'Jayesh Krishna'
Subject: mpiexec / Windows networking

Trying to run the cxxpi.exe example program and I'm hitting a roadblock (seems others have shared this as well) on a Windows 7 x64 system. I have followed the instructions summarized in: http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-April/009694.html . Checking services, the "MPICH2 Process Manager" is started (I restarted it to confirm). However, checking the status of SMPD:

>smpd -status
no smpd running on mike-studio17

Trying to run a test produces the error in the thread above:

>mpiexec -n 2 hostname
Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Connect on sock (host=mike-studio17, port=8678) failed, exhausted all end points
Unable to connect to 'mike-studio17:8678',
sock error: Error = -1

In the menu of wmpiconfig.exe under "error" it says " mike-studio17: MPICH2 not installed or unable to query the host ". I didn’t install to the default path, but other than that there is nothing extraordinary (bin directory added to path of course). If I scan hosts for versions, the wmpiconfig tool does detect the correct version on this host. Using “scan hosts”, I get “ Error: No servers available for this domain ”. It seems like the service simultaneously is and is not running. Anything I can do to debug?

Thanks,

~Mike C.






