[mpich-discuss] mpiexec / Windows networking
Michael Colonno
mcolonno at stanford.edu
Tue Jun 19 23:22:04 CDT 2012
Update on this issue: it seems to have to do with the paths of the remote processes. I have the bin dir in my path. Without any absolute paths, I get this:
>mpiexec -n 4 cxxpi.exe
Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Connect on sock (host=mike-studio17, port=8678) failed, exhausted all end points
Unable to connect to 'mike-studio17:8678',
sock error: Error = -1
Adding the absolute path on mpiexec.exe fixes this issue:
>C:\Users\mcolonno\Desktop\Software\MPICH2\bin\mpiexec -n 4 cxxpi.exe
Process 2 of 4 is on mike-studio17
Process 3 of 4 is on mike-studio17
Process 0 of 4 is on mike-studio17
Process 1 of 4 is on mike-studio17
pi is approximately 3.14159 Error is 8.33331e-010
wall clock time = 0.000230972
Absolute path on cxxpi.exe has no effect:
>mpiexec -n 4 C:\Users\mcolonno\Desktop\Software\MPICH2\bin\cxxpi.exe
Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Connect on sock (host=mike-studio17, port=8678) failed, exhausted all end points
Unable to connect to 'mike-studio17:8678',
sock error: Error = -1
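One possible explanation, which is an assumption and not something confirmed in this thread, is that a different mpiexec.exe sits earlier in the search path than the one in the MPICH2 bin directory. A minimal Python sketch for listing every match in search order follows; the helper name find_all_on_path is made up for illustration, and the sketch does not account for cmd.exe also looking in the current directory first.

```python
import os


def find_all_on_path(name, path=None, exts=None):
    """Return every file matching `name` on the search path, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    if exts is None:
        # PATHEXT is set on Windows (e.g. ".COM;.EXE;.BAT"); usually unset elsewhere.
        exts = [e for e in os.environ.get("PATHEXT", "").split(os.pathsep) if e]
    candidates = [name] + [name + ext for ext in exts]
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            if os.path.isfile(full) and full not in matches:
                matches.append(full)
    return matches


if __name__ == "__main__":
    # The first line printed is the file a bare "mpiexec" would normally resolve to.
    for hit in find_all_on_path("mpiexec"):
        print(hit)
```

If more than one line is printed, the first entry is the one a bare mpiexec picks up, which may not be the MPICH2 build that matches the installed smpd service.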
Note that the status reported by smpd doesn't seem to affect whether jobs run:
>smpd -status
no smpd running on mike-studio17.bsc.corp
Interestingly, running hostname works fine:
>C:\Users\mcolonno\Desktop\Software\MPICH2\bin\mpiexec -n 4 hostname
mike-studio17
mike-studio17
mike-studio17
mike-studio17
Other basic system commands throw an error:
>C:\Users\mcolonno\Desktop\Software\MPICH2\bin\mpiexec -n 4 chdir
launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(chdir) on 'mike-studio17' failed, error 2 - The system cannot find the file specified.
Error posting writev, An established connection was aborted by the software in your host machine.(10053)
unable to post a write for the next command,
sock error: Error = 10053
unable to post a write of the close command to tear down the job tree as part of the abort process.
unable to post an abort command.
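A likely reason for the difference between the two commands above: hostname exists as an executable file (hostname.exe), while chdir is built into cmd.exe, so CreateProcess has no file to launch and reports error 2. The sketch below, in Python, builds an mpiexec argument list that wraps such builtins in cmd /c. The helper name build_mpiexec_argv and the partial builtin list are illustrative assumptions, and the wrapped form has not been tested against MPICH2 in this thread.

```python
# Partial list of commands built into cmd.exe (no separate .exe exists for them).
CMD_BUILTINS = {"cd", "chdir", "dir", "echo", "set", "type", "copy", "del"}


def build_mpiexec_argv(nprocs, command, mpiexec="mpiexec"):
    """Build an mpiexec argument list, wrapping cmd.exe builtins in `cmd /c`."""
    command = list(command)
    if command and command[0].lower() in CMD_BUILTINS:
        # Builtins must be run through the shell, since CreateProcess needs a file.
        command = ["cmd", "/c"] + command
    return [mpiexec, "-n", str(nprocs)] + command


if __name__ == "__main__":
    print(build_mpiexec_argv(4, ["chdir"]))
    print(build_mpiexec_argv(4, ["hostname"]))
```

The mpiexec argument can be set to the absolute path of the MPICH2 mpiexec.exe when the bare name does not resolve to the right binary.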
I'm used to the Linux setup in which one's config file would take care of any path issues; here I'm not sure what to do beyond putting the binaries in my path. Any advice on Windows operation is appreciated.
Thanks,
~Mike C.
-----Original Message-----
From: Michael Colonno [mailto:mcolonno at stanford.edu]
Sent: Tuesday, June 19, 2012 2:47 PM
To: 'mpich-discuss at mcs.anl.gov'
Cc: 'Jayesh Krishna'
Subject: RE: mpiexec / Windows networking
Looks like I spoke too soon: I'm back in the mode where I can run anything through the wmpiexec GUI, but on the command line SMPD does not show as running and I can't launch jobs. Windows services shows the service as running normally. What are the differences between general command line usage and this GUI tool? Why would one recognize the running service and the other not? I should have known better than to see it work without knowing why the status changed...
Thanks,
~Mike C.
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, June 19, 2012 8:19 AM
To: Michael Colonno
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: mpiexec / Windows networking
Hi,
Great, let us know if you have any further issues.
(PS: Yeah, I meant ignore the message from wmpiconfig.)
Regards,
Jayesh
----- Original Message -----
From: "Michael Colonno" <mcolonno at stanford.edu>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, June 19, 2012 9:54:11 AM
Subject: RE: mpiexec / Windows networking
Hi Jayesh ~
Sorry - perhaps I wasn't clear: everything was working through wmpiexec but not working on the command line (behaved like MPICH2 was installed but services were not running). (You may have meant ignore the error message from wmpiconfig below.) However, after closing my command prompts and reopening them (after the successful test through wmpiexec) they now seem to behave perfectly. I can't say I have a good explanation for this but I am glad everything is operational. I will chime in again if I have any more difficulty.
Thanks for all the help,
~Mike C.
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, June 19, 2012 7:41 AM
To: Michael Colonno
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: mpiexec / Windows networking
Hi,
Ignore the error message from wmpiexec. There is a known bug that causes it to report inaccurate status (does not work as expected).
Can you run an MPI job from the command line?
Regards,
Jayesh
----- Original Message -----
From: "Michael Colonno" <mcolonno at stanford.edu>
To: mpich-discuss at mcs.anl.gov
Cc: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Sent: Tuesday, June 19, 2012 9:33:26 AM
Subject: RE: mpiexec / Windows networking
The instructions referenced below were followed to install MPICH2 on the Windows 7 system (uninstalled / reinstalled twice with the same results). Is there any reason the wmpiexec and command line behavior would be different? Perhaps some system-wide post-install setting?
Thanks,
~Mike C.
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, June 19, 2012 7:22 AM
To: mpich-discuss at mcs.anl.gov
Cc: Michael Colonno
Subject: Re: mpiexec / Windows networking
Hi,
The earlier error message (smpd error message) indicates that MPICH2 was not installed correctly on your system. I would recommend the following,
# Uninstall MPICH2 from the system
# Follow instructions in Section 9.4 (NOT 9.1) of the MPICH2 installer's guide (available at http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs) to install MPICH2.
Regards,
Jayesh
----- Original Message -----
From: "Michael Colonno" <mcolonno at stanford.edu>
To: mpich-discuss at mcs.anl.gov
Cc: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Sent: Monday, June 18, 2012 6:15:07 PM
Subject: RE: mpiexec / Windows networking
Follow up: I can run the example successfully through the GUI wrapper (wmpiexec.exe) but not from the command line through a console, which seems odd. So the installation works and the relevant service must be running but it doesn’t seem the command line environment can communicate with it. Besides setting the path, is there anything else I can do on this front?
Thanks,
~Mike C.
From: Michael Colonno [mailto:mcolonno at stanford.edu]
Sent: Monday, June 18, 2012 4:06 PM
To: 'mpich-discuss at mcs.anl.gov'
Cc: 'Jayesh Krishna'
Subject: mpiexec / Windows networking
Trying to run the cxxpi.exe example program and I'm hitting a roadblock (seems others have shared this as well) on a Windows 7 x64 system. I have followed the instructions summarized in: http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-April/009694.html . Checking services, the "MPICH2 Process Manager" is started (I restarted it to confirm). However, checking the status of SMPD:
>smpd -status
no smpd running on mike-studio17
Trying to run a test produces the error in the thread above:
>mpiexec -n 2 hostname
Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Connect on sock (host=mike-studio17, port=8678) failed, exhausted all end points
Unable to connect to 'mike-studio17:8678',
sock error: Error = -1
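Error 10061 is a refused TCP connection, meaning nothing accepted the connection on that host and port. A minimal Python sketch for probing the port directly is shown here; the function name port_is_open is hypothetical, 8678 is taken from the error output above, and "localhost" is a placeholder to be replaced with the host name from the error if the service is bound to a specific interface.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host, port = "localhost", 8678  # placeholder host; port from the error above
    print(f"{host}:{port} open: {port_is_open(host, port)}")
```

A False result while the Windows service is listed as started would suggest the service is listening on a different port or is being blocked, which are both assumptions to verify rather than conclusions.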
In the menu of wmpiconfig.exe under "error" it says "mike-studio17: MPICH2 not installed or unable to query the host". I didn’t install to the default path, but other than that there is nothing extraordinary (bin directory added to path of course). If I scan hosts for versions, the wmpiconfig tool does detect the correct version on this host. Using “scan hosts”, I get “Error: No servers available for this domain”. It seems like the service simultaneously is and is not running. Anything I can do to debug?
Thanks,
~Mike C.