[mpich-discuss] SMPD, Problem launching when using -host

James S Perrin james.s.perrin at manchester.ac.uk
Fri Oct 10 04:56:50 CDT 2008


Hi,

I have found the reason why my executable is failing to start. However, I
think -host is not behaving as it should, or at least the documentation
needs clarifying.

I guessed that using -host was somehow changing the executable's
environment, so that it failed to start because it couldn't find a DLL.
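
For illustration, the exit code -1073741515 reported for the second process
in the error output quoted further down is consistent with this guess. A
minimal Python sketch, assuming the value is a Windows status code printed
as a signed 32-bit integer (the helper name to_ntstatus is hypothetical):

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Exit code shown for rank 1 in the quoted "job aborted" output.
print(to_ntstatus(-1073741515))  # 0xC0000135
```

0xC0000135 is the Windows status STATUS_DLL_NOT_FOUND, i.e. the process
could not load a DLL it depends on.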

On Windows the PATH variable should be made up of the system-wide
settings followed by the user-specific additions:

i.e. echo %PATH% => <system settings>;<user settings>

The user settings are required to launch the process. When I launch as
follows:

mpiexec -localroot -n 1 master : -n 1 slave

both processes get the PATH setting as above. However, if I use

mpiexec -localroot -n 1 master : -host roobarb -n 1 slave

process 1 has PATH=<system settings>;<user settings> but
process 2 has PATH=<system settings> only
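
One way to see exactly which entries a process has lost is to compare the
two PATH strings entry by entry. A minimal sketch, assuming ';' as the
separator and case-insensitive directory names as on Windows; the function
name and the example directories are hypothetical:

```python
def missing_path_entries(reference: str, other: str, sep: str = ";") -> list[str]:
    """Return the entries of `reference` that do not appear in `other`."""

    def norm(entry: str) -> str:
        # Windows paths are case-insensitive; ignore trailing backslashes.
        return entry.strip().rstrip("\\").lower()

    present = {norm(e) for e in other.split(sep) if e.strip()}
    return [e for e in reference.split(sep) if e.strip() and norm(e) not in present]


# Hypothetical values standing in for <system settings>;<user settings>.
path_process_1 = r"C:\WINDOWS\system32;C:\WINDOWS;C:\dev\mylibs\bin"
path_process_2 = r"C:\windows\system32;C:\WINDOWS"
print(missing_path_entries(path_process_1, path_process_2))
```

With these sample values the only entry reported as missing is the
user-specific directory.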

I have no idea why the following works, but it does: if I also add -host
roobarb to process 1, process 2 now gets the full PATH variable:

mpiexec -localroot -host roobarb -n 1 master : -host roobarb -n 1 slave

Final permutation: if I now don't specify -localroot, both processes get
only the system settings for PATH:

mpiexec -host roobarb -n 1 master : -host roobarb -n 1 slave

In summary, when -host is used only the system PATH settings are applied
and not the user-specific settings. Is this a security feature or a
non-interactive login issue, cf. bash under Linux, where .bashrc is not
executed for processes started remotely?

A little extra testing confirmed that when a process gets both the system
and user PATH settings, it is getting them from the current cmd shell.

The solution is either to make sure the paths are added to the system PATH
variable or to launch via a script that sets up the environment for each
process, though I would have liked to avoid this if possible. The first
is a pain for development and the latter a pain for user installations.
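
The second option could look roughly like the following. This is only a
sketch under stated assumptions: the function names and the directory in
the usage note are hypothetical, and it assumes a Python interpreter is
reachable from the system PATH on every host.

```python
import os
import subprocess


def env_with_extra_path(base_env, extra_dirs):
    """Return a copy of base_env with extra_dirs prepended to PATH."""
    env = dict(base_env)  # on Windows, os.environ keys are upper-cased
    current = env.get("PATH", "")
    parts = list(extra_dirs) + ([current] if current else [])
    env["PATH"] = os.pathsep.join(parts)
    return env


def launch(argv, extra_dirs):
    """Run argv with the extended PATH and return its exit code."""
    return subprocess.call(argv, env=env_with_extra_path(os.environ, extra_dirs))
```

A wrapper script would call something like launch(["slave.exe"],
[r"C:\myapp\bin"]) and be started by mpiexec in place of the executable
itself, so the missing directories are added no matter which PATH the
process was started with.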

FYI I was examining the PATH variable using:

mpiexec -l -host roobarb -n 1 env : -host roobarb -n 1 env | grep \]PATH=

I have the UNIX commands env and grep in my PATH.
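
Where env and grep are not available, the same check can be sketched in
Python. This assumes, as the grep pattern above implies, that -l prefixes
every output line with the rank in square brackets; the function name and
the sample output are hypothetical.

```python
import re

# Matches lines such as "[1]PATH=C:\WINDOWS\system32;C:\WINDOWS".
# The variable may be spelled "Path" on Windows, hence IGNORECASE.
LINE_RE = re.compile(r"^\[(\d+)\]\s*PATH=(.*)$", re.IGNORECASE)


def path_by_rank(output: str) -> dict[int, list[str]]:
    """Map each rank to the list of PATH entries it reported."""
    result = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[int(match.group(1))] = [p for p in match.group(2).split(";") if p]
    return result


# Hypothetical captured output of "mpiexec -l ... env".
sample = "\n".join([
    r"[0]PATH=C:\WINDOWS\system32;C:\WINDOWS;C:\dev\mylibs\bin",
    r"[0]TEMP=C:\TEMP",
    r"[1]Path=C:\WINDOWS\system32;C:\WINDOWS",
])
for rank, entries in sorted(path_by_rank(sample).items()):
    print(rank, len(entries), "entries")
```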

Regards
James

Jayesh Krishna wrote:
>  Hi,
>   Can you send us the debug output of mpiexec and smpd ? Please follow 
> the instructions below to send us the debug output,
> 
> # Stop any instances of smpd using the command, smpd -stop
> # Start smpd in the debug mode using the command, smpd -d
> # Run a non-MPI program with mpiexec in the verbose mode using the 
> command, mpiexec -verbose -n 1 hostname : -host IPADDRESS_OF_roobarb -n 
> 1 hostname
> 
> # Run an MPI program (cpi.exe provided with MPICH2) with mpiexec in the 
> verbose mode using the command, mpiexec -verbose -n 1 cpi.exe : -host 
> IPADDRESS_OF_roobarb -n 1 cpi.exe
> 
> # Send us the debug/verbose outputs of mpiexec and smpd.
> 
>   Let us know the results.
> 
> Regards,
> Jayesh
> 
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James S Perrin
> Sent: Tuesday, October 07, 2008 5:25 AM
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] SMPD, Problem launching when using -host
> 
> Hi,
> 
>      No I get the same error if I use the ipaddress.
> 
> Regards
> James
> 
> 
> Jayesh Krishna wrote:
>  >  Hi,
>  >   Does it work if you specify the ipaddress of the machine instead of
>  > hostname (mpiexec -n 1 master : -host IPADDRESS_OF_roobarb -n 1 slave) ?
>  >
>  > Regards,
>  > Jayesh
>  >
>  > -----Original Message-----
>  > From: James S Perrin [mailto:james.s.perrin at manchester.ac.uk]
>  > Sent: Monday, October 06, 2008 5:18 AM
>  > To: Jayesh Krishna
>  > Cc: mpich-discuss at mcs.anl.gov
>  > Subject: Re: [mpich-discuss] SMPD, Problem launching when using -host
>  >
>  > Hi,
>  >
>  > Jayesh Krishna wrote:
>  >  >  Hi,
>  >  >
>  >  >  >> mpiexec -n 1 -host roobarb master : -n 1 slave
>  >  >         The command above("-host" option specified for only one
>  >  > executable) works for me. What is the error message that you get
>  >  > (Provide us with the snapshot of your command and the error output. It
>  >  > would also help us if you provide more details - Is roobarb a remote
>  >  > machine ? etc) ?
>  >
>  > The error is:
>  >
>  > [0] PMI_Init failed: FAIL - init called when another process has exited without calling init
>  > Fatal error in MPI_Init_thread: Other MPI error, error stack:
>  > MPIR_Init_thread(294): Initialization failed
>  > MPID_Init(82)........: channel initialization failed
>  > MPID_Init(333).......: PMI_Init returned -1
>  > unable to read the cmd header on the pmi context, generic socket failure, error stack:
>  > MPIDU_Sock_wait(2603): The specified network name is no longer available. (errno 64).
>  >
>  > job aborted:
>  > rank: node: exit code[: error message]
>  > 0: ROOBARB: 3: Fatal error in MPI_Init_thread: Other MPI error, error stack:
>  > MPIR_Init_thread(294): Initialization failed
>  > MPID_Init(82)........: channel initialization failed
>  > MPID_Init(333).......: PMI_Init returned -1
>  > 1: roobarb: -1073741515
>  >
>  > The second process is not starting for some reason.
>  >
>  > roobarb happens to be the local machine in this case but the problem
>  > also occurs on a cluster.
>  >
>  > It will launch correctly if I use:
>  >
>  > mpiexec -n 1 master : -n 1 slave - SUCCESS
>  >
>  > which should be no different from:
>  >
>  > mpiexec -n 1 master : -host roobarb -n 1 slave - FAILS
>  >
>  > when everything is running on roobarb.
>  >
>  >  >  >> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
>  >  >
>  >  >         When using the "-localroot" option you should not specify the
>  >  > hostname for the 1st executable. The command should be,
>  >  >
>  >  >  >> mpiexec -localroot -n 1 master : -host roobarb -n 1 slave
>  >
>  > Sorry, that was a typo. I meant that it would work if I used:
>  >
>  > mpiexec -localroot -host roobarb -n 1 master : -host roobarb -n 1 slave
>  >
>  > Regards
>  > James
>  >
>  >  >
>  >  > -----Original Message-----
>  >  > From: owner-mpich-discuss at mcs.anl.gov
>  >  > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James S Perrin
>  >  > Sent: Friday, October 03, 2008 12:13 PM
>  >  > To: mpich
>  >  > Subject: [mpich-discuss] SMPD, Problem launching when using -host
>  >  >
>  >  > Hi,
>  >  >      Processes fail to start if -host is used for only some but not
>  >  > all processes when launching, i.e. the machines that some processes
>  >  > launch on are left up to the smpd to allocate.
>  >  >
>  >  > eg
>  >  >
>  >  > mpiexec -n 1 -host roobarb master : -n 1 slave
>  >  >
>  >  > when -localroot is used the following fails unless -host is also
>  >  > specified for the master.
>  >  >
>  >  > mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
>  >  >
>  >  > Using MPICH2 1.0.7 on WinXP ia32.
>  >  >
>  >  > Regards
>  >  > James
>  >  > --
>  >  > ------------------------------------------------------------------------
>  >  >    James S. Perrin
>  >  >    Visualization
>  >  >
>  >  >    Research Computing Services
>  >  >    The University of Manchester
>  >  >    Kilburn Building, Oxford Road
>  >  >    Manchester, M13 9PL
>  >  >
>  >  >    t: +44 (0) 161 275 6945
>  >  >    e: james.perrin at manchester.ac.uk
>  >  >    w: www.manchester.ac.uk/researchcomputing
>  >  > ------------------------------------------------------------------------
>  >  >   "The test of intellect is the refusal to belabour the obvious"
>  >  >   - Alfred Bester
>  >  > ------------------------------------------------------------------------
>  >  >
>  >
>  >
> 
> 

-- 
------------------------------------------------------------------------
   James S. Perrin
   Visualization

   Research Computing Services
   The University of Manchester
   Kilburn Building, Oxford Road
   Manchester, M13 9PL

   t: +44 (0) 161 275 6945
   e: james.perrin at manchester.ac.uk
   w: www.manchester.ac.uk/researchcomputing
------------------------------------------------------------------------
  "The test of intellect is the refusal to belabour the obvious"
  - Alfred Bester
------------------------------------------------------------------------



