[mpich-discuss] SMPD, Problem launching when using -host
    Jayesh Krishna 
    jayesh at mcs.anl.gov
       
    Mon Oct  6 09:06:33 CDT 2008
    
    
  
 Hi,
  Does it work if you specify the IP address of the machine instead of the
hostname (mpiexec -n 1 master : -host IPADDRESS_OF_roobarb -n 1 slave)?
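A minimal sketch of how the IP-address variant of the command could be produced, assuming the hostname resolves through the system resolver. The helper name mpiexec_with_ip and its resolve parameter are hypothetical, introduced only for illustration; socket.gethostbyname is the standard-library lookup and returns an IPv4 address.

```python
import socket


def mpiexec_with_ip(hostname, resolve=socket.gethostbyname):
    """Build the suggested mpiexec command line using an IP address.

    `resolve` is a hypothetical hook (defaulting to the system resolver)
    so the lookup can be swapped out when the host is not resolvable
    from the machine where this runs.
    """
    ip = resolve(hostname)  # e.g. "roobarb" -> its IPv4 address
    # Same layout as the command in the message, with the address substituted.
    return f"mpiexec -n 1 master : -host {ip} -n 1 slave"
```

Calling mpiexec_with_ip("roobarb") on a machine that can resolve roobarb returns the command string to try; if the lookup itself fails, that would already point to a name-resolution problem rather than an smpd problem.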
Regards,
Jayesh
-----Original Message-----
From: James S Perrin [mailto:james.s.perrin at manchester.ac.uk] 
Sent: Monday, October 06, 2008 5:18 AM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] SMPD, Problem launching when using -host
Hi,
Jayesh Krishna wrote:
>  Hi,
> 
>  >> mpiexec -n 1 -host roobarb master : -n 1 slave
>         The command above ("-host" option specified for only one
> executable) works for me. What error message do you get? Please send
> us a snapshot of your command and the error output. It would also
> help if you provided more details (is roobarb a remote machine,
> etc.).
The error is:
[0] PMI_Init failed: FAIL - init called when another process has exited without calling init
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(82)........: channel initialization failed
MPID_Init(333).......: PMI_Init returned -1
unable to read the cmd header on the pmi context, generic socket failure, error stack:
MPIDU_Sock_wait(2603): The specified network name is no longer available. (errno 64).
job aborted:
rank: node: exit code[: error message]
0: ROOBARB: 3: Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(82)........: channel initialization failed
MPID_Init(333).......: PMI_Init returned -1
1: roobarb: -1073741515
The second process is not starting for some reason.
roobarb happens to be the local machine in this case, but the problem also
occurs on a cluster.
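As a side check on the exit code reported for rank 1 in the output above, the sketch below reinterprets the signed value as an unsigned 32-bit number. This assumes the job summary prints the raw Windows process exit code as a signed 32-bit integer; the function name to_ntstatus is hypothetical. The result, 0xC0000135, is the value Windows defines as STATUS_DLL_NOT_FOUND, which would be consistent with the slave process failing to load a DLL when it is started via -host (for example because of a different PATH or working directory), though that cause is only a guess.

```python
def to_ntstatus(exit_code):
    """Format a signed 32-bit process exit code as unsigned hexadecimal."""
    # Masking with 0xFFFFFFFF maps a negative value onto its
    # two's-complement unsigned 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Exit code reported for rank 1 in the job summary.
code = to_ntstatus(-1073741515)
print(code)
```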
It will launch correctly if I use:
mpiexec -n 1 master : -n 1 slave - SUCCESS
which should be no different from:
mpiexec -n 1 master : -host roobarb -n 1 slave - FAILS
when everything is running on roobarb.
>  >> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
> 
>         When using the "-localroot" option you should not specify the 
> hostname for the 1st executable. The command should be,
> 
>  >> mpiexec -localroot -n 1 master : -host roobarb -n 1 slave
Sorry, that was a typo. I meant that it would work if I used:
mpiexec -localroot -host roobarb -n 1  master : -host roobarb -n 1 slave
Regards
James
> 
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James S Perrin
> Sent: Friday, October 03, 2008 12:13 PM
> To: mpich
> Subject: [mpich-discuss] SMPD, Problem launching when using -host
> 
> Hi,
>      Processes fail to start if -host is used for some but not 
> all processes when launching, i.e. when the machines that some 
> processes launch on are left up to smpd to allocate.
> 
> eg
> 
> mpiexec -n 1 -host roobarb master : -n 1 slave
> 
> When -localroot is used, the following fails unless -host is also 
> specified for the master.
> 
> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
> 
> Using MPICH2 1.0.7 on WinXP ia32.
> 
> Regards
> James
--
------------------------------------------------------------------------
   James S. Perrin
   Visualization
   Research Computing Services
   The University of Manchester
   Kilburn Building, Oxford Road
   Manchester, M13 9PL
   t: +44 (0) 161 275 6945
   e: james.perrin at manchester.ac.uk
   w: www.manchester.ac.uk/researchcomputing
------------------------------------------------------------------------
  "The test of intellect is the refusal to belabour the obvious"
  - Alfred Bester
------------------------------------------------------------------------