[mpich-discuss] SMPD, Problem launching when using -host

Jayesh Krishna jayesh at mcs.anl.gov
Mon Oct 6 09:06:33 CDT 2008


 Hi,
  Does it work if you specify the IP address of the machine instead of its
hostname (mpiexec -n 1 master : -host IPADDRESS_OF_roobarb -n 1 slave)?
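A minimal Python sketch for looking up the address to try. It resolves a hostname with the standard socket.gethostbyname call (IPv4 only) and substitutes the result into the command line suggested above. The helper name mpiexec_with_ip is hypothetical. On the Windows machines in this thread the same address could equally be read from ipconfig.

```python
import socket


def mpiexec_with_ip(host: str) -> str:
    """Build the suggested mpiexec command with an IPv4 address in place of a hostname.

    mpiexec_with_ip is a hypothetical helper, for illustration only.
    """
    # IPv4 lookup; raises socket.gaierror if the name cannot be resolved.
    ip = socket.gethostbyname(host)
    return f"mpiexec -n 1 master : -host {ip} -n 1 slave"
```

For example, mpiexec_with_ip("roobarb") would return the command with roobarb's IPv4 address substituted for the hostname.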

Regards,
Jayesh

-----Original Message-----
From: James S Perrin [mailto:james.s.perrin at manchester.ac.uk] 
Sent: Monday, October 06, 2008 5:18 AM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] SMPD, Problem launching when using -host

Hi,

Jayesh Krishna wrote:
>  Hi,
> 
>  >> mpiexec -n 1 -host roobarb master : -n 1 slave
>         The command above (with the "-host" option specified for only one
> executable) works for me. What is the error message that you get?
> Please provide a snapshot of your command and the error output. It
> would also help if you provided more details (e.g. is roobarb a remote
> machine?).

The error is:

[0] PMI_Init failed: FAIL - init called when another process has exited without calling init
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(82)........: channel initialization failed
MPID_Init(333).......: PMI_Init returned -1unable to read the cmd header on the pmi context, generic socket failure, error stack:
MPIDU_Sock_wait(2603): The specified network name is no longer available. (errno 64).

job aborted:
rank: node: exit code[: error message]
0: ROOBARB: 3: Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(82)........: channel initialization failed
MPID_Init(333).......: PMI_Init returned -1
1: roobarb: -1073741515

The second process is not starting for some reason.
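The exit code reported for rank 1, -1073741515, is easier to interpret as an unsigned 32-bit Windows status value. A minimal Python sketch of the conversion (the helper name to_ntstatus is hypothetical): it masks the signed value to 32 bits and formats it in hex, giving 0xC0000135, which Windows defines as STATUS_DLL_NOT_FOUND. That would be consistent with the slave executable failing to load a required DLL when launched with -host, though the cause is not confirmed here.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as an unsigned hex status value.

    to_ntstatus is a hypothetical helper, for illustration only.
    """
    # Masking with 0xFFFFFFFF maps the negative value onto its unsigned 32-bit form.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741515))  # prints 0xC0000135
```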

roobarb happens to be the local machine in this case, but the problem also
occurs on a cluster.

It will launch correctly if I use:

mpiexec -n 1 master : -n 1 slave - SUCCESS

which should be no different from:

mpiexec -n 1 master : -host roobarb -n 1 slave - FAILS

when everything is running on roobarb.

>  >> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
> 
>         When using the "-localroot" option you should not specify the 
> hostname for the 1st executable. The command should be,
> 
>  >> mpiexec -localroot -n 1 master : -host roobarb -n 1 slave

Sorry, that was a typo. I meant that it works if I use:

mpiexec -localroot -host roobarb -n 1 master : -host roobarb -n 1 slave

Regards
James

> 
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James S Perrin
> Sent: Friday, October 03, 2008 12:13 PM
> To: mpich
> Subject: [mpich-discuss] SMPD, Problem launching when using -host
> 
> Hi,
>      Processes fail to start if -host is specified for some, but not
> all, of the processes being launched, i.e. when the choice of machine
> for some processes is left to smpd.
> 
> e.g.
> 
> mpiexec -n 1 -host roobarb master : -n 1 slave
> 
> When -localroot is used, the following fails unless -host is also
> specified for the master:
> 
> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
> 
> Using MPICH2 1.0.7 on WinXP ia32.
> 
> Regards
> James
> --
> ------------------------------------------------------------------------
>    James S. Perrin
>    Visualization
> 
>    Research Computing Services
>    The University of Manchester
>    Kilburn Building, Oxford Road
>    Manchester, M13 9PL
> 
>    t: +44 (0) 161 275 6945
>    e: james.perrin at manchester.ac.uk
>    w: www.manchester.ac.uk/researchcomputing
> ------------------------------------------------------------------------
>   "The test of intellect is the refusal to belabour the obvious"
>   - Alfred Bester
> ------------------------------------------------------------------------
> 
