[mpich-discuss] SMPD, Problem launching when using -host
Jayesh Krishna
jayesh at mcs.anl.gov
Mon Oct 6 09:06:33 CDT 2008
Hi,
Does it work if you specify the IP address of the machine instead of the
hostname (mpiexec -n 1 master : -host IPADDRESS_OF_roobarb -n 1 slave)?
Regards,
Jayesh
-----Original Message-----
From: James S Perrin [mailto:james.s.perrin at manchester.ac.uk]
Sent: Monday, October 06, 2008 5:18 AM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] SMPD, Problem launching when using -host
Hi,
Jayesh Krishna wrote:
> Hi,
>
> >> mpiexec -n 1 -host roobarb master : -n 1 slave
> The command above (with the "-host" option specified for only one
> executable) works for me. What error message do you get? Please send
> us your exact command and the error output. It would also help if you
> provided more details, e.g. is roobarb a remote machine?
The error is:
[0] PMI_Init failed: FAIL - init called when another process has exited without calling init
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(82)........: channel initialization failed
MPID_Init(333).......: PMI_Init returned -1
unable to read the cmd header on the pmi context, generic socket failure, error stack:
MPIDU_Sock_wait(2603): The specified network name is no longer available. (errno 64).
job aborted:
rank: node: exit code[: error message]
0: ROOBARB: 3: Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(82)........: channel initialization failed
MPID_Init(333).......: PMI_Init returned -1
1: roobarb: -1073741515
The second process is not starting for some reason.
roobarb happens to be the local machine in this case but the problem also
occurs on a cluster.
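As an aside on the output above: the exit code reported for rank 1 is easier to interpret in hexadecimal. The following is a minimal sketch in Python, not part of the original report (the helper name to_ntstatus is made up for illustration), and it assumes the value is a Windows NTSTATUS code printed as a signed 32-bit integer:

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit hex string."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement unsigned 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Exit code reported for rank 1 (roobarb) in the output above.
print(to_ntstatus(-1073741515))  # prints 0xC0000135
```

0xC0000135 is the value Windows defines as STATUS_DLL_NOT_FOUND, which would suggest that the slave process could not locate a DLL it depends on when launched with -host. That reading is an inference from the exit code, not something confirmed in this thread.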
It will launch correctly if I use:
mpiexec -n 1 master : -n 1 slave - SUCCESS
which should be no different from:
mpiexec -n 1 master : -host roobarb -n 1 slave - FAILS
when everything is running on roobarb.
> >> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
>
> When using the "-localroot" option you should not specify the
> hostname for the 1st executable. The command should be,
>
> >> mpiexec -localroot -n 1 master : -host roobarb -n 1 slave
Sorry, that was a typo. I meant that it would work if I used:
mpiexec -localroot -host roobarb -n 1 master : -host roobarb -n 1 slave
Regards
James
>
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James S Perrin
> Sent: Friday, October 03, 2008 12:13 PM
> To: mpich
> Subject: [mpich-discuss] SMPD, Problem launching when using -host
>
> Hi,
> Processes fail to start if -host is used for some but not all
> processes when launching, i.e. when the choice of machine for some
> processes is left up to smpd.
>
> eg
>
> mpiexec -n 1 -host roobarb master : -n 1 slave
>
> When -localroot is used, the following fails unless -host is also
> specified for the master.
>
> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
>
> Using MPICH2 1.0.7 on WinXP ia32.
>
> Regards
> James
> --
> ------------------------------------------------------------------------
> James S. Perrin
> Visualization
>
> Research Computing Services
> The University of Manchester
> Kilburn Building, Oxford Road
> Manchester, M13 9PL
>
> t: +44 (0) 161 275 6945
> e: james.perrin at manchester.ac.uk
> w: www.manchester.ac.uk/researchcomputing
> ------------------------------------------------------------------------
> "The test of intellect is the refusal to belabour the obvious"
> - Alfred Bester
> ------------------------------------------------------------------------
>