[mpich-discuss] SMPD, Problem launching when using -host
Jayesh Krishna
jayesh at mcs.anl.gov
Tue Oct 7 09:45:16 CDT 2008
Hi,
Can you send us the debug output of mpiexec and smpd? Please follow the
steps below to collect it:
# Stop any running instances of smpd:
    smpd -stop
# Start smpd in debug mode:
    smpd -d
# Run a non-MPI program with mpiexec in verbose mode:
    mpiexec -verbose -n 1 hostname : -host IPADDRESS_OF_roobarb -n 1 hostname
# Run an MPI program (cpi.exe, provided with MPICH2) with mpiexec in
  verbose mode:
    mpiexec -verbose -n 1 cpi.exe : -host IPADDRESS_OF_roobarb -n 1 cpi.exe
# Send us the debug/verbose output of mpiexec and smpd.
Let us know the results.
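
For convenience, the command lines from the steps above can be generated
for a given address with a small helper. This is only a sketch: the
function names and the example address 192.0.2.10 are made up for
illustration, and the helper only builds the command strings. It does not
run anything, since smpd -d is expected to need its own console while the
mpiexec commands run.

```python
def debug_commands(remote_ip, mpi_program="cpi.exe"):
    """Return the command lines from the debug steps, in order, as argv lists.

    remote_ip stands in for IPADDRESS_OF_roobarb; mpi_program defaults to
    the cpi.exe example shipped with MPICH2.
    """
    return [
        ["smpd", "-stop"],   # stop any running smpd
        ["smpd", "-d"],      # restart smpd in debug mode
        # non-MPI program, verbose mode
        ["mpiexec", "-verbose", "-n", "1", "hostname", ":",
         "-host", remote_ip, "-n", "1", "hostname"],
        # MPI program, verbose mode
        ["mpiexec", "-verbose", "-n", "1", mpi_program, ":",
         "-host", remote_ip, "-n", "1", mpi_program],
    ]


def format_commands(remote_ip, mpi_program="cpi.exe"):
    """Join each argv list into a single printable command line."""
    return [" ".join(argv) for argv in debug_commands(remote_ip, mpi_program)]


# Print the commands for a placeholder address (not a real host).
for line in format_commands("192.0.2.10"):
    print(line)
```

Running the printed commands by hand, with smpd -d in one console and
mpiexec in another, keeps the two debug outputs separate.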
Regards,
Jayesh
-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James S Perrin
Sent: Tuesday, October 07, 2008 5:25 AM
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] SMPD, Problem launching when using -host
Hi,
No, I get the same error if I use the IP address.
Regards
James
Jayesh Krishna wrote:
> Hi,
> Does it work if you specify the IP address of the machine instead of the
> hostname (mpiexec -n 1 master : -host IPADDRESS_OF_roobarb -n 1 slave)?
>
> Regards,
> Jayesh
>
> -----Original Message-----
> From: James S Perrin [mailto:james.s.perrin at manchester.ac.uk]
> Sent: Monday, October 06, 2008 5:18 AM
> To: Jayesh Krishna
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] SMPD, Problem launching when using -host
>
> Hi,
>
> Jayesh Krishna wrote:
> > Hi,
> >
> > >> mpiexec -n 1 -host roobarb master : -n 1 slave
> > The command above ("-host" option specified for only one
> > executable) works for me. What is the error message that you get?
> > (Please send us a snapshot of your command and the error output. It
> > would also help if you provided more details, e.g. is roobarb a
> > remote machine?)
>
> The error is:
>
> [0] PMI_Init failed: FAIL - init called when another process has
> exited without calling init
> Fatal error in MPI_Init_thread: Other MPI error, error stack:
> MPIR_Init_thread(294): Initialization failed
> MPID_Init(82)........: channel initialization failed
> MPID_Init(333).......: PMI_Init returned -1
> unable to read the cmd header on the pmi context, generic socket
> failure, error stack:
> MPIDU_Sock_wait(2603): The specified network name is no longer
> available. (errno 64).
>
> job aborted:
> rank: node: exit code[: error message]
> 0: ROOBARB: 3: Fatal error in MPI_Init_thread: Other MPI error, error stack:
> MPIR_Init_thread(294): Initialization failed
> MPID_Init(82)........: channel initialization failed
> MPID_Init(333).......: PMI_Init returned -1
> 1: roobarb: -1073741515
>
> The second process is not starting for some reason.
>
> roobarb happens to be the local machine in this case but the problem
> also occurs on a cluster.
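
A side note on the exit code reported for rank 1 above: on Windows, exit
codes that appear as large negative numbers are often NTSTATUS values
printed as signed 32-bit integers, and they are easier to look up in hex.
A minimal sketch of the conversion follows; the function name to_ntstatus
is made up for illustration and is not part of MPICH2.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


# Exit code reported for rank 1 in the job summary.
print(to_ntstatus(-1073741515))  # 0xC0000135
```

0xC0000135 is STATUS_DLL_NOT_FOUND. One possibility worth checking (an
assumption, not something established in this thread) is that the second
process cannot find a DLL it depends on when it is launched via -host.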
>
> It will launch correctly if I use:
>
> mpiexec -n 1 master : -n 1 slave - SUCCESS
>
> which should be no different from:
>
> mpiexec -n 1 master : -host roobarb -n 1 slave - FAILS
>
> when everything is running on roobarb.
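
To make the difference between the two command lines explicit, the sketch
below splits an mpiexec command line at the ":" separators and reports
which sections carry an explicit -host. It is an illustration only: the
helper names are hypothetical and the parsing is deliberately simplified
(it is not how mpiexec itself processes its arguments).

```python
def split_sections(argv):
    """Split mpiexec arguments into per-executable sections at ':'."""
    sections, current = [], []
    for arg in argv:
        if arg == ":":
            sections.append(current)
            current = []
        else:
            current.append(arg)
    sections.append(current)
    return sections


def host_of(section):
    """Return the value following -host in a section, or None if absent."""
    if "-host" in section:
        i = section.index("-host")
        if i + 1 < len(section):
            return section[i + 1]
    return None


def hosts_per_section(command_line):
    """Map a full command line to the list of explicit hosts per section."""
    argv = command_line.split()[1:]  # drop the leading "mpiexec"
    return [host_of(s) for s in split_sections(argv)]


print(hosts_per_section("mpiexec -n 1 master : -n 1 slave"))
# [None, None]
print(hosts_per_section("mpiexec -n 1 master : -host roobarb -n 1 slave"))
# [None, 'roobarb']
```

The failing case is the mixed one, where one section names a host and the
other leaves the placement to smpd.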
>
> > >> mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
> >
> > When using the "-localroot" option you should not specify the
> > hostname for the 1st executable. The command should be,
> >
> > >> mpiexec -localroot -n 1 master : -host roobarb -n 1 slave
>
> Sorry, that was a typo. I meant that it would work if I used:
>
> mpiexec -localroot -host roobarb -n 1 master : -host roobarb -n 1
> slave
>
> Regards
> James
>
> >
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of James S Perrin
> > Sent: Friday, October 03, 2008 12:13 PM
> > To: mpich
> > Subject: [mpich-discuss] SMPD, Problem launching when using -host
> >
> > Hi,
> >
> > Processes fail to start if -host is used for only some but not
> > all processes when launching, i.e. when the machines that some
> > processes launch on are left up to smpd to allocate.
> >
> > e.g.
> >
> > mpiexec -n 1 -host roobarb master : -n 1 slave
> >
> > When -localroot is used, the following fails unless -host is also
> > specified for the master:
> >
> > mpiexec -localroot -n 1 roobarb master : -host roobarb -n 1 slave
> >
> > Using MPICH2 1.0.7 on WinXP ia32.
> >
> > Regards
> > James
> > --
> > ------------------------------------------------------------------------
> > James S. Perrin
> > Visualization
> >
> > Research Computing Services
> > The University of Manchester
> > Kilburn Building, Oxford Road
> > Manchester, M13 9PL
> >
> > t: +44 (0) 161 275 6945
> > e: james.perrin at manchester.ac.uk
> > w: www.manchester.ac.uk/researchcomputing
> > ------------------------------------------------------------------------
> > "The test of intellect is the refusal to belabour the obvious"
> > - Alfred Bester
> > ------------------------------------------------------------------------
> >
>
>