[mpich-discuss] mpdboot error

Rajeev Thakur thakur at mcs.anl.gov
Thu Nov 6 11:29:14 CST 2008


You can select a specific network interface via the ifhn option, as described
in Section 5.1.5 of the MPICH2 installation guide.

Rajeev 
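
As a rough illustration (not taken from the installation guide), the sketch
below shows one way to find which of a host's addresses lies on the private
subnet, i.e. a candidate value for the ifhn option. The function names, the
192.168.0.0/16 network, and the sample address 192.168.0.1 are assumptions
made for the example; only 128.200.15.20 and the "192.168.0.0 range" come
from the thread.

```python
import ipaddress
import socket


def pick_private_address(addresses, network="192.168.0.0/16"):
    """Return the first address that falls inside `network`, or None.

    The default network is an assumption based on the "192.168.0.0 range"
    mentioned in the thread; adjust it to the real cluster subnet.
    """
    net = ipaddress.ip_network(network)
    for addr in addresses:
        if ipaddress.ip_address(addr) in net:
            return addr
    return None


def local_addresses():
    """List the IPv4 addresses that this host's own name resolves to."""
    hostname = socket.gethostname()
    _, _, addrs = socket.gethostbyname_ex(hostname)
    return addrs


if __name__ == "__main__":
    # Prints None if the hostname resolves only to the public address,
    # which would match the behavior seen in the mpdcheck output.
    print(pick_private_address(local_addresses()))
```

If this prints None on the head node, the hostname probably resolves only to
the public interface, and the private address would have to be supplied
explicitly.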


> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Cesar Covarrubias
> Sent: Wednesday, November 05, 2008 6:32 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpdboot error
> 
> I am seeing something interesting. Below is the output from
> mpdcheck -s:
> 
> -bash-3.2# mpdcheck -s
> server listening at INADDR_ANY on: bduc-sched.nacs.uci.edu 55616
> server has conn on <socket._socketobject object at 0xb7e962fc> from ('128.200.15.20', 41804)
> server successfully recvd msg from client: hello_from_client_to_server
> 
> The nodes, however, are all on a private subnet, connected to a second
> network interface on the head node. Each node is in the 192.168.0.0
> range. Shouldn't that IP be the one returned instead of the public IP on
> the head node?
> 
> Thanks,
> Cesar
> 
> On Wed, 2008-11-05 at 18:04 -0600, Rajeev Thakur wrote:
> > It is usually because of a problem with the networking configuration on
> > the machines. To debug the problem, you can use the mpdcheck utility as
> > described in Appendix A.2 of the MPICH2 installation guide.
> > 
> > Rajeev 
> > 
> > > -----Original Message-----
> > > From: mpich-discuss-bounces at mcs.anl.gov
> > > [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Cesar Covarrubias
> > > Sent: Wednesday, November 05, 2008 5:54 PM
> > > To: mpich-discuss at mcs.anl.gov
> > > Subject: [mpich-discuss] mpdboot error
> > > 
> > > Hello,
> > > 
> > > We are trying to launch a new cluster that uses MPICH2, but I'm having
> > > trouble when I run mpdboot. I get different errors every time I try to
> > > run "mpdboot -n 5 -f mpd.hosts". Here is a sample of the errors:
> > > 
> > > mpdboot_bduc-sched.nacs.uci.edu (handle_mpd_output 392): failed to
> > > handshake with mpd on bduc-i32-3; recvd output={}
> > > 
> > > mpdboot_bduc-sched.nacs.uci.edu (handle_mpd_output 401): failed to
> > > connect to mpd on bduc-i32-2
> > > 
> > > mpdboot_bduc-sched.nacs.uci.edu (handle_mpd_output 401): failed to
> > > connect to mpd on bduc-i32-1
> > > 
> > > Any thoughts?
> > > 
> > > Very Respectfully,
> > > Cesar Covarrubias
> > > UC Irvine
> > > 
> > > 
> > 
> 
> 



