[mpich-discuss] mpdboot fails

Rajeev Thakur thakur at mcs.anl.gov
Wed Aug 27 14:35:56 CDT 2008


I am no networking expert, but it looks like problems such as these need
to be fixed:

> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1 
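
For the iMac, you can try the same reverse lookup by hand to see whether
the failure is reproducible outside mpdcheck. The following is only a
sketch, not mpdcheck's actual code: the function name
check_reverse_lookup is made up for illustration, it assumes a current
Python 3, and it relies on nothing beyond the standard
socket.gethostbyaddr call.

```python
import socket


def check_reverse_lookup(ip):
    """Try a reverse lookup of ip; return (ok, detail).

    detail is the primary hostname on success, or the error text on failure.
    """
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError as exc:  # socket.herror and socket.gaierror derive from OSError
        return False, str(exc)
    return True, name
```

Calling check_reverse_lookup("192.168.2.1") on the iMac should return
(False, ...) for as long as mpdcheck prints the message above, and
(True, <hostname>) once the address resolves back to a name.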

> wlaptop:$ mpdcheck -l
> 
>      **********
>      Your unqualified hostname resolves to 127.0.0.1, which is
>      the IP address reserved for localhost. This likely means that
>      you have a line similar to this one in your /etc/hosts file:
>      127.0.0.1   $uqhn
>      This should perhaps be changed to the following:
>      127.0.0.1   localhost.localdomain localhost
>      **********
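
For wlaptop, the output of mpdtrace -l quoted below (wlaptop_53313
(127.0.1.1)) suggests that the hostname is being mapped to a loopback
address by /etc/hosts. A rough way to spot such entries is sketched
here. The helper name loopback_entries is hypothetical, the sample text
is copied from the wlaptop /etc/hosts quoted below, and the code assumes
a current Python 3.

```python
import ipaddress

# Relevant part of the /etc/hosts from wlaptop, as posted in this thread.
WLAPTOP_HOSTS = """\
127.0.0.1       localhost
127.0.1.1       wlaptop

192.168.2.1     imac iMac
192.168.2.2     wlaptop
"""


def loopback_entries(hosts_text, hostname):
    """Return the loopback addresses that hosts_text assigns to hostname."""
    hits = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(addr).is_loopback
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if is_loopback and hostname.lower() in (n.lower() for n in names):
            hits.append(addr)
    return hits


print(loopback_entries(WLAPTOP_HOSTS, "wlaptop"))  # prints ['127.0.1.1']
```

If this reports 127.0.1.1, one thing to try is commenting out that line
so that wlaptop resolves only to 192.168.2.2. I have not tested this on
your setup, so treat it as a guess.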

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> Sent: Wednesday, August 27, 2008 2:18 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpdboot fails
> 
> This is what I have:
> 
> iMac
> ====
> 
> $ mpdcheck -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives  iMac.local
> getfqdn gives  iMac.local
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames;  make sure  
> other than 127.0.0.1
> gethostbyname_ex:  ('imac.local', [], ['192.168.2.1', 
> '169.254.51.159'])
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> gethostbyname_ex:  ('imac.local', [], ['192.168.2.1', 
> '169.254.51.159'])
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts  
> file
> iMac:$ mpdcheck -l
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> iMac:$ cat /etc/hosts
> ##
> # Host Database
> #
> # localhost is used to configure the loopback interface
> # when the system is booting.  Do not change this entry.
> ##
> 127.0.0.1       localhost
> 255.255.255.255 broadcasthost
> ::1             localhost
> 
> 192.168.2.1     iMac imac
> 192.168.2.2     wlaptop
> 
> 
> wlaptop
> ======
> 
> $ mpdcheck -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives  wlaptop
> getfqdn gives  wlaptop
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames;  make sure  
> other than 127.0.0.1
> gethostbyname_ex:  ('wlaptop', [], ['127.0.1.1', '192.168.2.2'])
> *** first ipaddr for this host (via wlaptop) is: 127.0.1.1
> gethostbyname_ex:  ('wlaptop', [], ['127.0.1.1', '192.168.2.2'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts  
> file
> wlaptop:$ mpdcheck -l
> 
>      **********
>      Your unqualified hostname resolves to 127.0.0.1, which is
>      the IP address reserved for localhost. This likely means that
>      you have a line similar to this one in your /etc/hosts file:
>      127.0.0.1   $uqhn
>      This should perhaps be changed to the following:
>      127.0.0.1   localhost.localdomain localhost
>      **********
> 
> wlaptop:$ cat /etc/hosts
> 127.0.0.1       localhost
> 127.0.1.1       wlaptop
> 
> 192.168.2.1     imac iMac
> 192.168.2.2     wlaptop
> 
> # The following lines are desirable for IPv6 capable hosts
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> 
> 
> 
> On Aug 27, 2008, at 12:23 PM, Rajeev Thakur wrote:
> 
> > Did you run all the mpdcheck tests as described in Appendix A.2 of the
> > installation guide?
> >
> > Rajeev
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> >> Sent: Wednesday, August 27, 2008 10:10 AM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] mpdboot fails
> >>
> >> I tried mpdcheck -s/-c and it works between the 2 machines
> >> both ways:
> >>
> >> iMac:$ mpdcheck -s
> >> server listening at INADDR_ANY on: iMac.local 65357
> >> server has conn on <socket._socketobject object at 0x8c030> from
> >> ('192.168.2.2', 43331)
> >> server successfully recvd msg from client:  
> >> hello_from_client_to_server
> >>
> >> wlaptop:$ mpdcheck -c iMac 65357
> >> client successfully recvd ack from server: ack_from_server_to_client
> >>
> >> I run the mpd ring manually:
> >>
> >> On iMac:
> >>
> >> iMac:$ mpd&
> >> [1] 13755
> >> iMac:$ mpdtrace -l
> >> iMac.local_65501 (192.168.2.1)
> >>
> >> On wlaptop:
> >>
> >> wlaptop:$ mpd -h iMac -p 65501&
> >>
> >> Then I run a test on the local machine:
> >>
> >> iMac:$ mpiexec -n 2 /bin/hostname
> >> iMac.local
> >> wlaptop
> >>
> >> So far so good. Now I try to run 4 hostname processes, this is when
> >> I start having problems:
> >>
> >> iMac:$ mpiexec -n 4 hostname
> >> iMac.local_mpdman_2: conn error in connect_lhs: Operation timed out
> >>
> >> The command hangs for a while then it prints the timeout message.
> >> Let's take a look at the mpd processes:
> >>
> >> $ ps -A|fgrep mpd
> >> 13755  p1  S      0:00.28 python2.3 /opt/local/mpich2/bin/mpd
> >> 13805  p1  S      0:00.01 python2.3 /opt/local/mpich2/bin/mpd
> >> 13806  p1  S      0:00.01 python2.3 /opt/local/mpich2/bin/mpd
> >> iMac:$ mpdtrace -l
> >> iMac.local_65501 (192.168.2.1)
> >> wlaptop_53313 (127.0.1.1)
> >>
> >> I have to manually kill the extra mpd on the local machine.
> >>
> >> On Aug 26, 2008, at 7:11 PM, Rajeev Thakur wrote:
> >>
> >>> -mpd just specifies the path for mpd on remote hosts (the same path
> >>> for all remote hosts). If it still doesn't work, there may be some
> >>> problem with the networking configuration on the two machines. To
> >>> debug the problem, you can use the mpdcheck utility as described in
> >>> the installation guide (all steps).
> >>>
> >>> Rajeev
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> >>>> Sent: Tuesday, August 26, 2008 6:05 PM
> >>>> To: mpich-discuss at mcs.anl.gov
> >>>> Subject: Re: [mpich-discuss] mpdboot fails
> >>>>
> >>>> I'm not clear on how --mpd works: I want to specify two
> >>>> different paths, one for the local machine and one for the
> >>>> remote host (or possibly a number of different remote hosts).
> >>>> Here it looks like I can only set one path for all the hosts:
> >>>>
> >>>> $ mpdboot --verbose --totalnum=2
> >>>> --mpd=/usr/local/mpich2/bin/mpd --debug
> >>>> debug: starting
> >>>> running mpdallexit on iMac.local
> >>>> LAUNCHED mpd on iMac.local  via
> >>>> debug: launch cmd= /usr/local/mpich2/bin/mpd   --ncpus=1 -e -d
> >>>> debug: mpd on iMac.local  on port no_port
> >>>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on iMac.local, invalid port info:
> >>>> no_port
> >>>>
> >>>>
> >>>> $ mpdboot --verbose --totalnum=2 --debug
> >>>> debug: starting
> >>>> running mpdallexit on iMac.local
> >>>> LAUNCHED mpd on iMac.local  via
> >>>> debug: launch cmd= /opt/local/mpich2/bin/mpd.py   --ncpus=1 -e -d
> >>>> debug: mpd on iMac.local  on port 62629
> >>>> RUNNING: mpd on iMac.local
> >>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 62629,
> >>>> 'entry_port': '', 'host': 'iMac.local', 'entry_host': '',
> >>>> 'ifhn': ''}
> >>>> LAUNCHED mpd on wlaptop  via  iMac.local
> >>>> debug: launch cmd= ssh -x -n -q wlaptop
> >>>> '/opt/local/mpich2/bin/mpd.py  -h iMac.local -p 62629
> >>>> --ncpus=1 -e -d'
> >>>> debug: mpd on wlaptop  on port no_port
> >>>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on
> >>>> wlaptop, invalid port info:
> >>>> no_port
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Aug 25, 2008, at 2:20 PM, Rajeev Thakur wrote:
> >>>>
> >>>>>> The path to mpd on 'wlaptop' is actually a different one than on
> >>>>>> iMac.local. How can I specify a different remote path for mpd?
> >>>>>
> >>>>> With the --mpd option to mpdboot.
> >>>>>
> >>>>> Rajeev
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> >>>>>> Sent: Sunday, August 24, 2008 5:19 PM
> >>>>>> To: mpich-discuss at mcs.anl.gov
> >>>>>> Subject: [mpich-discuss] mpdboot fails
> >>>>>>
> >>>>>> mpdboot fails when I try to run with this command:
> >>>>>>
> >>>>>> # mpdboot --totalnum=2 --verbose -d
> >>>>>> debug: starting
> >>>>>> running mpdallexit on iMac.local
> >>>>>> LAUNCHED mpd on iMac.local  via
> >>>>>> debug: launch cmd= /opt/local/mpich2/bin/mpd.py   --ncpus=1 -e -d
> >>>>>> debug: mpd on iMac.local  on port 61785
> >>>>>> RUNNING: mpd on iMac.local
> >>>>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 61785,
> >>>>>> 'entry_port': '', 'host': 'iMac.local', 'entry_host': '', 'ifhn': ''}
> >>>>>> LAUNCHED mpd on wlaptop  via  iMac.local
> >>>>>> debug: launch cmd= ssh -x -n -q wlaptop '/opt/local/mpich2/bin/mpd.py  -h iMac.local -p 61785  --ncpus=1 -e -d'
> >>>>>> debug: mpd on wlaptop  on port no_port
> >>>>>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on wlaptop, invalid port info:
> >>>>>> no_port
> >>>>>>
> >>>>>>
> >>>>>> The path to mpd on 'wlaptop' is actually a different one than on
> >>>>>> iMac.local. How can I specify a different remote path for mpd?
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> 
> 



