[mpich-discuss] mpdboot fails
Rajeev Thakur
thakur at mcs.anl.gov
Wed Aug 27 14:35:56 CDT 2008
I am no networking expert, but it looks like things like these need to be
fixed:
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
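
The failing step above can be tried outside mpdcheck with Python's standard
socket module. This is only a rough sketch, not mpdcheck's actual code:
reverse_lookup is a hypothetical helper name, and the resolver parameter is an
assumption added so the function can be exercised without touching the network.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the primary hostname for ip, or None if the reverse lookup fails.

    resolver defaults to socket.gethostbyaddr; passing a stand-in is an
    illustrative convenience (an assumption, not something mpdcheck does).
    """
    try:
        hostname, _aliases, _addrs = resolver(ip)
    except (socket.herror, socket.gaierror):
        # The resolver could not map this address back to a name.
        return None
    return hostname
```

On the iMac, reverse_lookup('192.168.2.1') would be expected to return None
for as long as mpdcheck keeps printing the gethostbyaddr failure, and a
hostname once the address resolves back to a name.
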
> wlaptop:$ mpdcheck -l
>
> **********
> Your unqualified hostname resolves to 127.0.0.1, which is
> the IP address reserved for localhost. This likely means that
> you have a line similar to this one in your /etc/hosts file:
> 127.0.0.1 $uqhn
> This should perhaps be changed to the following:
> 127.0.0.1 localhost.localdomain localhost
> **********
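
For the wlaptop warning, the quoted /etc/hosts further down has a
"127.0.1.1 wlaptop" line, which may be what mpdcheck is complaining about.
A small sketch for spotting such lines follows. The function name
loopback_addrs_for and the WLAPTOP_HOSTS sample (copied from the IPv4 part of
the quoted file) are illustrative assumptions, not part of MPICH2.

```python
import ipaddress


def loopback_addrs_for(hosts_text, hostname):
    """Return the loopback addresses that hosts_text assigns to hostname."""
    wanted = hostname.lower()
    hits = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        names = [name.lower() for name in fields[1:]]
        if addr.is_loopback and wanted in names:
            hits.append(str(addr))
    return hits


# Sample taken from the wlaptop /etc/hosts quoted in this thread.
WLAPTOP_HOSTS = """\
127.0.0.1 localhost
127.0.1.1 wlaptop

192.168.2.1 imac iMac
192.168.2.2 wlaptop
"""

print(loopback_addrs_for(WLAPTOP_HOSTS, "wlaptop"))  # ['127.0.1.1']
```

An empty list for the machine's own hostname would suggest the hosts file is
not the source of the loopback address; it does not rule out other resolver
settings.
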
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> Sent: Wednesday, August 27, 2008 2:18 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpdboot fails
>
> This is what I have:
>
> iMac
> ====
>
> $ mpdcheck -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives iMac.local
> getfqdn gives iMac.local
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames; make sure
> other than 127.0.0.1
> gethostbyname_ex: ('imac.local', [], ['192.168.2.1',
> '169.254.51.159'])
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> gethostbyname_ex: ('imac.local', [], ['192.168.2.1',
> '169.254.51.159'])
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts
> file
> iMac:$ mpdcheck -l
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> *** gethostbyaddr failed for this hosts's IP 192.168.2.1
> iMac:$ cat /etc/hosts
> ##
> # Host Database
> #
> # localhost is used to configure the loopback interface
> # when the system is booting. Do not change this entry.
> ##
> 127.0.0.1 localhost
> 255.255.255.255 broadcasthost
> ::1 localhost
>
> 192.168.2.1 iMac imac
> 192.168.2.2 wlaptop
>
>
> wlaptop
> ======
>
> $ mpdcheck -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives wlaptop
> getfqdn gives wlaptop
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames; make sure
> other than 127.0.0.1
> gethostbyname_ex: ('wlaptop', [], ['127.0.1.1', '192.168.2.2'])
> *** first ipaddr for this host (via wlaptop) is: 127.0.1.1
> gethostbyname_ex: ('wlaptop', [], ['127.0.1.1', '192.168.2.2'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts
> file
> wlaptop:$ mpdcheck -l
>
> **********
> Your unqualified hostname resolves to 127.0.0.1, which is
> the IP address reserved for localhost. This likely means that
> you have a line similar to this one in your /etc/hosts file:
> 127.0.0.1 $uqhn
> This should perhaps be changed to the following:
> 127.0.0.1 localhost.localdomain localhost
> **********
>
> wlaptop:$ cat /etc/hosts
> 127.0.0.1 localhost
> 127.0.1.1 wlaptop
>
> 192.168.2.1 imac iMac
> 192.168.2.2 wlaptop
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
>
>
>
> On Aug 27, 2008, at 12:23 PM, Rajeev Thakur wrote:
>
> > Did you run all the mpdcheck tests as described in Appendix A.2 of the
> > installation guide?
> >
> > Rajeev
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> >> Sent: Wednesday, August 27, 2008 10:10 AM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] mpdboot fails
> >>
> >> I tried mpdcheck -s/-c and it works between the 2 machines both ways:
> >>
> >> iMac:$ mpdcheck -s
> >> server listening at INADDR_ANY on: iMac.local 65357
> >> server has conn on <socket._socketobject object at 0x8c030> from
> >> ('192.168.2.2', 43331)
> >> server successfully recvd msg from client:
> >> hello_from_client_to_server
> >>
> >> wlaptop:$ mpdcheck -c iMac 65357
> >> client successfully recvd ack from server: ack_from_server_to_client
> >>
> >> I run the mpd ring manually:
> >>
> >> On iMac:
> >>
> >> iMac:$ mpd&
> >> [1] 13755
> >> iMac:$ mpdtrace -l
> >> iMac.local_65501 (192.168.2.1)
> >>
> >> On wlaptop:
> >>
> >> wlaptop:$ mpd -h iMac -p 65501&
> >>
> >> Then I run a test on the local machine:
> >>
> >> iMac:$ mpiexec -n 2 /bin/hostname
> >> iMac.local
> >> wlaptop
> >>
> >> So far so good. Now I try to run 4 hostname processes, this is when I
> >> start having problems:
> >>
> >> iMac:$ mpiexec -n 4 hostname
> >> iMac.local_mpdman_2: conn error in connect_lhs: Operation timed out
> >>
> >> The command hangs for a while then it prints the timeout message.
> >> Let's take a look at the mpd processes:
> >>
> >> $ ps -A|fgrep mpd
> >> 13755 p1 S 0:00.28 python2.3 /opt/local/mpich2/bin/mpd
> >> 13805 p1 S 0:00.01 python2.3 /opt/local/mpich2/bin/mpd
> >> 13806 p1 S 0:00.01 python2.3 /opt/local/mpich2/bin/mpd
> >> iMac:$ mpdtrace -l
> >> iMac.local_65501 (192.168.2.1)
> >> wlaptop_53313 (127.0.1.1)
> >>
> >> I have to manually kill the extra mpd on the local machine.
> >>
> >> On Aug 26, 2008, at 7:11 PM, Rajeev Thakur wrote:
> >>
> >>> --mpd just specifies the path for mpd on remote hosts (the same path
> >>> for all remote hosts). If it still doesn't work, there may be some
> >>> problem with the networking configuration on the two machines. To
> >>> debug the problem, you can use the mpdcheck utility as described in
> >>> the installation guide (all steps).
> >>>
> >>> Rajeev
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> >>>> Sent: Tuesday, August 26, 2008 6:05 PM
> >>>> To: mpich-discuss at mcs.anl.gov
> >>>> Subject: Re: [mpich-discuss] mpdboot fails
> >>>>
> >>>> I'm not clear on how --mpd works: I want to specify two
> >>>> different paths, one for the local machine and one for the
> >>>> remote host (or possibly a number of different remote hosts).
> >>>> Here it looks like I can only set one path for all the hosts:
> >>>>
> >>>> $ mpdboot --verbose --totalnum=2 --mpd=/usr/local/mpich2/bin/mpd --debug
> >>>> debug: starting
> >>>> running mpdallexit on iMac.local
> >>>> LAUNCHED mpd on iMac.local via
> >>>> debug: launch cmd= /usr/local/mpich2/bin/mpd --ncpus=1 -e -d
> >>>> debug: mpd on iMac.local on port no_port
> >>>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on iMac.local, invalid port info:
> >>>> no_port
> >>>>
> >>>>
> >>>> $ mpdboot --verbose --totalnum=2 --debug
> >>>> debug: starting
> >>>> running mpdallexit on iMac.local
> >>>> LAUNCHED mpd on iMac.local via
> >>>> debug: launch cmd= /opt/local/mpich2/bin/mpd.py --ncpus=1 -e -d
> >>>> debug: mpd on iMac.local on port 62629
> >>>> RUNNING: mpd on iMac.local
> >>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 62629, 'entry_port': '', 'host': 'iMac.local', 'entry_host': '', 'ifhn': ''}
> >>>> LAUNCHED mpd on wlaptop via iMac.local
> >>>> debug: launch cmd= ssh -x -n -q wlaptop '/opt/local/mpich2/bin/mpd.py -h iMac.local -p 62629 --ncpus=1 -e -d'
> >>>> debug: mpd on wlaptop on port no_port
> >>>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on wlaptop, invalid port info:
> >>>> no_port
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Aug 25, 2008, at 2:20 PM, Rajeev Thakur wrote:
> >>>>
> >>>>>> The path to mpd on 'wlaptop' is actually a different one than on
> >>>>>> iMac.local. How can I specify a different remote path for mpd?
> >>>>>
> >>>>> With the --mpd option to mpdboot.
> >>>>>
> >>>>> Rajeev
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
> >>>>>> Sent: Sunday, August 24, 2008 5:19 PM
> >>>>>> To: mpich-discuss at mcs.anl.gov
> >>>>>> Subject: [mpich-discuss] mpdboot fails
> >>>>>>
> >>>>>> mpdboot fails when I try to run with this command:
> >>>>>>
> >>>>>> # mpdboot --totalnum=2 --verbose -d
> >>>>>> debug: starting
> >>>>>> running mpdallexit on iMac.local
> >>>>>> LAUNCHED mpd on iMac.local via
> >>>>>> debug: launch cmd= /opt/local/mpich2/bin/mpd.py --ncpus=1 -e -d
> >>>>>> debug: mpd on iMac.local on port 61785
> >>>>>> RUNNING: mpd on iMac.local
> >>>>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 61785, 'entry_port': '', 'host': 'iMac.local', 'entry_host': '', 'ifhn': ''}
> >>>>>> LAUNCHED mpd on wlaptop via iMac.local
> >>>>>> debug: launch cmd= ssh -x -n -q wlaptop '/opt/local/mpich2/bin/mpd.py -h iMac.local -p 61785 --ncpus=1 -e -d'
> >>>>>> debug: mpd on wlaptop on port no_port
> >>>>>> mpdboot_iMac.local (handle_mpd_output 406): from mpd on wlaptop, invalid port info:
> >>>>>> no_port
> >>>>>>
> >>>>>>
> >>>>>> The path to mpd on 'wlaptop' is actually a different one than on
> >>>>>> iMac.local. How can I specify a different remote path for mpd?
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>