[MPICH] ssh failed to connect
Reuti
reuti at staff.uni-marburg.de
Wed Jul 25 00:10:12 CDT 2007
Hi,
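what about removing the 127.0.0.2 entries for server2 from /etc/hosts
and giving server2 a sensible address there, i.e. the real one
configured on eth0? Maybe the mpd started on server4 tries to connect
back to the sender, i.e. to 127.0.0.2, which will fail because that is
a loopback address and points server4 at itself.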
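
To check for this kind of problem on any node, something like the
following may help. It is only a sketch of mine, not code taken from
mpdcheck; the helper names is_loopback and loopback_addrs are made up
for this example. It repeats the gethostbyname_ex(gethostname()) lookup
shown in the mpdcheck -pc output quoted below and reports any result in
127.0.0.0/8.

```python
import ipaddress
import socket


def is_loopback(addr):
    """Return True if addr is a loopback address (127.0.0.0/8 for IPv4)."""
    return ipaddress.ip_address(addr).is_loopback


def loopback_addrs(hostname):
    """Resolve hostname and return those of its addresses that are loopback."""
    _name, _aliases, addrs = socket.gethostbyname_ex(hostname)
    return [a for a in addrs if is_loopback(a)]


if __name__ == "__main__":
    host = socket.gethostname()
    try:
        bad = loopback_addrs(host)
    except OSError as err:
        # Name resolution itself failed; nothing more to check here.
        print("could not resolve %s: %s" % (host, err))
    else:
        if bad:
            print("%s resolves to loopback address(es): %s" % (host, ", ".join(bad)))
            print("other hosts cannot reach this machine via that address")
        else:
            print("%s does not resolve to a loopback address" % host)
```

If this reports a loopback address on the machine you run mpdboot
from, the /etc/hosts entry for that hostname is the first thing I would
look at.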
-- Reuti
On 25.07.2007 at 02:12, Jorge Gonzalez wrote:
>
>
> On 7/24/07, Anthony Chan <chan at mcs.anl.gov> wrote:
>
> Did you try using "mpdcheck" to check if other network settings are
> OK, as described in the MPICH2 user's guide?
>
> hi, thanks for the answer
> sorry for the long mail :P
>
> I ran the check on the "server2" machine; the .mpd.hosts file
> contains:
> server4
> server2
>
> this is the output:
>
> administrador at server2:~> mpdcheck
> *** first ipaddr for this host (via server2) is: 127.0.0.2
>
> administrador at server2:~> mpdcheck -pc
> --- print results of: gethostbyname_ex(gethostname())
> ('server2', [], ['127.0.0.2'])
> --- try to run /bin/hostname
> server2
> --- try to run uname -a
> Linux server2 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC
> 2006 x86_64 x86_64 x86_64 GNU/Linux
> --- try to print /etc/hosts
> #
> # hosts This file describes a number of hostname-to-address
> # mappings for the TCP/IP subsystem. It is mostly
> # used at boot time, when no name servers are running.
> # On small systems, this file can be used instead of a
> # "named" name server.
> # Syntax:
> #
> # IP-Address Full-Qualified-Hostname Short-Hostname
> #
>
> 127.0.0.2 server2
> 127.0.0.1 localhost
> XXX.XXX.123.25 server4
> XXX.XXX.122.50 server1
>
> # special IPv6 addresses
> ::1 localhost ipv6-localhost ipv6-loopback
>
> fe00::0 ipv6-localnet
>
> ff00::0 ipv6-mcastprefix
> ff02::1 ipv6-allnodes
> ff02::2 ipv6-allrouters
> ff02::3 ipv6-allhosts
> 127.0.0.2 server2 server2
>
> --- try to print /etc/resolv.conf
> ### BEGIN INFO
> #
> # Modified_by: dhcpcd
> # Backup: /etc/resolv.conf.saved.by.dhcpcd.eth0
> # Process: dhcpcd
> # Process_id: 3847
> # Script: /sbin/modify_resolvconf
> # Saveto:
> # Info: This is a temporary resolv.conf created by service
> dhcpcd.
> # The previous file has been saved and will be
> restored later.
> #
> # If you don't like your resolv.conf to be changed, you
> # can set MODIFY_{RESOLV,NAMED}_CONF_DYNAMICALLY=no.
> This
> # variables are placed in /etc/sysconfig/network/config.
> #
> # You can also configure service dhcpcd not to modify
> it.
> #
> # If you don't like dhcpcd to change your nameserver
> # settings
> # then either set DHCLIENT_MODIFY_RESOLV_CONF=no
> # in /etc/sysconfig/network/dhcp, or
> # set MODIFY_RESOLV_CONF_DYNAMICALLY=no in
> # /etc/sysconfig/network/config or (manually) use dhcpcd
> # with -R. If you only want to keep your searchlist,
> set
> # DHCLIENT_KEEP_SEARCHLIST=yes in /etc/sysconfig/
> network/dhcp or
> # (manually) use the -K option.
> #
> ### END INFO
> search XXX.XXX.160.17 XXX.XXX.18 XXX.XXX.160.22 XXX.XXX.160.23
> nameserver XXX.XXX.160.17
> nameserver XXX.XXX.160.18
> nameserver XXX.XXX.160.22
> nameserver XXX.XXX.160.23
> --- try to run /sbin/ifconfig -a
> eth0 Link encap:Ethernet HWaddr 00:18:8B:1E:1F:D6
> inet addr:XXX.XXX.123.136 Bcast:XXX.XXX.123.255 Mask:
> 255.255.254.0
> inet6 addr: fe80::218:8bff:fe1e:1fd6/64 Scope:Link
> UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500
> Metric:1
> RX packets:18300 errors:3 dropped:0 overruns:0 frame:4
> TX packets:890 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:2217723 (2.1 Mb) TX bytes:134240 (131.0 Kb)
> Interrupt:169
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask: 255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:166 errors:0 dropped:0 overruns:0 frame:0
> TX packets:166 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:20311 (19.8 Kb) TX bytes:20311 (19.8 Kb)
>
> sit0 Link encap:IPv6-in-IPv4
> NOARP MTU:1480 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>
> --- try to print /etc/nsswitch.conf
> #
> # /etc/nsswitch.conf
> #
> # An example Name Service Switch config file. This file should be
> # sorted with the most-used services at the beginning.
> #
> # The entry '[NOTFOUND=return]' means that the search for an
> # entry should stop if the search in the previous entry turned
> # up nothing. Note that if the search failed due to some other reason
> # (like no NIS server responding) then the search continues with the
> # next entry.
> #
> # Legal entries are:
> #
> # compat Use compatibility setup
> # nisplus Use NIS+ (NIS version 3)
> # nis Use NIS (NIS version 2), also
> called YP
> # dns Use DNS (Domain Name Service)
> # files Use the local files
> # [NOTFOUND=return] Stop searching if not found so far
> #
> # For more information, please read the nsswitch.conf.5 manual page.
> #
>
> # passwd: files nis
> # shadow: files nis
> # group: files nis
>
> passwd: compat
> group: compat
>
> hosts: files dns
> networks: files dns
>
> services: files
> protocols: files
> rpc: files
> ethers: files
> netmasks: files
> netgroup: files nis
> publickey: files
>
> bootparams: files
> automount: files nis
> aliases: files
>
>
> administrador at server2:~> mpdcheck -f .mpd.hosts -ssh -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives server2
> getfqdn gives server2
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames; make sure
> other than 127.0.0.1
> gethostbyname_ex: ('server2', [], ['127.0.0.2'])
> *** first ipaddr for this host (via server2) is: 127.0.0.2
> gethostbyname_ex: ('server2', [], ['127.0.0.2'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in
> hosts file
> checking gethostbyXXX for unqualified server4
> gethostbyname_ex: ('server4', [], ['XXX.XXX.123.25'])
> checking gethostbyXXX for qualified server4
> gethostbyname_ex: ('server4', [], ['XXX.XXX.123.25'])
> checking gethostbyXXX for unqualified server2
> gethostbyname_ex: ('server2', [], ['127.0.0.2'])
> checking gethostbyXXX for qualified server2
> gethostbyname_ex: ('server2', [], ['127.0.0.2'])
> trying: ssh server4 -x -n /bin/echo hello
> trying: ssh server2 -x -n /bin/echo hello
> starting server: /usr/local/bin/mpdcheck.py -s
> starting client: ssh server4 -x -n /usr/local/bin/mpdcheck.py -c
> server2 25734
> ** timed out waiting for client on server4 to produce output
> client on server4 failed to access the server
>
> After that I tried
> administrador at server2:~> ssh server4 -x -n /bin/echo helloJorge
>
> and the output is
> helloJorge
>
>
> administrador at server2:~> mpdboot -f .mpd.hosts -n 2
> mpdboot_server2 (handle_mpd_output 383): failed to connect to mpd
> on server4
> administrador at server2:~> mpdboot -f .mpd.hosts -n 2 -v
> running mpdallexit on server2
> LAUNCHED mpd on server2 via
> RUNNING: mpd on server2
> LAUNCHED mpd on server4 via server2
> mpdboot_server2 (handle_mpd_output 383): failed to connect to mpd
> on server4
>
>
> I don't know why the access failed :S
>
> thanks
>
> On Mon, 23 Jul 2007, Jorge Gonzalez wrote:
>
> > Hi all
> >
> > I'm configuring a cluster of two PCs using SUSE 10.2 x64,
> > MPICH2-1.0.5p4, and OpenSSH_4.4p1.
> >
> > I have successfully configured the ssh server on each machine.
> > I have also configured the ssh clients, tested with the commands
> > ssh server1 (without password)
> > ssh server2 (without password)
> >
> > However, when I tried to bring up a ring of these two machines
> > with the command
> > mpdboot -n 2 -f .mpd.hosts
> >
> > the following message is obtained:
> > mpdboot_server1 (handle_mpd_output 383): failed to connect to mpd on server2
> >
> > can somebody tell me what I am doing wrong?
> >
> > the file .mpd.hosts contains the following two lines:
> > server1
> > server2
> >
> > I have already read these:
> > http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/08/msg00009.html
> > http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/04/msg00037.html
> >
> >
> > Thanks for everything
> >
> > --
> > Jorge Andres Gonzalez
> > jag2kn (at) gmail.com
> > jagonalezce (at) unal.edu.co
> > Universidad Nacional de Colombia
> > Cel: 301 217 78 60
> > Linux Counter 345082
> > Bogotá - Colombia - Sur América
> >
>
>
>
> --
> JAG
> jag2kn (at) gmail.com
> Cel: 301 217 78 60
> Linux Counter 345082
> Bogotá - Colombia - Sur América