[MPICH] ssh failed to connect

Reuti reuti at staff.uni-marburg.de
Wed Jul 25 00:10:12 CDT 2007


Hi,

what about removing the 127.0.0.2 entry from /etc/hosts and giving
server2 a sensible (non-loopback) address there instead? Maybe the mpd
started on server4 tries to connect back to the sender, i.e. 127.0.0.2,
which will fail because on server4 that address is server4 itself.
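
A quick way to confirm this is to repeat the lookup that mpdcheck prints
(gethostbyname_ex(gethostname())) and see whether only loopback addresses
come back. The following is just a minimal sketch, not part of MPICH2; the
function names loopback_only, hosts_entries and check_local_hostname are
made up for illustration.

```python
import ipaddress
import socket


def loopback_only(addresses):
    """Return True if the list is non-empty and every address is loopback
    (127.0.0.0/8 or ::1)."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_loopback for a in addresses
    )


def hosts_entries(hosts_text, name):
    """Return the addresses that a hosts-file text assigns to `name`."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            found.append(address)
    return found


def check_local_hostname():
    """Resolve this machine's own hostname, the same lookup mpdcheck reports."""
    name = socket.gethostname()
    _, _, addresses = socket.gethostbyname_ex(name)
    return name, addresses, loopback_only(addresses)


if __name__ == "__main__":
    try:
        name, addresses, bad = check_local_hostname()
    except OSError as exc:  # hostname does not resolve at all
        print("lookup failed:", exc)
    else:
        verdict = "loopback only, remote mpds cannot connect back" if bad else "ok"
        print(name, addresses, verdict)
```

Run on server2 with the /etc/hosts shown below, this should report the
hostname as loopback only; once server2 is mapped to its eth0 address it
should report ok.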

-- Reuti

Am 25.07.2007 um 02:12 schrieb Jorge Gonzalez:

>
>
> On 7/24/07, Anthony Chan <chan at mcs.anl.gov> wrote:
>
> Did you try using "mpdcheck" to check if other network settings are OK,
> as described in the MPICH2 user's guide?
>
> hi, thanks for the answer
> sorry for the long mail :P
>
> I ran the check on the "server2" machine. The .mpd.hosts file contains:
>   server4
>   server2
>
> this is the output:
>
> administrador at server2:~> mpdcheck
> *** first ipaddr for this host (via server2) is: 127.0.0.2
>
> administrador at server2:~> mpdcheck -pc
> --- print results of: gethostbyname_ex(gethostname())
> ('server2', [], ['127.0.0.2'])
> --- try to run /bin/hostname
> server2
> --- try to run uname -a
> Linux server2 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
> --- try to print /etc/hosts
> #
> # hosts         This file describes a number of hostname-to-address
> #               mappings for the TCP/IP subsystem.  It is mostly
> #               used at boot time, when no name servers are running.
> #               On small systems, this file can be used instead of a
> #               "named" name server.
> # Syntax:
> #
> # IP-Address  Full-Qualified-Hostname  Short-Hostname
> #
>
> 127.0.0.2       server2
> 127.0.0.1       localhost
> XXX.XXX.123.25  server4
> XXX.XXX.122.50  server1
>
> # special IPv6 addresses
> ::1             localhost ipv6-localhost ipv6-loopback
>
> fe00::0         ipv6-localnet
>
> ff00::0         ipv6-mcastprefix
> ff02::1         ipv6-allnodes
> ff02::2         ipv6-allrouters
> ff02::3         ipv6-allhosts
> 127.0.0.2        server2 server2
>
> --- try to print /etc/resolv.conf
> ### BEGIN INFO
> #
> # Modified_by:  dhcpcd
> # Backup:       /etc/resolv.conf.saved.by.dhcpcd.eth0
> # Process:      dhcpcd
> # Process_id:   3847
> # Script:       /sbin/modify_resolvconf
> # Saveto:
> # Info:         This is a temporary resolv.conf created by service dhcpcd.
> #               The previous file has been saved and will be restored later.
> #
> #               If you don't like your resolv.conf to be changed, you
> #               can set MODIFY_{RESOLV,NAMED}_CONF_DYNAMICALLY=no. This
> #               variables are placed in /etc/sysconfig/network/config.
> #
> #               You can also configure service dhcpcd not to modify it.
> #
> #               If you don't like dhcpcd to change your nameserver
> #               settings
> #               then either set DHCLIENT_MODIFY_RESOLV_CONF=no
> #               in /etc/sysconfig/network/dhcp, or
> #               set MODIFY_RESOLV_CONF_DYNAMICALLY=no in
> #               /etc/sysconfig/network/config or (manually) use dhcpcd
> #               with -R.  If you only want to keep your searchlist, set
> #               DHCLIENT_KEEP_SEARCHLIST=yes in /etc/sysconfig/network/dhcp or
> #               (manually) use the -K option.
> #
> ### END INFO
> search XXX.XXX.160.17 XXX.XXX.18 XXX.XXX.160.22 XXX.XXX.160.23
> nameserver XXX.XXX.160.17
> nameserver XXX.XXX.160.18
> nameserver XXX.XXX.160.22
> nameserver XXX.XXX.160.23
> --- try to run /sbin/ifconfig -a
> eth0      Link encap:Ethernet  HWaddr 00:18:8B:1E:1F:D6
>           inet addr:XXX.XXX.123.136  Bcast:XXX.XXX.123.255  Mask:255.255.254.0
>           inet6 addr: fe80::218:8bff:fe1e:1fd6/64 Scope:Link
>           UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:18300 errors:3 dropped:0 overruns:0 frame:4
>           TX packets:890 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:2217723 (2.1 Mb)  TX bytes:134240 (131.0 Kb)
>           Interrupt:169
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask: 255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:166 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:166 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:20311 (19.8 Kb)  TX bytes:20311 (19.8 Kb)
>
> sit0      Link encap:IPv6-in-IPv4
>           NOARP  MTU:1480  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> --- try to print /etc/nsswitch.conf
> #
> # /etc/nsswitch.conf
> #
> # An example Name Service Switch config file. This file should be
> # sorted with the most-used services at the beginning.
> #
> # The entry '[NOTFOUND=return]' means that the search for an
> # entry should stop if the search in the previous entry turned
> # up nothing. Note that if the search failed due to some other reason
> # (like no NIS server responding) then the search continues with the
> # next entry.
> #
> # Legal entries are:
> #
> #       compat                  Use compatibility setup
> #       nisplus                 Use NIS+ (NIS version 3)
> #       nis                     Use NIS (NIS version 2), also called YP
> #       dns                     Use DNS (Domain Name Service)
> #       files                   Use the local files
> #       [NOTFOUND=return]       Stop searching if not found so far
> #
> # For more information, please read the nsswitch.conf.5 manual page.
> #
>
> # passwd: files nis
> # shadow: files nis
> # group:  files nis
>
> passwd: compat
> group:  compat
>
> hosts:          files dns
> networks:       files dns
>
> services:       files
> protocols:      files
> rpc:            files
> ethers:         files
> netmasks:       files
> netgroup:       files nis
> publickey:      files
>
> bootparams:     files
> automount:      files nis
> aliases:        files
>
>
> administrador at server2:~> mpdcheck -f .mpd.hosts -ssh -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives  server2
> getfqdn gives  server2
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames;  make sure other than 127.0.0.1
> gethostbyname_ex:  ('server2', [], ['127.0.0.2'])
> *** first ipaddr for this host (via server2) is: 127.0.0.2
> gethostbyname_ex:  ('server2', [], ['127.0.0.2'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
> checking gethostbyXXX for unqualified server4
> gethostbyname_ex:  ('server4', [], ['XXX.XXX.123.25'])
> checking gethostbyXXX for qualified server4
> gethostbyname_ex:  ('server4', [], ['XXX.XXX.123.25'])
> checking gethostbyXXX for unqualified server2
> gethostbyname_ex:  ('server2', [], ['127.0.0.2'])
> checking gethostbyXXX for qualified server2
> gethostbyname_ex:  ('server2', [], ['127.0.0.2'])
> trying: ssh server4 -x -n /bin/echo hello
> trying: ssh server2 -x -n /bin/echo hello
> starting server: /usr/local/bin/mpdcheck.py -s
> starting client: ssh server4 -x -n /usr/local/bin/mpdcheck.py -c server2 25734
> ** timed out waiting for client on server4 to produce output
> client on server4 failed to access the server
>
> Afterwards I tried
>    administrador at server2:~> ssh server4 -x -n /bin/echo helloJorge
>
> and the output was
>    helloJorge
>
>
> administrador at server2:~> mpdboot -f .mpd.hosts -n 2
> mpdboot_server2 (handle_mpd_output 383): failed to connect to mpd on server4
> administrador at server2:~> mpdboot -f .mpd.hosts -n 2 -v
> running mpdallexit on server2
> LAUNCHED mpd on server2  via
> RUNNING: mpd on server2
> LAUNCHED mpd on server4  via  server2
> mpdboot_server2 (handle_mpd_output 383): failed to connect to mpd on server4
>
>
> I don't know why the access failed :S
>
> thanks
>
> On Mon, 23 Jul 2007, Jorge Gonzalez wrote:
>
> > Hi all
> >
> > I'm configuring a cluster of two PCs using Suse 10.2 x64, Mpich2-1.0.5p4,
> > OpenSSH_4.4p1
> >
> > I have successfully configured the ssh server on each machine.
> > I have also configured the ssh clients, using the commands
> > ssh server1  (without password)
> > ssh server2  (without password)
> >
> > However, when I tried to bring up a ring of these two machines with the command
> > mpdboot -n 2 -f .mpd.hosts
> >
> > the following message is obtained:
> > mpdboot_server1 (handle_mpd_output 383): failed to connect to mpd on server2
> >
> > can somebody tell me what I am doing wrong?
> >
> > the file .mpd.hosts contains the following two lines:
> > server1
> > server2
> >
> > I have read these:
> > http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/08/msg00009.html
> > http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/04/msg00037.html
> >
> >
> > Thanks for all
> >
> > --
> > Jorge Andres Gonzalez
> > jag2kn (at) gmail.com
> > jagonalezce (at) unal.edu.co
> > Universidad Nacional de Colombia
> > Cel: 301 217 78 60
> > Linux Counter 345082
> > Bogotá     -    Colombia    -     Sur América
> >
>
>
>
> -- 
> JAG
> jag2kn (at) gmail.com
> Cel: 301 217 78 60
> Linux Counter 345082
> Bogotá     -    Colombia    -     Sur América



