<br><br><div><span class="gmail_quote">On 7/24/07, <b class="gmail_sendername">Anthony Chan</b> <<a href="mailto:chan@mcs.anl.gov">chan@mcs.anl.gov</a>> wrote:</span> <br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Did you try using "mpdcheck" to check if other network settings are OK as<br>described in MPICH2 user's guide ?</blockquote><div><br>Hi, thanks for the answer.<br>Sorry for the long mail :P<br><br>I ran the check on the "server2" machine. The .mpd.hosts file contains:
<br> server4<br> server2<br><br>this is the output:<br><br>administrador@server2:~> mpdcheck <br>*** first ipaddr for this host (via server2) is: 127.0.0.2<br><br>administrador@server2:~> mpdcheck -pc
<br>--- print results of: gethostbyname_ex(gethostname())<br>('server2', [], ['127.0.0.2'])<br>--- try to run /bin/hostname<br>server2<br>--- try to run uname -a<br>Linux server2
2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux<br>--- try to print /etc/hosts<br>#<br># hosts This file describes a number of hostname-to-address<br># mappings for the TCP/IP subsystem. It is mostly
<br># used at boot time, when no name servers are running.<br># On small systems, this file can be used instead of a<br># "named" name server.<br># Syntax:<br># <br># IP-Address Full-Qualified-Hostname Short-Hostname
<br>#<br><br>127.0.0.2 server2<br>127.0.0.1 localhost<br>XXX.XXX.123.25 server4<br>XXX.XXX.122.50 server1<br><br># special IPv6 addresses<br>::1 localhost ipv6-localhost ipv6-loopback
<br><br>fe00::0 ipv6-localnet<br><br>ff00::0 ipv6-mcastprefix<br>ff02::1 ipv6-allnodes<br>ff02::2 ipv6-allrouters<br>ff02::3 ipv6-allhosts<br>127.0.0.2
server2 server2<br><br>--- try to print /etc/resolv.conf<br>### BEGIN INFO<br>#<br># Modified_by: dhcpcd<br># Backup: /etc/resolv.conf.saved.by.dhcpcd.eth0<br># Process: dhcpcd<br># Process_id: 3847<br>
# Script: /sbin/modify_resolvconf<br># Saveto: <br># Info: This is a temporary resolv.conf created by service dhcpcd.<br># The previous file has been saved and will be restored later.<br>
# <br># If you don't like your resolv.conf to be changed, you<br># can set MODIFY_{RESOLV,NAMED}_CONF_DYNAMICALLY=no. This<br># variables are placed in /etc/sysconfig/network/config.
<br># <br># You can also configure service dhcpcd not to modify it.<br># <br># If you don't like dhcpcd to change your nameserver<br># settings<br>
# then either set DHCLIENT_MODIFY_RESOLV_CONF=no<br># in /etc/sysconfig/network/dhcp, or<br># set MODIFY_RESOLV_CONF_DYNAMICALLY=no in<br># /etc/sysconfig/network/config or (manually) use dhcpcd
<br># with -R. If you only want to keep your searchlist, set<br># DHCLIENT_KEEP_SEARCHLIST=yes in /etc/sysconfig/network/dhcp or<br># (manually) use the -K option.<br>#<br>### END INFO
<br>search XXX.XXX.160.17 XXX.XXX.18 XXX.XXX.160.22 XXX.XXX.160.23<br>nameserver XXX.XXX.160.17<br>nameserver XXX.XXX.160.18<br>nameserver XXX.XXX.160.22<br>nameserver XXX.XXX.160.23<br>--- try to run /sbin/ifconfig -a<br>
eth0 Link encap:Ethernet HWaddr 00:18:8B:1E:1F:D6 <br> inet addr:XXX.XXX.123.136 Bcast:XXX.XXX.123.255 Mask:255.255.254.0<br> inet6 addr: fe80::218:8bff:fe1e:1fd6/64 Scope:Link
<br> UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1<br> RX packets:18300 errors:3 dropped:0 overruns:0 frame:4<br> TX packets:890 errors:0 dropped:0 overruns:0 carrier:0<br> collisions:0 txqueuelen:1000
<br> RX bytes:2217723 (2.1 Mb) TX bytes:134240 (131.0 Kb)<br> Interrupt:169 <br><br>lo Link encap:Local Loopback <br> inet addr:127.0.0.1 Mask:255.0.0.0<br> inet6 addr: ::1/128 Scope:Host<br> UP LOOPBACK RUNNING MTU:16436 Metric:1<br> RX packets:166 errors:0 dropped:0 overruns:0 frame:0<br> TX packets:166 errors:0 dropped:0 overruns:0 carrier:0
<br> collisions:0 txqueuelen:0 <br> RX bytes:20311 (19.8 Kb) TX bytes:20311 (19.8 Kb)<br><br>sit0 Link encap:IPv6-in-IPv4 <br> NOARP MTU:1480 Metric:1<br> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
<br> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0<br> collisions:0 txqueuelen:0 <br> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)<br><br>--- try to print /etc/nsswitch.conf<br>#<br># /etc/nsswitch.conf
<br>#<br># An example Name Service Switch config file. This file should be<br># sorted with the most-used services at the beginning.<br>#<br># The entry '[NOTFOUND=return]' means that the search for an<br># entry should stop if the search in the previous entry turned
<br># up nothing. Note that if the search failed due to some other reason<br># (like no NIS server responding) then the search continues with the<br># next entry.<br>#<br># Legal entries are:<br>#<br># compat Use compatibility setup
<br># nisplus Use NIS+ (NIS version 3)<br># nis Use NIS (NIS version 2), also called YP<br># dns Use DNS (Domain Name Service)<br># files Use the local files
<br># [NOTFOUND=return] Stop searching if not found so far<br>#<br># For more information, please read the nsswitch.conf.5 manual page.<br>#<br><br># passwd: files nis<br># shadow: files nis<br># group: files nis
<br><br>passwd: compat<br>group: compat<br><br>hosts: files dns<br>networks: files dns<br><br>services: files<br>protocols: files<br>rpc: files<br>ethers: files<br>netmasks: files
<br>netgroup: files nis<br>publickey: files<br><br>bootparams: files<br>automount: files nis<br>aliases: files<br><br><br>administrador@server2:~> mpdcheck -f .mpd.hosts -ssh -v<br>obtaining hostname via gethostname and getfqdn
<br>gethostname gives server2<br>getfqdn gives server2<br>checking out unqualified hostname; make sure is not "localhost", etc.<br>checking out qualified hostname; make sure is not "localhost", etc.<br>
obtain IP addrs via qualified and unqualified hostnames; make sure other than 127.0.0.1<br>gethostbyname_ex: ('server2', [], ['127.0.0.2'])<br>
*** first ipaddr for this host (via server2) is: 127.0.0.2<br>gethostbyname_ex: ('server2', [], ['127.0.0.2'])<br>checking that IP addrs resolve to same host
<br>now do some gethostbyaddr and gethostbyname_ex for machines in hosts file<br>checking gethostbyXXX for unqualified server4<br>gethostbyname_ex: ('server4', [], ['XXX.XXX.123.25'])<br>checking gethostbyXXX for qualified server4
<br>gethostbyname_ex: ('server4', [], ['XXX.XXX.123.25'])<br>checking gethostbyXXX for unqualified server2<br>gethostbyname_ex: ('server2', [], ['127.0.0.2'])
<br>checking gethostbyXXX for qualified server2<br>gethostbyname_ex: ('server2', [], ['127.0.0.2'])<br>trying: ssh server4 -x -n /bin/echo hello<br>trying: ssh server2 -x -n /bin/echo hello
<br>starting server: /usr/local/bin/mpdcheck.py -s<br>starting client: ssh server4 -x -n /usr/local/bin/mpdcheck.py -c server2 25734<br>** timed out waiting for client on server4 to produce output<br>client on server4 failed to access the server
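<br><br>A minimal Python sketch, not part of mpdcheck, for repeating the name-resolution check from the output above on each host. mpdcheck itself says to make sure the address is other than 127.0.0.1; treating any loopback address (such as the 127.0.0.2 reported for server2) as suspicious is an assumption here, and the function names are made up for illustration.

```python
import ipaddress
import socket


def loopback_addresses(addresses):
    """Return the subset of the given IP address strings that are loopback."""
    return [a for a in addresses if ipaddress.ip_address(a).is_loopback]


def check_own_hostname():
    """Resolve this machine's own hostname, as gethostbyname_ex(gethostname()) does."""
    name = socket.gethostname()
    _, _, addrs = socket.gethostbyname_ex(name)
    return name, addrs, loopback_addresses(addrs)


if __name__ == "__main__":
    try:
        name, addrs, suspicious = check_own_hostname()
    except OSError as exc:
        # The hostname could not be resolved at all.
        print("could not resolve own hostname:", exc)
    else:
        print(name, addrs)
        if suspicious:
            # Assumption: a loopback address here is worth a closer look.
            print("hostname resolves to loopback address(es):", suspicious)
```

Running it on both server2 and server4 would show whether each host name maps to the address of a real interface or to a loopback entry from /etc/hosts.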
<br><br>After that I tried<br> administrador@server2:~> ssh server4 -x -n /bin/echo helloJorge<br><br>and the output was<br> helloJorge<br><br><br>administrador@server2:~> mpdboot -f .mpd.hosts -n 2<br>mpdboot_server2 (handle_mpd_output 383): failed to connect to mpd on server4
<br>administrador@server2:~> mpdboot -f .mpd.hosts -n 2 -v<br>running mpdallexit on server2<br>LAUNCHED mpd on server2 via <br>RUNNING: mpd on server2<br>LAUNCHED mpd on server4 via server2<br>mpdboot_server2 (handle_mpd_output 383): failed to connect to mpd on server4
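<br><br>A rough Python sketch for checking by hand whether one host can open a TCP connection to a port on the other, which is roughly what the mpdcheck -s / -c pair in the earlier output exercises. The helper name can_connect and the 5-second timeout are assumptions for illustration; the host and port are whatever the server side reports (server2 and 25734 in the run above).

```python
import socket
import sys


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python probe.py server2 25734
    if len(sys.argv) == 3:
        host, port = sys.argv[1], int(sys.argv[2])
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(host, port, state)
```

If ssh between the machines works but a probe like this fails, the problem is probably in name resolution or packet filtering rather than in ssh; that is only a guess from the output shown here.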
<br><br><br>I don't know why the access failed :S<br><br>Thanks<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Mon, 23 Jul 2007, Jorge Gonzalez wrote:
<br><br>> Hi all<br>><br>> I'm configuring a cluster of two PCs using SUSE 10.2 x64, Mpich2-1.0.5p4,<br>> OpenSSH_4.4p1<br>><br>> I have successfully configured the ssh server on each machine.<br>> I have also configured the ssh clients, so these commands work:
<br>> ssh server1 (without password)<br>> ssh server2 (without password)<br>><br>> However, when I try to bring up a ring of these two machines with the command<br>> mpdboot -n 2 -f .mpd.hosts<br>><br>> I get the following message:
<br>> mpdboot_server1 (handle_mpd_output 383): failed to connect to mpd on server2<br>><br>> Can somebody tell me what I am doing wrong?<br>><br>> The file .mpd.hosts contains the following two lines:<br>> server1
<br>> server2<br>><br>> I have already read these:<br>> <a href="http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/08/msg00009.html">http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/08/msg00009.html
</a><br>> <a href="http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/04/msg00037.html">http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/04/msg00037.html</a><br>><br>><br>> Thanks for all
<br>><br>> --<br>> Jorge Andres Gonzalez<br>> jag2kn (at) <a href="http://gmail.com">gmail.com</a><br>> jagonalezce (at) <a href="http://unal.edu.co">unal.edu.co</a><br>> Universidad Nacional de Colombia
<br>> Cel: 301 217 78 60<br>> Linux Counter 345082<br>> Bogotá - Colombia - Sur América<br>><br></blockquote></div><br><br clear="all"><br>-- <br>JAG<br>jag2kn (at) <a href="http://gmail.com">gmail.com
</a><br>Cel: 301 217 78 60<br>Linux Counter 345082<br>Bogotá - Colombia - Sur América