[MPICH] mpdboot and mpdcheck problems

Zach Ponder zponder at nd.edu
Wed Aug 2 14:59:04 CDT 2006


I'm having trouble getting MPICH2 1.0.3 up and running on a
three-computer setup: one master and two compute nodes.  I found a
mailing-list archive post from someone who seemed to have a similar
problem and was able to correct it:

http://www-unix.mcs.anl.gov/web-mail-archive/lists/mpich-discuss/2006/04/msg00037.html

It seemed to be a problem with the mpd being addressed to 127.0.0.1.
I'm not entirely sure I'm in the same situation, but I am stuck on how
to fix it.  I suspect it is some simple networking issue, but since
this is my first venture into cluster computing, everything is posing
a challenge.
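
For reference, here is a minimal sketch of the check that seems relevant to the 127.0.0.1 case: it resolves the local hostname the same way the mpdcheck -pc output below does (gethostbyname_ex(gethostname())) and flags any loopback address. The helper names are hypothetical and not part of MPICH2, and it is written for Python 3 rather than the older Python that mpd runs under.

```python
import socket

def resolves_to_loopback(addresses):
    """Return True if any IPv4 address in the list is in 127.0.0.0/8."""
    return any(addr.startswith("127.") for addr in addresses)

def check_local_hostname():
    """Resolve this machine's own hostname and report loopback results."""
    name, aliases, addresses = socket.gethostbyname_ex(socket.gethostname())
    return name, addresses, resolves_to_loopback(addresses)

if __name__ == "__main__":
    try:
        name, addresses, loopback = check_local_hostname()
        print(name, addresses)
        if loopback:
            print("hostname resolves to a loopback address")
    except OSError as err:
        # Resolution itself failed (no /etc/hosts entry, no DNS answer).
        print("could not resolve local hostname:", err)
```

Run on each box in turn, this would show whether any node maps its own name to 127.0.0.1 rather than to its 192.168.2.x address.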

Things I'm able to do or have done:

	ping between boxes
	ssh between boxes without password
	bring up an mpd on each box
	made the changes to mpd.py (commented out two lines)
	
Things I'm unable to do:

	use mpdboot to bring up a ring of mpds
	manually start a server/client mpd pair on two machines (gives an
	error along the lines of "unable to ping")

I don't receive any errors when running mpdcheck, but I do when I run
mpdcheck -f ~/Desktop/mpd.hosts -ssh:

[cobalt@bhead home]$ mpdcheck -f ~/Desktop/mpd.hosts -ssh
** timed out waiting for client on b1.aero.nd.edu to produce output
client on b1.aero.nd.edu failed to access the server
here is the output:
Traceback (most recent call last):
   File "/home/cobalt/mpich2-install/bin/mpdcheck.py", line 103, in ?
     sock.connect((argv[argidx+1],int(argv[argidx+2])))  # note double parens
   File "<string>", line 1, in connect
socket.error: (113, 'No route to host')
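
The failing step in that traceback is a plain TCP connect from the client to the server, and errno 113 is EHOSTUNREACH on Linux. A rough sketch for trying that step on its own, outside of mpd, might look like the following. The function name and the script name and port in the usage comment are hypothetical, and it targets Python 3, not the interpreter mpdcheck.py ran under here.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connect; return (True, None) or (False, error)."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError as err:
        # Covers "No route to host", "Connection refused", and timeouts.
        return False, err
    conn.close()
    return True, None

if __name__ == "__main__":
    import sys
    # Hypothetical usage: python tcpcheck.py b1.aero.nd.edu 5000
    if len(sys.argv) == 3:
        ok, err = can_connect(sys.argv[1], int(sys.argv[2]))
        print("connected" if ok else "failed: %s" % err)
```

It needs something listening on the chosen port on the target machine. If ping and ssh succeed but this still fails with the same error, the problem is specific to TCP connections on that port and not to basic reachability.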

And here is the output from mpdcheck -pc:

[cobalt@bhead home]$ mpdcheck -pc
--- print results of: gethostbyname_ex(gethostname())
('bhead.aero.nd.edu', ['bhead'], ['192.168.2.1'])
--- try to run /bin/hostname
bhead.aero.nd.edu
--- try to run uname -a
Linux bhead.aero.nd.edu 2.6.9-34.EL #1 Mon Mar 13 11:31:17 CST 2006 i686 i686 i386 GNU/Linux
--- try to print /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
192.168.2.102   b2.aero.nd.edu  b2
192.168.2.101   b1.aero.nd.edu  b1
192.168.2.1     bhead.aero.nd.edu       bhead
--- try to print /etc/resolv.conf
; generated by /sbin/dhclient-script
search aero.nd.edu
nameserver 192.168.2.1
--- try to run /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:11:11:95:8F:63
           inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
           inet6 addr: fe80::211:11ff:fe95:8f63/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:263 errors:0 dropped:0 overruns:0 frame:0
           TX packets:293 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:40718 (39.7 KiB)  TX bytes:39246 (38.3 KiB)

lo        Link encap:Local Loopback
           inet addr:127.0.0.1  Mask:255.0.0.0
           inet6 addr: ::1/128 Scope:Host
           UP LOOPBACK RUNNING  MTU:16436  Metric:1
           RX packets:1475 errors:0 dropped:0 overruns:0 frame:0
           TX packets:1475 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:2939426 (2.8 MiB)  TX bytes:2939426 (2.8 MiB)

sit0      Link encap:IPv6-in-IPv4
           NOARP  MTU:1480  Metric:1
           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

--- try to print /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
#       nis or yp               Use NIS (NIS version 2), also called YP
#       dns                     Use DNS (Domain Name Service)
#       files                   Use the local files
#       db                      Use the local database (.db) files
#       compat                  Use NIS on compat mode
#       hesiod                  Use Hesiod for user lookups
#       ldap                    Use LDAP (only if nss_ldap is installed)
#       nisplus or nis+         Use NIS+ (NIS version 3), unsupported
#       [NOTFOUND=return]       Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd:    db files ldap nis
#shadow:    db files ldap nis
#group:     db files ldap nis

passwd:     files
shadow:     files
group:      files

#hosts:     db files ldap nis dns
hosts:      files dns

# Example - obey only what ldap tells us...
#services:  ldap [NOTFOUND=return] files
#networks:  ldap [NOTFOUND=return] files
#protocols: ldap [NOTFOUND=return] files
#rpc:       ldap [NOTFOUND=return] files
#ethers:    ldap [NOTFOUND=return] files

bootparams: files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files
netgroup:   files
publickey:  files
automount:  files
aliases:    files
[cobalt@bhead home]$


Thanks for your attention,

Zach Ponder
Graduate Student
University of Notre Dame
Department of Aerospace and Mechanical Engineering
zponder at nd.edu

