[MPICH] MPICH2 cluster, Local Network

Samuel Winchenbach swinchen at eece.maine.edu
Tue Mar 21 18:35:28 CST 2006


Hello all,

I have been trying to set up a small cluster for a school project, and I 
almost had it working.  When I first started, I had two computers with 
fully qualified domain names, and everything worked fine.  Next I tried 
adding 6 more computers and networking them together with a gigabit 
switch.  At this point all the computers had 192.168.2.X addresses and 
all of them had the hostname "localhost.localdomain".

For a small test I started with 2 nodes.  I configured the head node to 
export "/home/cluster" and the compute node to mount it.  After setting 
up SSH, everything was going smoothly: I could SSH from the head node to 
the compute node without a password, and NFS seemed to be working fine.

Now, just to make things clearer, I set the hostname of the head node 
to "node0" and the hostname of the compute node to "node3", and I 
modified the hosts file to add the IP-to-name mapping.  I added "node3" 
to the mpd.hosts file and gave it a try.

[cluster@node0 ~]$ mpdboot -n 2
mpdboot_node0 (handle_mpd_output 359): failed to ping mpd on node3; recvd output={}

That is no good.  But I can reach the computer:

[cluster@node0 ~]$ ping node3
PING node3 (192.168.2.3) 56(84) bytes of data.
64 bytes from node3 (192.168.2.3): icmp_seq=0 ttl=64 time=0.313 ms
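
A successful ping only shows that ICMP gets through; mpd and SSH both use 
TCP.  A minimal sketch of a TCP reachability check follows, assuming 
Python is available on the node.  The function name can_connect is 
hypothetical, and the port in the usage comment is only an example (22 
is the usual SSH port; the port an mpd listens on is not given in this 
message).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example use from the head node (hypothetical host/port values):
#     can_connect("node3", 22)
```

Running the same check in the other direction, from the compute node back 
to the head node, would show whether connections work both ways.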


And mpd loads fine on the head node:

[cluster@node0 ~]$ mpdboot
[cluster@node0 ~]$ mpdtrace
node0

At this point I need to ask whether anyone has any ideas.  I badly need 
some help: I thought I would be able to figure it out, but the deadline 
for the project is quickly approaching.

Here are the config files I needed to modify on the head node:

/etc/exports:
/home/cluster 192.168.2.0/255.255.255.0(rw)

/etc/hosts.allow:
portmap: 192.168.2.
lockd: 192.168.2.
rquotad: 192.168.2.
mountd: 192.168.2.
statd: 192.168.2.

/etc/hosts.deny:
portmap: ALL
lockd: ALL
mountd: ALL
rquotad: ALL
statd: ALL

/etc/hosts:
127.0.0.1               localhost.localdomain localhost node0
192.168.2.3 node3

/etc/sysconfig/network:
NETWORKING=yes
HOSTNAME=node0

/home/cluster/mpd.hosts:
node0
node3

/home/cluster/.mpd.conf:
MPD_SECRETWORD=dynamite
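
One thing that may be worth checking in the /etc/hosts listed above is 
which names it maps to a loopback address, since a remote machine cannot 
connect back to 127.0.0.1 on another host.  The following is only a 
sketch under that assumption, not a confirmed diagnosis; the helper names 
parse_hosts and loopback_names are hypothetical, and the hosts text is 
copied from the head node's file.

```python
import ipaddress


def parse_hosts(text):
    """Map each name in hosts-file text to the first address listed for it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, addr)  # first match wins
    return table


def loopback_names(text):
    """Return the sorted names that the hosts-file text maps to loopback."""
    found = []
    for name, addr in parse_hosts(text).items():
        try:
            if ipaddress.ip_address(addr).is_loopback:
                found.append(name)
        except ValueError:
            pass  # skip entries whose first field is not an IP address
    return sorted(found)


# Contents of /etc/hosts on the head node, as posted.
HEAD_NODE_HOSTS = """\
127.0.0.1               localhost.localdomain localhost node0
192.168.2.3 node3
"""

print(loopback_names(HEAD_NODE_HOSTS))
# prints ['localhost', 'localhost.localdomain', 'node0']
```

With the file as posted, "node0" is among the loopback names, so the head 
node's own hostname resolves to 127.0.0.1 locally.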

On the compute node I really only needed to modify the hosts and network 
files, along with adding the following line to the fstab file:
192.168.2.4:/home/cluster       /home/cluster   nfs     rw,hard,intr    0 0

I guess that is it.  Thanks for any help you might be able to give me.

Sam




