[MPICH] MPICH2 cluster, Local Network
Samuel Winchenbach
swinchen at eece.maine.edu
Tue Mar 21 18:35:28 CST 2006
Hello all,
I have been trying to set up a small cluster for a school project and I
almost had it working. When I first started I had two computers that
had fully qualified domain names - everything worked fine. Next I tried
adding 6 more computers and networking them together with a gigabit
switch. At this point all the computers had 192.168.2.X addresses and
all of them had the hostname "localhost.localdomain".
For a small test I started with 2 nodes.
I configured the headnode to export "/home/cluster" and the compute node
to mount it. After setting up SSH everything was going smoothly... I
could SSH from the head node to the compute node without a password, and
NFS seemed to be working fine.
Now, to make things clearer, I set the hostname of the head node
to "node0" and the hostname of the compute node to "node3", and I
added the IP-to-name mapping to the hosts file. I added "node3"
to the mpd.hosts file, and gave it a try.
[cluster@node0 ~]$ mpdboot -n 2
mpdboot_node0 (handle_mpd_output 359): failed to ping mpd on node3;
recvd output={}
That is no good. But I can reach the computer:
[cluster@node0 ~]$ ping node3
PING node3 (192.168.2.3) 56(84) bytes of data.
64 bytes from node3 (192.168.2.3): icmp_seq=0 ttl=64 time=0.313 ms
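Ping only tells me that ICMP gets through, though, and as far as I understand mpd talks over TCP. A minimal sketch of a TCP check, assuming something is listening on the port being tested (the function name `can_connect` is hypothetical, not anything mpd defines):

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False
```

Since passwordless SSH works, `can_connect("node3", 22)` should succeed from the head node; trying other ports with a listener on the far side would show whether something is filtering TCP between the two machines.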
And mpd loads fine on the head node:
[cluster@node0 ~]$ mpdboot
[cluster@node0 ~]$ mpdtrace
node0
At this point I need to ask whether anyone has any ideas.
I badly need some help. I thought I would be able to figure it
out, but the deadline for the project is quickly approaching.
Here are the config files I needed to modify on the head node:
/etc/exports:
/home/cluster 192.168.2.0/255.255.255.0(rw)
/etc/hosts.allow:
portmap: 192.168.2.
lockd: 192.168.2.
rquotad: 192.168.2.
mountd: 192.168.2.
statd: 192.168.2.
/etc/hosts.deny:
portmap: ALL
lockd: ALL
mountd: ALL
rquotad: ALL
statd: ALL
/etc/hosts:
127.0.0.1 localhost.localdomain localhost node0
192.168.2.3 node3
/etc/sysconfig/network:
NETWORKING=yes
HOSTNAME=node0
/home/cluster/mpd.hosts:
node0
node3
/home/cluster/.mpd.conf:
MPD_SECRETWORD=dynamite
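One thing I have not ruled out is name resolution on the head node. A minimal sketch that lists which names a hosts file maps to a loopback address (the helper `loopback_names` is just illustrative; the sample text is the head node's /etc/hosts shown above):

```python
import ipaddress

def loopback_names(hosts_text):
    """Return the hostnames that a hosts-file text maps to a loopback address."""
    names = []
    for line in hosts_text.splitlines():
        # drop comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, hostnames = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # skip lines whose first field is not an IP address
        if is_loopback:
            names.extend(hostnames)
    return names

HOSTS = """\
127.0.0.1 localhost.localdomain localhost node0
192.168.2.3 node3
"""
print(loopback_names(HOSTS))
# prints ['localhost.localdomain', 'localhost', 'node0']
```

So with this hosts file, "node0" resolves to 127.0.0.1 on the head node rather than to a 192.168.2.X address. I do not know whether that matters to mpd, but it seems worth checking.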
On the compute node I really only needed to modify the hosts and network
files, along with adding the following line to the fstab file:
192.168.2.4:/home/cluster /home/cluster nfs rw,hard,intr 0 0
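As a sanity check that the export range in /etc/exports covers the compute node, here is a minimal sketch using Python's standard ipaddress module (the function name `in_export_range` is my own, purely for illustration):

```python
import ipaddress

def in_export_range(address, export_network):
    """Return True if address falls inside the exported network."""
    # ip_network accepts the address/netmask form used in /etc/exports
    network = ipaddress.ip_network(export_network)
    return ipaddress.ip_address(address) in network

print(in_export_range("192.168.2.3", "192.168.2.0/255.255.255.0"))
# prints True
```

That agrees with NFS appearing to work, so I do not think the export itself is the problem.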
I guess that is it. Thanks for any help you might be able to give me.
Sam