Thanks a lot, Pavan and group. I was able to solve it, and it is now running very smoothly.<br><br>Ari<br><br><div class="gmail_quote">2008/7/2 Ariovaldo de Souza Junior &lt;<a href="mailto:ariovaldojunior@gmail.com">ariovaldojunior@gmail.com</a>&gt;:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hello Pavan,<br><br>The problem is intermittent: sometimes it connects perfectly, everything runs, and I get all hosts with the "mpdtrace" command, but now it has started returning this error.<br>
<br>When I run "mpdcheck -s" it prints "server listening at INADDR_ANY on: eagle 36277" but does not return to the command prompt. I use "Ctrl+Z" to get back to the command line and then run "mpdcheck -c eagle 36277"; it also hangs, and it neither reports an error nor shows the message the manual says it should. Once again I have to press "Ctrl+Z" to return.<br>
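<br>For what it's worth, "Ctrl+Z" suspends a process rather than sending it to the background, so a suspended "mpdcheck -s" cannot answer a client. The sketch below is not mpdcheck; it is a minimal Python illustration, assuming plain TCP sockets are a fair stand-in, of a listener that keeps running while a client connects. The names serve_once and roundtrip are hypothetical, and everything runs on the loopback address inside one process; on a real cluster the two halves would run in separate terminals on different hosts.<br>
<pre>
```python
import socket
import threading


def serve_once(server_sock, results):
    """Accept one connection, record what the client sent, and reply."""
    conn, _addr = server_sock.accept()
    with conn:
        results.append(conn.recv(1024))
        conn.sendall(b"ack")


def roundtrip(host="127.0.0.1", timeout=5.0):
    """Start a listener, connect to it, and return (received, reply)."""
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(timeout)   # do not wait forever in accept()
        server.bind((host, 0))       # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        # The listener must stay alive while the client connects.
        worker = threading.Thread(target=serve_once, args=(server, results))
        worker.start()

        with socket.create_connection((host, port), timeout=timeout) as client:
            client.sendall(b"hello")
            reply = client.recv(1024)

        worker.join(timeout)
    return results, reply
```
</pre>
Calling roundtrip() should return what the listener received together with its reply; if the listener were stopped before the client connected, the client would sit waiting until its timeout instead.<br>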
<br>When I tried "mpdcheck -v", it returned the following:<br><br>obtaining hostname via gethostname and getfqdn<br>gethostname gives eagle<br>getfqdn gives eagle<br>checking out unqualified hostname; make sure is not "localhost", etc.<br>
checking out qualified hostname; make sure is not "localhost", etc.<br>obtain IP addrs via qualified and unqualified hostnames; make sure other than <a href="http://127.0.0.1" target="_blank">127.0.0.1</a><br>
gethostbyname_ex: ('eagle', ['eagle'], ['<a href="http://192.168.1.101" target="_blank">192.168.1.101</a>'])<br>
gethostbyname_ex: ('eagle', ['eagle'], ['<a href="http://192.168.1.101" target="_blank">192.168.1.101</a>'])<br>checking that IP addrs resolve to same host<br>now do some gethostbyaddr and gethostbyname_ex for machines in hosts file<br>
<br>My "/etc/hosts" file on all computers is configured like this:<br><br><a href="http://127.0.0.1" target="_blank">127.0.0.1</a> localhost<br><a href="http://192.168.1.101" target="_blank">192.168.1.101</a> eagle eagle<br>
<a href="http://192.168.1.1" target="_blank">192.168.1.1</a> falcon falcon<br>
<a href="http://192.168.1.2" target="_blank">192.168.1.2</a> cheetah cheetah<br><a href="http://192.168.1.3" target="_blank">192.168.1.3</a> coyote coyote<br><a href="http://192.168.1.4" target="_blank">192.168.1.4</a> chacal chacal<br>
<a href="http://192.168.1.5" target="_blank">192.168.1.5</a> lion lion<br>
<a href="http://192.168.1.6" target="_blank">192.168.1.6</a> owl owl<br><a href="http://192.168.1.7" target="_blank">192.168.1.7</a> puma puma<br><a href="http://192.168.1.8" target="_blank">192.168.1.8</a> gepard gepard<br>
<br># The following lines are desirable for IPv6 capable hosts<br>
::1 ip6-localhost ip6-loopback<br>fe00::0 ip6-localnet<br>ff00::0 ip6-mcastprefix<br>ff02::1 ip6-allnodes<br>ff02::2 ip6-allrouters<br>ff02::3 ip6-allhosts<br><br>Do I have to configure a DNS server on the master node? Is that necessary?<br>
<br>Thanks a lot for your support.<br><br>Ari<br><br><div class="gmail_quote">2008/7/2 Pavan Balaji &lt;<a href="mailto:balaji@mcs.anl.gov" target="_blank">balaji@mcs.anl.gov</a>&gt;:<div><div></div><div class="Wj3C7c">
<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
You can use the mpdcheck utility to see if there are any problems in your network setup. The most common cause is a misconfigured DNS or /etc/hosts.<br>
<br>
1. Make sure the "hostname" on each node matches the name that /etc/hosts on all the other nodes uses for it.<br>
<br>
2. Run "/usr/bin/nslookup &lt;hostname&gt;" for each host in the system from every other host and make sure it resolves to the correct address.<br>
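<br>
The resolution check in step 2 can be approximated in Python. This is a sketch, not part of MPICH: check_hosts is a hypothetical helper built on the standard socket.gethostbyname_ex call, and treating any 127.x address as a problem is an assumption based on the loopback warning that "mpdcheck -v" prints. It goes through the system resolver, so /etc/hosts is consulted where configured; nslookup, by contrast, queries DNS servers directly, so the two can disagree.<br>
<pre>
```python
import socket


def check_hosts(hostnames, resolve=socket.gethostbyname_ex):
    """Return a dict mapping each hostname to a list of problems found.

    An empty list means the name resolved to non-loopback addresses only.
    The resolve parameter exists so the lookup can be replaced in tests.
    """
    problems = {}
    for name in hostnames:
        issues = []
        try:
            _canonical, _aliases, addresses = resolve(name)
        except OSError as exc:  # socket.gaierror and socket.herror included
            issues.append("does not resolve: %s" % exc)
        else:
            # Assumption: a loopback address is wrong for a cluster node name.
            if any(addr.startswith("127.") for addr in addresses):
                issues.append("resolves to a loopback address")
        problems[name] = issues
    return problems
```
</pre>
Running check_hosts(["eagle", "falcon", "owl"]) on every node and comparing the results would show any name that fails to resolve or maps to loopback on a particular machine.<br>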
<br>
-- Pavan<div><div></div><div><br>
<br>
Ariovaldo de Souza Junior wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
When I tried to run mpd on my nodes with the<br>
<br>
:: mpdboot -n 6 -f mpd.hosts<br>
<br>
command, I got the following error:<br>
<br>
:: mpdboot_eagle (handle_mpd_output 393): failed to handshake with mpd on owl; recvd output={}<br>
<br>
Everything was set up and worked well until I added some more nodes, but the configuration is the same on all of them (except for the /etc/localhost :: the network configuration). Could anyone who has mpich set up and running send me a copy of their "/etc/hosts" file? And does anyone know what the real problem causing this error is? I'm working with 6 Core 2 Quad CPUs and I still have to set up two more.<br>
<br>
I have the following set up and working on my server (and nodes):<br>
<br>
:: NFS, which is cloning all the files in my main folder<br>
:: passwordless ssh<br>
<br>
</blockquote>
<br></div></div><font color="#888888">
-- <br>
Pavan Balaji<br>
<a href="http://www.mcs.anl.gov/%7Ebalaji" target="_blank">http://www.mcs.anl.gov/~balaji</a><br>
<br>
</font></blockquote></div></div></div><br>
</blockquote></div><br>