[mpich-discuss] error trying to connect to other nodes by mpdboot

Ariovaldo de Souza Junior ariovaldojunior at gmail.com
Wed Jul 2 13:57:49 CDT 2008


Hello Pavan,

The problem is that sometimes it connects perfectly: all the mpds are
running and I get all the hosts with the "mpdtrace" command. But now it has
started returning this error.

When I run "mpdcheck -s" it prints "server listening at INADDR_ANY on:
eagle 36277" but does not return to the command prompt. I press "Ctrl+Z" to
get back to the command line and then run "mpdcheck -c eagle 36277", which
also hangs: it neither reports an error nor shows the message the manual
says it should. Once again I have to press "Ctrl+Z" to get back.
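
For a client-side check that cannot hang, a plain TCP connection attempt
with a timeout can be used. This is only a sketch and not a replacement for
"mpdcheck -c": the function name can_connect and the 5-second default are
assumptions made for illustration, and it tests nothing beyond whether a TCP
connection to the given host and port can be opened.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability
    and knows nothing about the mpd protocol itself.
    """
    try:
        # create_connection resolves the name and connects; the timeout
        # guarantees the call returns instead of blocking indefinitely.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False
```

With the host and port printed above, the call would be
can_connect("eagle", 36277), run from another node while the server side is
still listening.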

When I tried "mpdcheck -v" it returned the following output:

obtaining hostname via gethostname and getfqdn
gethostname gives  eagle
getfqdn gives  eagle
checking out unqualified hostname; make sure is not "localhost", etc.
checking out qualified hostname; make sure is not "localhost", etc.
obtain IP addrs via qualified and unqualified hostnames;  make sure other
than 127.0.0.1
gethostbyname_ex:  ('eagle', ['eagle'], ['192.168.1.101'])
gethostbyname_ex:  ('eagle', ['eagle'], ['192.168.1.101'])
checking that IP addrs resolve to same host
now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
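
The names in this output match functions in Python's standard socket
module, so the same lookups can be repeated by hand on each node. The
following is a minimal sketch under that assumption, not mpdcheck's actual
code; describe_host and describe_local are hypothetical names.

```python
import socket


def describe_host(name):
    """Resolve name and summarise what the resolver returns for it."""
    # gethostbyname_ex returns (canonical_name, alias_list, ipv4_address_list).
    canonical, aliases, addresses = socket.gethostbyname_ex(name)
    return {
        "canonical": canonical,
        "aliases": aliases,
        "addresses": addresses,
        # True when every address returned is in the 127.x.x.x loopback range.
        "loopback_only": all(a.startswith("127.") for a in addresses),
    }


def describe_local():
    """Run the same lookup for this machine's short and qualified names."""
    short = socket.gethostname()
    qualified = socket.getfqdn()
    return {
        "gethostname": describe_host(short),
        "getfqdn": describe_host(qualified),
    }
```

Calling describe_local() on a node such as eagle should show a non-loopback
address for both names, and describe_host("owl") run on every other node
should return the same address each time.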

My "/etc/hosts" file on all computers is configured like this:

127.0.0.1 localhost
192.168.1.101 eagle eagle
192.168.1.1 falcon falcon
192.168.1.2 cheetah cheetah
192.168.1.3 coyote coyote
192.168.1.4 chacal chacal
192.168.1.5 lion lion
192.168.1.6 owl owl
192.168.1.7 puma puma
192.168.1.8 gepard gepard

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
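
A file like this can also be checked mechanically for entries that tend to
cause trouble, such as a cluster host that is missing, listed with more than
one address, or mapped to a loopback address. The sketch below assumes the
file contents are passed in as text; parse_hosts and find_problems are
hypothetical helper names, and the three checks are only examples.

```python
def parse_hosts(text):
    """Map each name in /etc/hosts-style text to the set of addresses given for it."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, set()).add(address)
    return table


def find_problems(table, cluster_hosts):
    """Return a list of human-readable problems for the given host names."""
    problems = []
    for host in cluster_hosts:
        addresses = table.get(host, set())
        if not addresses:
            problems.append(f"{host}: no entry")
            continue
        loopback = sorted(a for a in addresses
                          if a.startswith("127.") or a == "::1")
        if loopback:
            problems.append(f"{host}: maps to loopback {loopback}")
        if len(addresses) > 1:
            problems.append(f"{host}: multiple addresses {sorted(addresses)}")
    return problems
```

For example, find_problems(parse_hosts(open("/etc/hosts").read()),
["eagle", "owl"]) returns an empty list when both hosts have exactly one
non-loopback entry.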

Do I have to configure a DNS server on the master node? Is that necessary?

Thanks a lot for your support.

Ari





2008/7/2 Pavan Balaji <balaji at mcs.anl.gov>:

>
> You can use the mpdcheck utility to see if there are any problems in your
> network setup. The most common problem is the DNS or /etc/hosts.
>
> 1. Make sure the "hostname" on each node is the same as what /etc/hosts on
> all the other nodes refer to it as.
>
> 2. Do an "/usr/bin/nslookup <hostname>" for each host in the system from
> every other host and make sure it resolves to the correct address.
>
>  -- Pavan
>
>
> Ariovaldo de Souza Junior wrote:
>
>> when I tried to run mpd on my nodes with the
>>
>> :: mpdboot -n 6 -f mpd.hosts
>>
>> command, I get the following error:
>>
>> :: mpdboot_eagle (handle_mpd_output 393): failed to handshake with mpd on
>> owl; recvd output={}
>>
>> I had everything set up and things worked well until I added some more
>> nodes, but the configuration on all of them is the same (except for the
>> /etc/localhost :: the network configuration). Can anyone who has mpich set
>> up and running send me a copy of the "/etc/hosts" file? And does anyone
>> know the real problem that is causing this error? I'm working with 6
>> Core 2 Quad CPUs and I still have to set up two more.
>>
>> I have the following set up and working on my server (and nodes):
>>
>> :: NFS, which is cloning all the files in my main folder
>> :: passwordless ssh
>>
>>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>