[mpich-discuss] error trying to connect to other nodes by mpdboot

Ariovaldo de Souza Junior ariovaldojunior at gmail.com
Wed Jul 2 15:52:04 CDT 2008


Thanks a lot, Pavan and group. I was able to solve it, and everything is now running very smoothly.

Ari

2008/7/2 Ariovaldo de Souza Junior <ariovaldojunior at gmail.com>:

> Hello Pavan,
>
> The problem is that sometimes it connects perfectly: all nodes are running and
> I get all hosts with the "mpdtrace" command. But now it has started to return
> this error.
>
> When I run "mpdcheck -s" it prints "server listening at INADDR_ANY on:
> eagle 36277" but doesn't return to the command prompt. I use "Ctrl+Z" to
> return to the command line and then run "mpdcheck -c eagle 36277". It
> also hangs, and neither reports an error nor shows the message the manual
> says it should. Once again I have to press "Ctrl+Z" to return.
>
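For reference, the server/client check above comes down to opening a TCP listener on one machine and connecting to it from another. Below is a minimal Python sketch of that idea. It is not mpdcheck's actual code, and the names start_listener and try_connect are made up for illustration. Note that a server waiting for a client presumably holds the terminal by design, and Ctrl+Z only suspends a process, so a client started afterwards may hang simply because the stopped server cannot answer.

```python
import socket
import threading


def start_listener(host="127.0.0.1"):
    """Listen on an OS-chosen port and greet exactly one client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))              # port 0 lets the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve_once():
        conn, _ = srv.accept()       # blocks until a client connects
        with conn:
            conn.sendall(b"hello from server\n")
        srv.close()

    # Run the listener in a background thread so the caller is not blocked.
    threading.Thread(target=serve_once, daemon=True).start()
    return port


def try_connect(host, port, timeout=5.0):
    """Return the server's greeting, or None on failure or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return sock.recv(1024).decode().strip()
    except OSError:                  # refused, unreachable, or timed out
        return None
```

Calling try_connect("127.0.0.1", start_listener()) exercises both halves on one machine. Between two nodes, the listener would bind to an address the other node can reach, and the timeout keeps the client from hanging indefinitely.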
> When I tried "mpdcheck -v" it printed the following:
>
> obtaining hostname via gethostname and getfqdn
> gethostname gives  eagle
> getfqdn gives  eagle
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames;  make sure other
> than 127.0.0.1
> gethostbyname_ex:  ('eagle', ['eagle'], ['192.168.1.101'])
> gethostbyname_ex:  ('eagle', ['eagle'], ['192.168.1.101'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
>
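The checks listed in that output can be approximated with the same standard-library socket calls it names (gethostname, getfqdn, gethostbyname_ex). This is only a rough sketch of the idea, not what mpdcheck itself does internally; check_local_name and is_loopback are hypothetical helper names.

```python
import socket


def is_loopback(addr):
    """True for IPv4 addresses in 127.0.0.0/8."""
    return addr.startswith("127.")


def check_local_name():
    """Gather the local hostname facts and flag the usual suspects."""
    short = socket.gethostname()
    fqdn = socket.getfqdn()
    report = {"hostname": short, "fqdn": fqdn, "problems": []}
    for name in sorted({short, fqdn}):
        if name.startswith("localhost"):
            report["problems"].append(f"{name!r} looks like a localhost name")
        try:
            _, _, addrs = socket.gethostbyname_ex(name)
        except OSError as exc:       # covers socket.gaierror and socket.herror
            report["problems"].append(f"{name!r} does not resolve: {exc}")
            continue
        for addr in addrs:
            if is_loopback(addr):
                report["problems"].append(f"{name!r} resolves to loopback {addr}")
    return report
```

An empty "problems" list on every node would match the clean output quoted above, where eagle resolves to 192.168.1.101 rather than 127.0.0.1.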
> My "/etc/hosts" file on all computers is configured like this:
>
> 127.0.0.1 localhost
> 192.168.1.101 eagle eagle
> 192.168.1.1 falcon falcon
> 192.168.1.2 cheetah cheetah
> 192.168.1.3 coyote coyote
> 192.168.1.4 chacal chacal
> 192.168.1.5 lion lion
> 192.168.1.6 owl owl
> 192.168.1.7 puma puma
> 192.168.1.8 gepard gepard
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
>
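A hosts file like the one above can be sanity-checked mechanically before looking at anything else. The sketch below parses hosts-style text and flags cluster names that are missing, listed under more than one address, or mapped to a loopback address. parse_hosts and find_issues are illustrative names, the rules are assumptions about what usually goes wrong, and SAMPLE holds only a few lines copied from the listing.

```python
SAMPLE = """\
127.0.0.1 localhost
192.168.1.101 eagle eagle
192.168.1.6 owl owl
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
"""


def parse_hosts(text):
    """Map each hostname or alias to the list of addresses it appears under."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        addr, *names = line.split()
        for name in dict.fromkeys(names):      # "eagle eagle" counts once
            table.setdefault(name, []).append(addr)
    return table


def find_issues(table, cluster_names):
    """Return a list of human-readable problems for the given node names."""
    issues = []
    for name in cluster_names:
        addrs = table.get(name, [])
        if not addrs:
            issues.append(f"{name}: no entry")
        elif len(addrs) > 1:
            issues.append(f"{name}: listed under several addresses {addrs}")
        elif addrs[0].startswith("127.") or addrs[0] == "::1":
            issues.append(f"{name}: mapped to loopback {addrs[0]}")
    return issues
```

For the sample, find_issues(parse_hosts(SAMPLE), ["eagle", "owl"]) reports nothing, so the listing itself looks consistent. Running the same check against the real file on each node would show whether one of the newly added machines differs.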
> Do I have to configure a DNS server on the master node? Is that necessary?
>
> Thanks a lot for your support.
>
> Ari
>
> 2008/7/2 Pavan Balaji <balaji at mcs.anl.gov>:
>
>
>> You can use the mpdcheck utility to see if there are any problems in your
>> network setup. The most common problem is the DNS or /etc/hosts.
>>
>> 1. Make sure the "hostname" on each node is the same as the name that
>> /etc/hosts on all the other nodes uses for it.
>>
>> 2. Run "/usr/bin/nslookup <hostname>" for each host in the system from
>> every other host and make sure it resolves to the correct address.
>>
>>  -- Pavan
>>
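The second step can also be scripted. Here is a small Python sketch, assuming the expected addresses are known in advance; check_resolution is a made-up name. One difference worth keeping in mind: nslookup queries DNS directly, while socket.gethostbyname goes through the system resolver, which normally also consults /etc/hosts, so the two can disagree on a cluster without a DNS server.

```python
import socket


def check_resolution(expected, resolve=socket.gethostbyname):
    """Return {name: what_we_got} for every name that resolves wrongly.

    `expected` maps hostname -> IPv4 address. `resolve` is injectable so the
    function can be exercised without touching the network.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:       # socket.gaierror is a subclass of OSError
            got = f"unresolved ({exc})"
        if got != want:
            mismatches[name] = got
    return mismatches
```

Run on every node with something like check_resolution({"eagle": "192.168.1.101", "owl": "192.168.1.6"}), using the addresses from the /etc/hosts listing in this thread. An empty result everywhere means all nodes agree on the names.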
>>
>> Ariovaldo de Souza Junior wrote:
>>
>>> When I tried to start mpd on my nodes with the
>>>
>>> :: mpdboot -n 6 -f mpd.hosts
>>>
>>> command, I got the following error:
>>>
>>> :: mpdboot_eagle (handle_mpd_output 393): failed to handshake with mpd on
>>> owl; recvd output={}
>>>
>>> I had everything set up and things worked well until I added some more
>>> nodes, but the configuration on all of them is the same (except for the
>>> /etc/localhost :: the network configuration). Can anyone who has mpich set
>>> up and running send me a copy of their "/etc/hosts" file? And does anyone
>>> know what the real problem is that is causing this error? I'm working with
>>> 6 Core 2 Quad CPUs and I still have to set up two more.
>>>
>>> I have the following set up and working on my server (and nodes):
>>>
>>> :: NFS, which is cloning all the files in my main folder
>>> :: passwordless ssh
>>>
>>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>>
>