[mpich-discuss] error trying to connect to other nodes by mpdboot
Pavan Balaji
balaji at mcs.anl.gov
Wed Jul 2 12:25:09 CDT 2008
You can use the mpdcheck utility to look for problems in your network
setup. The most common cause is a misconfigured DNS or /etc/hosts.
1. Make sure the name that "hostname" returns on each node matches the
name that /etc/hosts on all the other nodes uses for it.
2. Run "/usr/bin/nslookup <hostname>" for each host in the system from
every other host and make sure it resolves to the correct address.
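The two checks above can also be scripted. Below is a minimal sketch, assuming Python 3 is available on every node and that mpd.hosts lists one hostname per line. The script and all of its function names are hypothetical and not part of MPICH, and it complements mpdcheck rather than replacing it. Run it on each node. It flags names that the local /etc/hosts maps to a loopback address (127.x.x.x or ::1). That setup is commonly reported to break mpd, though this is an assumption on my part and is not stated above. It also flags names listed with more than one address and names that do not resolve at all. Unlike nslookup, which queries DNS servers directly, `socket.gethostbyname` goes through the system resolver, so its result also reflects /etc/hosts.

```python
#!/usr/bin/env python3
"""Hypothetical helper (not part of MPICH): sanity-check name resolution for mpd."""
import socket
import sys
from typing import Dict, List


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Map every name/alias in /etc/hosts-format text to the addresses listed for it."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) < 2:  # skip empty or malformed lines
            continue
        addr, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, []).append(addr)
    return mapping


def is_loopback(addr: str) -> bool:
    """True for IPv4 127.0.0.0/8 addresses and the IPv6 loopback ::1."""
    return addr.startswith("127.") or addr == "::1"


def check_hosts_entry(name: str, hosts: Dict[str, List[str]]) -> List[str]:
    """Return problems with how this node's /etc/hosts lists `name` (empty list = OK)."""
    addrs = hosts.get(name, [])
    if not addrs:
        return [f"{name}: not listed in /etc/hosts (fine only if DNS resolves it)"]
    problems = []
    loop = [a for a in addrs if is_loopback(a)]
    if loop:
        problems.append(f"{name}: mapped to loopback address(es) {loop}")
    if len(set(addrs)) > 1:
        problems.append(f"{name}: listed with several addresses {sorted(set(addrs))}")
    return problems


def read_host_names(text: str) -> List[str]:
    """Assumed mpd.hosts layout: one hostname per line; comments and any ':suffix' are stripped."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            names.append(line.split(":", 1)[0])
    return names


def main(hosts_file: str) -> None:
    with open("/etc/hosts") as f:
        hosts = parse_hosts(f.read())
    with open(hosts_file) as f:
        names = read_host_names(f.read())
    me = socket.gethostname()  # what "hostname" reports on this node
    print(f"local hostname: {me}")
    for name in [me] + names:
        for problem in check_hosts_entry(name, hosts):
            print("WARNING", problem)
        try:
            # Uses the system resolver, so /etc/hosts is consulted as well as DNS.
            print(f"{name} -> {socket.gethostbyname(name)}")
        except socket.gaierror as err:
            print(f"ERROR {name}: does not resolve ({err})")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "mpd.hosts")
```

Compare the "name -> address" lines across nodes. Every node should report the same non-loopback address for a given host.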
-- Pavan
Ariovaldo de Souza Junior wrote:
> When I try to start mpd on my nodes with the
>
> :: mpdboot -n 6 -f mpd.hosts
>
> command, I get the following error:
>
> :: mpdboot_eagle (handle_mpd_output 393): failed to handshake with mpd
> on owl; recvd output={}
>
> I had everything set up and it worked well until I added some more
> nodes, but the configuration on all of them is identical (except for
> the /etc/localhost :: the network configuration). Could someone who
> has MPICH set up and running send me a copy of their "/etc/hosts"
> file? And does anyone know what is actually causing this error? I'm
> working with 6 Core 2 Quad CPUs and still have two more to set up.
>
> I have the following set up and working on my server (and nodes):
>
> :: NFS, which shares all the files in my main folder
> :: passwordless ssh
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji