[mpich-discuss] error trying to connect to other nodes by mpdboot

Pavan Balaji balaji at mcs.anl.gov
Wed Jul 2 12:25:09 CDT 2008


You can use the mpdcheck utility to see if there are any problems in 
your network setup. The most common cause is a misconfigured DNS or 
/etc/hosts.

1. Make sure the "hostname" on each node matches the name that /etc/hosts 
on every other node uses for it.

2. Run "/usr/bin/nslookup <hostname>" for each host in the system from 
every other host and make sure it resolves to the correct address.
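
If you have more than a few nodes, the two checks above can be scripted. 
What follows is only a rough sketch, not part of mpd: the function names 
(parse_hosts, check_hosts_file, check_resolver) are made up for this 
example, and the addresses in the usage note are placeholders. Also note 
that socket.gethostbyname() goes through the system resolver (which 
normally consults /etc/hosts as well as DNS), so it is not an exact 
substitute for nslookup.

```python
import socket


def parse_hosts(text):
    """Return {name: [addresses]} for every name and alias in hosts-file text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(addr)
    return table


def check_hosts_file(text, expected):
    """Check 1: compare hosts-file text with expected {hostname: address} pairs.

    Returns a list of problem descriptions; an empty list means every
    expected hostname maps to exactly the expected address.
    """
    table = parse_hosts(text)
    problems = []
    for name, addr in expected.items():
        found = table.get(name, [])
        if not found:
            problems.append(f"{name}: no entry")
        elif found != [addr]:
            problems.append(f"{name}: maps to {', '.join(found)}, expected {addr}")
    return problems


def check_resolver(expected):
    """Check 2: ask the system resolver for each name and compare the answer."""
    problems = []
    for name, addr in expected.items():
        try:
            got = socket.gethostbyname(name)
        except OSError as exc:  # lookup failures are reported, not raised
            problems.append(f"{name}: lookup failed ({exc})")
            continue
        if got != addr:
            problems.append(f"{name}: resolves to {got}, expected {addr}")
    return problems
```

To use it, build one expected table, for example 
{"eagle": "192.168.0.1", "owl": "192.168.0.2"} (placeholder addresses; 
substitute your own), then on every node call 
check_hosts_file(open("/etc/hosts").read(), expected) and 
check_resolver(expected). Any node that returns a non-empty list is the 
one to look at first.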

  -- Pavan

Ariovaldo de Souza Junior wrote:
> When I try to run mpd on my nodes with the
> 
> :: mpdboot -n 6 -f mpd.hosts
> 
> command, I get the following error:
> 
> :: mpdboot_eagle (handle_mpd_output 393): failed to handshake with mpd 
> on owl; recvd output={}
> 
> I had everything set up and working well until I added some more nodes, 
> but the configuration is the same on all of them (except for the 
> /etc/localhost :: the network configuration). Can anyone who has mpich 
> set up and running send me a copy of their "/etc/hosts" file? And does 
> anyone know the real problem that is causing this error? I'm working 
> with 6 Core 2 Quad CPUs and I still have to set up two more.
> 
> I have the following set up and working on my server (and nodes):
> 
> :: NFS, which is cloning all the files in my main folder
> :: passwordless ssh
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji



