[mpich-discuss] Problem: *** gethostbyname_ex failed
Pavan Balaji
balaji at mcs.anl.gov
Wed Jan 13 15:51:00 CST 2010
This looks like a networking setup issue; the MPICH2 installation seems
to have completed fine. Can you check the /etc/hosts file on each
machine to make sure it has the correct entries? That is, each node
should be able to resolve the name returned by "hostname" on every
node, including itself, to an IP address.
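As a rough way to test this outside of mpd, the sketch below performs the
same kind of lookup that fails in the report (gethostbyname_ex) for a list
of names. It is only an illustration, not part of MPICH2: the function name
check_hosts and the injectable resolver argument are assumptions made for
this example, and it assumes Python 3.

```python
import socket


def check_hosts(names, resolver=socket.gethostbyname_ex):
    """Map each host name to its list of IP addresses, or None if lookup fails.

    `resolver` is a hypothetical hook so the check can be exercised without
    touching the real network configuration; by default the standard
    library's socket.gethostbyname_ex is used.
    """
    results = {}
    for name in names:
        try:
            _canonical, _aliases, addrs = resolver(name)
            results[name] = addrs
        except OSError:
            # socket.gaierror and socket.herror are OSError subclasses in Python 3
            results[name] = None
    return results


if __name__ == "__main__":
    # Check the name that "hostname" reports for this machine.
    me = socket.gethostname()
    for name, addrs in check_hosts([me]).items():
        if addrs is None:
            print(name, "does not resolve; check /etc/hosts or DNS")
        else:
            print(name, "resolves to", ", ".join(addrs))
```

Run on each node, with the other nodes' names added to the list, this
should show which names are missing from /etc/hosts or DNS.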
Alternatively, you can try the Hydra process manager by running the
executable mpiexec.hydra instead of mpiexec. More information is
available here:
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
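For a quick smoke test of that route, a small wrapper like the following
could be used. This is a sketch under assumptions: the helper name
hydra_command is invented here, it assumes mpiexec.hydra is on your PATH
and accepts the usual "-n <count>" option, and it launches the plain
"hostname" program rather than an MPI application.

```python
import shutil
import subprocess


def hydra_command(program, nprocs=2, launcher="mpiexec.hydra"):
    """Build the argument list for starting `program` on `nprocs` processes."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return [launcher, "-n", str(nprocs), program]


if __name__ == "__main__":
    cmd = hydra_command("hostname", nprocs=2)
    if shutil.which(cmd[0]) is None:
        # The launcher is not installed or PATH does not include its bin directory.
        print("not found on PATH:", cmd[0])
    else:
        # Each launched process should print the host it runs on.
        subprocess.run(cmd, check=False)
```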
-- Pavan
On 01/13/2010 03:38 PM, Brooks Garrison wrote:
> Hello all,
>
> First, let me preemptively say thank you for any and all help that can
> be provided, as I really had nowhere else to look.
>
> I've just started re-learning MPI (I had a class in college that
> used it) and am trying to set up MPICH2 on my local machine to
> compile some example code. I don't have my Linux cluster set up just
> yet, but I wanted to get familiar with the setup process beforehand.
> I've been working through the installer's guide, Quick Start Section,
> and have come across a snag that my limited experience with Linux
> cannot solve.
>
> I've completed the following:
>
> 1. Unpacked the tar file to /home/bgarrison/libraries
> 2. Created an install directory at /home/bgarrison/mpich2-install
> 3. Created a build directory at /home/bgarrison/mpich2-build
> 4. CD'd to the build directory and from there run the command:
> /home/bgarrison/libraries/mpich2-1.2.1/configure
> --prefix=/home/bgarrison/mpich2-install 2>&1 | tee c.txt
> which produced the attached c.txt output that I checked for
> errors and found none.
> 5. I then ran the command:
> make VERBOSE=1 2>&1 | tee m.txt (I also ran it without the
> VERBOSE option but thought I'd run it anyway to have the most
> information)
> which produced the attached m.txt output that I checked for
> errors and found none.
> 6. I then ran the command:
> make install 2>&1 | tee mi.txt
> which produced the attached mi.txt output that I checked for
> errors and found none.
> 7. I then appended the PATH environment variable with
> /home/bgarrison/mpich2-install/bin
> 8. I then checked to make sure everything was in order with the
> following commands:
> which mpd -> /home/bgarrison/mpich2-install/bin/mpd
> which mpicc -> /home/bgarrison/mpich2-install/bin/mpicc
> which mpiexec -> /home/bgarrison/mpich2-install/bin/mpiexec
> which mpirun -> /home/bgarrison/mpich2-install/bin/mpd
> 9. I created the .mpd.conf file, added the secretword to it, and
> changed its permissions.
>
> Then when I tried to perform the first 'sanity check' as soon as I
> typed "mpd &" I got the message:
>
> bgarrison at STK:~$ mpd &
> mpd failed: gethostbyname_ex failed for STK
>
> I then looked at Appendix A for troubleshooting advice:
>
> "If you can ssh from each machine to itself, and from each machine to each
> other machine in your set (and back), then you probably have an adequate
> environment for mpd" ~ From Installation Guide
>
> I'm able to SSH from my machine to itself (I currently do not have
> the other machines yet, so I'm unsure about the rest of it), but
> I'm still getting errors. I checked to make sure that no mpds
> were running, as the guide suggests, and then ran:
>
> mpdcheck -s
>
> And I get the result:
>
> server listening at INADDR_ANY on: STK 49252
>
> Which is an output that I should expect. However, when I run the client using:
>
> mpdcheck -c STK 49252
>
> I get this result in the client terminal window:
>
> Traceback (most recent call last):
>   File "/home/bgarrison/mpich2-install/bin/mpdcheck", line 109, in <module>
>     sock.connect((argv[argidx+1],int(argv[argidx+2]))) # note double parens
>   File "<string>", line 1, in connect
> socket.gaierror: (-5, 'No address associated with hostname')
>
> Which is -not- an output I expect. So then I ran:
>
> mpdcheck -v
>
> Result:
> obtaining hostname via gethostname and getfqdn
> gethostname gives STK
> getfqdn gives STK
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames; make sure
> other than 127.0.0.1
> *** gethostbyname_ex failed for this host STK
> *** gethostbyname_ex failed for host STK
> checking that IP addrs resolve to same host
>
> And:
>
> mpdcheck -l
>
> Result:
> **********
> The system call gethostbyname(3) failed to resolve your
> unqualified hostname, or $uqhn. This can be caused by
> missing info from your /etc/hosts file or your system not
> having correctly configured name resolvers, or by your IP
> address not existing in resolution services.
> If you run DNS, you may wish to make sure that your
> DNS server has the correct forward A set up for your machine's
> hostname. If you are not using DNS and are only using hosts
> files, please check that a line similar to the one below exists
> in your /etc/hosts file:
> $ipaddr $uqdn
> If you plan to use DNS but you are not sure that it is
> correctly configured, please check that the file /etc/resolv.conf
> contains entries similar to the following:
> nameserver 1.2.3.4
> where 1.2.3.4 is an actual IP of one of your nameservers.
> **********
>
>
> **********
> The system call gethostbyname(3) failed to resolve your
> fully qualified hostname, or $fqhn. This can be caused by
> missing info from your /etc/hosts file or your system not
> having correctly configured name resolvers, or by your IP
> address not existing in resolution services.
> If you run DNS, please check and make sure that your
> DNS server has the correct forward A record set up for your
> machine's hostname. If you are not using DNS and are only using
> hosts files, please check that a line similar to the one below
> exists in your /etc/hosts file:
> $ipaddr $fqhn
> If you intend to use DNS but you are not sure that it is
> correctly configured, please check that the file /etc/resolv.conf
> contains entries similar to the following:
> nameserver 1.2.3.4
> where 1.2.3.4 is an actual IP of one of your nameservers.
> **********
> I am at a loss for what to do. Is this a networking problem, or did
> I mess something up in the download/install process? :(
>
> Thanks for any and all help,
>
> Brooks
>
>
>
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji