[mpich-discuss] On gethostbyname() and related issues on Macs
Nicolas Rosner
nrosner at gmail.com
Mon Feb 28 21:35:19 CST 2011
Dear MPICH developers,
In hydra/utils/sock/sock.c, if gethostbyname() yields NULL, errno is
checked. However, according to both BSD and Darwin manpages*, such
code should be checking h_errno. This was recently addressed (r8059),
but it looks like only the first of the two occurrences in sock.c was
fixed (see l.169 vs l.515), wasn't it?
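For illustration, the pattern the manpages describe looks roughly like
the sketch below. It is not Hydra code: resolve_or_explain() is a
hypothetical helper, and it calls the standard hstrerror() where Hydra
has its own HYDU_herror() wrapper. The point is only that a NULL return
from gethostbyname() is explained by h_errno, while errno may hold a
stale value left over from something else.

```c
#define _DEFAULT_SOURCE   /* expose hstrerror() on glibc */
#include <netdb.h>
#include <stdio.h>

/* Hypothetical helper, not part of Hydra.
 * Returns 0 if `host` resolves and -1 otherwise. On failure, `msg`
 * receives a description derived from h_errno, which is where
 * gethostbyname() reports its error. */
int resolve_or_explain(const char *host, char *msg, size_t len)
{
    struct hostent *ht = gethostbyname(host);

    if (ht == NULL) {
        /* Do not use strerror(errno) here: errno is not the error
         * channel for gethostbyname() and may describe an unrelated,
         * earlier failure. */
        snprintf(msg, len, "unable to get host address (%s)",
                 hstrerror(h_errno));
        return -1;
    }

    snprintf(msg, len, "resolved %s", ht->h_name);
    return 0;
}
```

Called with an unresolvable name, this should report a resolver-level
reason rather than whatever errno last held.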
On Mac OS X, as of 1.3.3rc1, the remaining one still seems to be
causing Hydra to report misleading reasons (`no such file or
directory') for failed lookups.
This shouldn't cause (per se) any new hostname-related problems, but
it could make preexisting ones harder to track down, esp. on a
platform where resolution mechanisms can get difficult to follow in
the first place.
Now, a few comments about the more specific problem reported last week
by Alex T.:
$ mpiexec -n 2 ./a.out
HYDU_sock_is_local (./utils/sock/sock.c:483):
unable to get host address (No such file or directory)
main (./ui/mpich/mpiexec.c:343): unable to check if lelos is local
I tried to reproduce this in many ways, including private/public IPs,
combinations of the related Mac-specific prefs, with and without a
custom /etc/hosts, and domain names and hostnames ranging from real to
fake to borderline (e.g. `100'). Hydra's guess of a reasonable,
non-`localhost' hostname for local use worked fine every time.
However, along the way I did note that:
a) it suffices to feed Hydra an invalid target hostname (via -f or
-hosts, but also via env var, right?) to reproduce the exact same
errors**,
[mpiexec@fiona] HYDU_sock_is_local (./utils/sock/sock.c:515):
unable to get host address (No such file or directory)
[mpiexec@fiona] main (./ui/mpich/mpiexec.c:344):
unable to check if fiona is local
b) it's fairly easy, on a current Mac, to end up with a Computer
and/or Bonjour name (like `lelos' or `fiona') that looks and feels
like a valid hostname just about everywhere, yet means nothing to
gethostbyname() and friends.
It would explain the whole thing if Alex had somehow induced the name
`lelos' as Hydra's pick. Otherwise the mystery would, I guess, boil
down to why and how Hydra could, given no user hint, come up on its
own with a name like `lelos', well-known by the GUI layer but
unresolvable by plain old BSD methods.
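One quick way to check whether a machine is in the situation described
in (b) is to ask for the local name and feed it straight back to the
resolver. The sketch below assumes the name in question is the one
gethostname() returns, which may or may not be how Hydra picks it.
name_resolves() and local_name_resolves() are hypothetical helpers, not
Hydra functions.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 1 if `name` resolves via gethostbyname(), 0 otherwise. */
int name_resolves(const char *name)
{
    return gethostbyname(name) != NULL;
}

/* Stores the local host name in `name` and reports whether it resolves.
 * Returns 1 or 0 as above, or -1 if the name could not be obtained. */
int local_name_resolves(char *name, size_t len)
{
    if (len == 0 || gethostname(name, len) != 0)
        return -1;

    /* gethostname() may leave the buffer unterminated on truncation. */
    name[len - 1] = '\0';

    return name_resolves(name);
}
```

A result of 0 from local_name_resolves() would mean the machine's own
name is one of those that looks valid but means nothing to
gethostbyname().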
Best regards,
Nicolás
* (which also recommend replacing the call altogether, but a few
comments in the source code suggest that you probably have your
reasons not to do so just yet)
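For reference, the replacement those manpages point to is, as far as I
can tell, getaddrinfo(). A minimal sketch of the same lookup follows.
resolve_with_getaddrinfo() is a hypothetical name, and this is not a
proposal for how Hydra should do it. One side benefit is that the error
comes back as the return value and is decoded with gai_strerror(), so
there is no errno/h_errno ambiguity to begin with.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Hydra.
 * Returns 0 if `host` resolves and -1 otherwise; `msg` receives a
 * short description either way. */
int resolve_with_getaddrinfo(const char *host, char *msg, size_t len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* The failure reason is the return code itself. */
        snprintf(msg, len, "unable to get host address (%s)",
                 gai_strerror(rc));
        return -1;
    }

    snprintf(msg, len, "resolved %s", host);
    freeaddrinfo(res);
    return 0;
}
```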
** (line numbers differ here, but only because the sample output is
from a 1.3.3rc1 build)
$ grep -n -A3 gethostbyname src/pm/hydra/utils/sock/sock.c
166: ht = gethostbyname(host);
167- if (ht == NULL)
168- HYDU_ERR_SETANDJUMP(status, HYD_INVALID_PARAM,
169- "unable to get host address (%s)\n", HYDU_herror(h_errno));
--
512: ht = gethostbyname(host);
513- if (ht == NULL)
514- HYDU_ERR_SETANDJUMP(status, HYD_INVALID_PARAM,
515- "unable to get host address (%s)\n", HYDU_strerror(errno));