[mpich-discuss] On gethostbyname() and related issues on Macs

Nicolas Rosner nrosner at gmail.com
Mon Feb 28 21:35:19 CST 2011


Dear MPICH developers,

In hydra/utils/sock/sock.c, if gethostbyname() yields NULL, errno is
checked. However, according to both the BSD and Darwin manpages*, such
code should check h_errno instead. This was recently addressed
(r8059), but it looks like only the first of the two occurrences in
sock.c was fixed (see l.169 vs. l.515).

On Mac OS X, as of 1.3.3rc1, the remaining one still seems to be
causing Hydra to report misleading reasons (`no such file or
directory') for failed lookups.

This shouldn't cause (per se) any new hostname-related problems, but
it could make preexisting ones harder to track down, esp. on a
platform where resolution mechanisms can get difficult to follow in
the first place.
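
To illustrate the difference, here is a minimal sketch (not Hydra's
code) of a lookup that reports both values side by side. The names
resolver_reason() and report_lookup() are made up for this example,
and the reason strings are my own wording; whether HYDU_herror() maps
codes in a similar way is an assumption.

```c
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map an h_errno value to a readable reason. */
const char *resolver_reason(int herr)
{
    switch (herr) {
    case HOST_NOT_FOUND: return "unknown host";
    case TRY_AGAIN:      return "temporary name server failure";
    case NO_RECOVERY:    return "unrecoverable name server error";
    case NO_DATA:        return "host has no address";
    default:             return "unknown resolver error";
    }
}

/* Hypothetical helper: look up `host` and describe the outcome in buf.
 * Returns 0 if the name resolved, -1 otherwise. */
int report_lookup(const char *host, char *buf, size_t len)
{
    struct hostent *ht;

    errno = 0;
    ht = gethostbyname(host);
    if (ht != NULL) {
        snprintf(buf, len, "%s resolved", host);
        return 0;
    }

    /* gethostbyname() reports failures through h_errno. errno may hold
     * a stale or unrelated value, so its text can be misleading. */
    snprintf(buf, len, "errno says: %s; h_errno says: %s",
             strerror(errno), resolver_reason(h_errno));
    return -1;
}
```

Calling report_lookup() with a name that does not resolve should make
the mismatch between the two messages visible, though the exact errno
text will depend on the platform and resolver configuration.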

Now, a few comments about the more specific problem reported last week
by Alex T.:

   $ mpiexec -n 2 ./a.out
   HYDU_sock_is_local (./utils/sock/sock.c:483):
    unable to get host address (No such file or directory)
   main (./ui/mpich/mpiexec.c:343): unable to check if lelos is local

I tried to reproduce that in many ways, incl. private/public IPs,
combinations of the related Mac-specific prefs, with/without a custom
/etc/hosts, and domain names and hostnames ranging from real to fake
to borderline (e.g. `100') -- Hydra's guess of a reasonable,
non-`localhost' hostname for local use worked fine every time.

However, along the way I did note that:

a) it suffices to feed Hydra an invalid target hostname (via -f or
-hosts, but also via env var, right?) to reproduce the exact same
errors**,

  [mpiexec@fiona] HYDU_sock_is_local (./utils/sock/sock.c:515):
    unable to get host address (No such file or directory)
  [mpiexec@fiona] main (./ui/mpich/mpiexec.c:344):
    unable to check if fiona is local

b) it's fairly easy, on a current Mac, to end up with a Computer
and/or Bonjour name (like `lelos' or `fiona') that looks and feels
like a valid hostname just about everywhere, yet means nothing to
gethostbyname() and friends.
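
A quick way to check for the situation in (b) on a given machine can
be sketched as follows. This assumes the local name is obtained via
gethostname(), which may or may not be how Hydra picks it; both
function names are hypothetical.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy the system's idea of the local hostname
 * into buf, always NUL-terminated. Returns 0 on success, -1 on failure. */
int local_hostname(char *buf, size_t len)
{
    if (len == 0 || gethostname(buf, len) != 0)
        return -1;
    buf[len - 1] = '\0';  /* not guaranteed to be terminated if truncated */
    return 0;
}

/* Hypothetical helper: returns 1 if `name` resolves through
 * gethostbyname(), 0 otherwise. */
int name_resolves(const char *name)
{
    return gethostbyname(name) != NULL;
}
```

If local_hostname() succeeds but name_resolves() returns 0 for the
same string, the machine has a name that the system reports yet the
classic resolver cannot look up.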

It would explain the whole thing if Alex had somehow caused Hydra to
pick the name `lelos'. Otherwise the mystery would, I guess, boil down
to why and how Hydra could, given no user hint, come up on its own
with a name like `lelos', well-known to the GUI layer but unresolvable
by plain old BSD methods.

Best regards,
Nicolás




* (which also recommend replacing the call altogether, but a few
comments in the source code suggest that you probably have your
reasons not to do so just yet)
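
For reference, the replacement those manpages point to is
getaddrinfo(). A minimal sketch of an equivalent lookup, with a
made-up helper name and no claim that it fits Hydra's constraints:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if `host` resolves, 0 otherwise.
 * On failure, a readable reason is copied into `why`. */
int host_resolves(const char *host, char *why, size_t len)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* The error code is the return value itself and is decoded
         * with gai_strerror(), not strerror(). */
        snprintf(why, len, "%s", gai_strerror(rc));
        return 0;
    }

    freeaddrinfo(res);
    if (len > 0)
        why[0] = '\0';
    return 1;
}
```

Since the error code comes back as the return value, there is no
separate variable to confuse with errno.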

** (line numbers differ here, but only because sample output's from a
1.3.3rc1 build)


$ grep -n -A3 gethostbyname src/pm/hydra/utils/sock/sock.c
166:    ht = gethostbyname(host);
167-    if (ht == NULL)
168-        HYDU_ERR_SETANDJUMP(status, HYD_INVALID_PARAM,
169-          "unable to get host address (%s)\n", HYDU_herror(h_errno));
--
512:    ht = gethostbyname(host);
513-    if (ht == NULL)
514-        HYDU_ERR_SETANDJUMP(status, HYD_INVALID_PARAM,
515-          "unable to get host address (%s)\n", HYDU_strerror(errno));

