[mpich-discuss] Error in the mpdboot step

Prashantha Hebbar prashantha.hebbar at manipal.edu
Wed Feb 3 01:02:14 CST 2010


Hi Pavan,

As you suggested, I have added the other host's name to /etc/hosts on both
systems, but it did not work. Here are the details from the mpdcheck command:

mlscub1@mlscub1-desktop:~$ cat /etc/hosts
127.0.0.1       localhost
172.16.17.24    mlscub1-desktop
172.16.17.93    mlscub2-desktop
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

=====================================

mlscub1@mlscub1-desktop:~$ mpdcheck -v
obtaining hostname via gethostname and getfqdn
gethostname gives  mlscub1-desktop
getfqdn gives  mlscub1-desktop
checking out unqualified hostname; make sure is not "localhost", etc.
checking out qualified hostname; make sure is not "localhost", etc.
obtain IP addrs via qualified and unqualified hostnames;  make sure other than 127.0.0.1
gethostbyname_ex:  ('mlscub1-desktop', [], ['172.16.17.24'])
gethostbyname_ex:  ('mlscub1-desktop', [], ['172.16.17.24'])
checking that IP addrs resolve to same host
now do some gethostbyaddr and gethostbyname_ex for machines in hosts file

======================================
mlscub1@mlscub1-desktop:~$ mpdcheck -f mpd.hosts -ssh
*** gethostbyname_ex failed for host mlscub1@mlscub1-desktop.local
*** gethostbyname_ex failed for host mlscub1@mlscub1-desktop.local
*** gethostbyname_ex failed for host mlscub2@mlscub2-desktop.local
*** gethostbyname_ex failed for host mlscub2@mlscub2-desktop.local
** ssh failed to mlscub1@mlscub1-desktop.local
** here is the output:
Permission denied (publickey).
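The lookup failures above are most likely because the names handed to gethostbyname_ex are user@host entries (the list archive may render "@" as " at "), and a resolver treats the whole string, user prefix included, as the host name. Below is a hypothetical Python re-creation of that kind of check, assuming mpd.hosts holds one bare host name per line, optionally followed by `:ncpus`. The names `parse_mpd_hosts` and `check_hosts` are illustrative, not part of MPD, and the resolver is a parameter so the logic can be tried without a network.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Same shape as socket.gethostbyname_ex: name -> (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def parse_mpd_hosts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (host, ncpus) pairs, assuming one 'host[:ncpus]' entry per line."""
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        host, _, ncpus = line.partition(":")
        entries.append((host.strip(), int(ncpus) if ncpus else 1))
    return entries


def check_hosts(lines: Iterable[str],
                resolve: Resolver = socket.gethostbyname_ex) -> List[str]:
    """Return human-readable problems; an empty list means every host resolved."""
    problems = []
    for host, _ncpus in parse_mpd_hosts(lines):
        if "@" in host:
            # A "user@" prefix is passed to the resolver as part of the name.
            problems.append(f"{host}: contains '@'; list the bare host name")
            continue
        try:
            _name, _aliases, addrs = resolve(host)
        except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
            problems.append(f"{host}: lookup failed ({exc})")
            continue
        if all(addr.startswith("127.") for addr in addrs):
            # mpdcheck likewise wants addresses other than 127.0.0.1
            problems.append(f"{host}: resolves only to loopback {addrs}")
    return problems
```

For example, `print(check_hosts(open("mpd.hosts")))` should print an empty list once every entry is a bare host name that resolves to a non-loopback address.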
=============================================================
However, I can log in to both systems with a password. Here is the output:
==========================================================================
mlscub1@mlscub1-desktop:~$ ssh mlscub2@mlscub2-desktop.local
Linux mlscub2-desktop 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
Last login: Wed Feb  3 10:34:03 2010 from 172.16.17.98
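An interactive password login, as above, does not show that ssh works non-interactively, which is presumably what mpdboot and `mpdcheck -ssh` need, since they start remote commands over ssh with no one there to type a password. The minimal Python sketch below tests exactly that case, assuming the standard OpenSSH client: with `BatchMode=yes`, ssh fails instead of prompting, so success implies key-based login is in place. The helper names are hypothetical, and `mlscub2-desktop` is simply the node from this thread.

```python
import subprocess
from typing import List


def batch_ssh_command(target: str) -> List[str]:
    """Build an ssh command that is not allowed to prompt for a password.

    BatchMode=yes disables password/passphrase prompts, so the command can
    only succeed if key-based (non-interactive) login works.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def can_ssh_noninteractively(target: str) -> bool:
    """Run the remote no-op 'true' and report whether ssh exited with status 0."""
    result = subprocess.run(
        batch_ssh_command(target),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, `can_ssh_noninteractively("mlscub2-desktop")` run on mlscub1-desktop should return `True` before mpdboot is tried. If it returns `False`, the usual OpenSSH remedy is to create a key pair with `ssh-keygen` and install the public key on the other node with `ssh-copy-id`.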
=================================================================
mlscub2@mlscub2-desktop:~$ mpdcheck -pc
--- print results of: gethostbyname_ex(gethostname())
('mlscub2-desktop', [], ['172.16.17.93'])
--- try to run /bin/hostname
mlscub2-desktop
--- try to run uname -a
Linux mlscub2-desktop 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686 GNU/Linux
--- try to print /etc/hosts
127.0.0.1       localhost
172.16.17.93    mlscub2-desktop
172.16.17.24    mlscub1-desktop
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
--- try to print /etc/resolv.conf
search mahe.manipal.net
nameserver 172.16.19.202
nameserver 172.16.19.203
--- try to run /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:e0:81:5e:29:ac
          inet addr:172.16.17.93  Bcast:172.16.17.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe5e:29ac/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:102648 errors:0 dropped:0 overruns:0 frame:0
          TX packets:64731 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:30758211 (29.3 MB)  TX bytes:8244158 (7.8 MB)
          Interrupt:16 Base address:0xc000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2566 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2566 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:144688 (141.2 KB)  TX bytes:144688 (141.2 KB)

--- try to print /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:         compat
group:          compat
shadow:         compat

hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis
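The `hosts:` line above fixes the lookup order: /etc/hosts (`files`) first, then `mdns4_minimal` (multicast DNS for `.local` names). The `[NOTFOUND=return]` action attached to `mdns4_minimal` ends the search if that service reports "not found"; if it neither succeeds nor says "not found", the search continues with `dns` and then `mdns4`. That is likely why the `.local` names work for ssh even though /etc/hosts lists only the short names. The hypothetical parser below just makes that structure explicit; it follows the `service [STATUS=action]` syntax described in nsswitch.conf(5) and ignores less common forms.

```python
import re
from typing import Dict, List, Tuple

# A bracketed action group, or any other whitespace-separated word.
TOKEN = re.compile(r"\[[^\]]*\]|\S+")


def parse_nss_line(line: str) -> Tuple[str, List[Tuple[str, Dict[str, str]]]]:
    """Split an nsswitch.conf entry into (database, [(service, actions), ...]).

    Each '[STATUS=action ...]' group modifies the service listed just before it.
    """
    database, _, rest = line.partition(":")
    services: List[Tuple[str, Dict[str, str]]] = []
    for token in TOKEN.findall(rest):
        if token.startswith("[") and services:
            for item in token[1:-1].split():
                status, _, action = item.partition("=")
                services[-1][1][status.upper()] = action.lower()
        else:
            services.append((token, {}))
    return database.strip(), services
```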

From this output, is it possible to find the problem?

Regards,
Prashantha

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji
Sent: Tuesday, February 02, 2010 7:14 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Error in the mpdboot step


On 02/02/2010 04:52 AM, Prashantha Hebbar wrote:
> mlscub1@mlscub1-desktop:~$ mpdboot -n 2 -f mpd.hosts --ncpus=1
> unable to obtain IP for host: mlscub1@mlscub1-desktop.local
> unable to obtain IP for host: mlscub2@mlscub2-desktop.local
> totalnum=2  numhosts=1
> there are not enough hosts on which to start all processes

The /etc/hosts file on each machine needs to have information on all the
nodes in the system.

> So, can you please tell me what might have gone wrong? I do not think it
> is a problem with my /etc/hosts settings.
> mlscub1@mlscub1-desktop:~$ cat /etc/hosts
> 127.0.0.1       localhost
> 172.16.17.24    mlscub1-desktop

The file doesn't contain an entry for mlscub2-desktop.

You can use the mpdcheck utility to look for such errors in your setup.
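A quick way to act on the /etc/hosts advice is to confirm that every node name appears in the file on each machine. This is a minimal Python sketch, not part of mpdcheck; `hosts_file_names` and `missing_nodes` are hypothetical helpers, and the node list is assumed to be the two machines named in this thread.

```python
from typing import Dict, Iterable, List


def hosts_file_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map every host name and alias in /etc/hosts-style text to its address."""
    names: Dict[str, str] = {}
    for raw in lines:
        fields = raw.split("#", 1)[0].split()  # strip comments, split on blanks
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address with no names
        addr, *aliases = fields
        for name in aliases:
            names.setdefault(name, addr)  # keep the first line that mentions a name
    return names


def missing_nodes(lines: Iterable[str], nodes: Iterable[str]) -> List[str]:
    """Return the cluster nodes that have no entry in the hosts file."""
    known = hosts_file_names(lines)
    return [node for node in nodes if node not in known]
```

Run against the file quoted above (only localhost and mlscub1-desktop), `missing_nodes(lines, ["mlscub1-desktop", "mlscub2-desktop"])` returns `["mlscub2-desktop"]`, which is the gap pointed out here.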

> I tried it the other way round, specifying the hostname of the mpd master
> on the slave system. That works fine.
>
> mlscub1@mlscub1-desktop:~$ mpd &
> [1] 22750
> 
> mlscub1@mlscub1-desktop:~$ mpdtrace -l
> mlscub1-desktop_50100 (172.16.17.24)
>
> mlscub2@mlscub2-desktop:~$ mpd -h mlscub1-desktop.local -p 50100 &
> [1] 11418
>
> mlscub2@mlscub2-desktop:~$ mpdtrace -l
> mlscub2-desktop_50514 (172.16.17.93)
> mlscub1-desktop_50100 (172.16.17.24)
>
> mlscub1@mlscub1-desktop:~$ mpdtrace -l
> mlscub1-desktop_50100 (172.16.17.24)
> mlscub2-desktop_50514 (172.16.17.93)

Good to know that this works. It is another option, but it is more
cumbersome to use, so it is not what we usually recommend.
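Whichever way the ring is started, `mpdtrace -l` prints one `host_port (ip)` line per daemon, so comparing that list with the intended nodes confirms that every machine joined. The parser below assumes exactly the line format shown in the quoted output; `ring_members` is a hypothetical helper, not an MPD function.

```python
import re
from typing import Dict, Iterable

# Matches lines such as "mlscub2-desktop_50514 (172.16.17.93)".
TRACE_LINE = re.compile(r"^(?P<host>\S+)_(?P<port>\d+)\s+\((?P<ip>[^)]+)\)\s*$")


def ring_members(lines: Iterable[str]) -> Dict[str, str]:
    """Map host name -> IP address for each daemon listed by `mpdtrace -l`."""
    members: Dict[str, str] = {}
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match:  # silently skip blank or unrelated lines
            members[match.group("host")] = match.group("ip")
    return members
```

The input could come from `subprocess.run(["mpdtrace", "-l"], capture_output=True, text=True).stdout.splitlines()` (`capture_output` needs Python 3.7 or later); the ring is complete when both `mlscub1-desktop` and `mlscub2-desktop` appear among the keys.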

> I have another problem with executing programs: I get "permission denied"
> error messages.
>
> mlscub1@mlscub1-desktop:~$ mpiexec -n 5
> /home/mlscub1/libraries/mpich2-1.2.1/examples/cpi.c
> problem with execution of
> /home/mlscub1/libraries/mpich2-1.2.1/examples/cpi.c  on  mlscub1-desktop:
> [Errno 13] Permission denied
> problem with execution of
> /home/mlscub1/libraries/mpich2-1.2.1/examples/cpi.c  on  mlscub2-desktop:
> [Errno 2] No such file or directory
> problem with execution of
> /home/mlscub1/libraries/mpich2-1.2.1/examples/cpi.c  on  mlscub2-desktop:
> [Errno 2] No such file or directory
> problem with execution of
> /home/mlscub1/libraries/mpich2-1.2.1/examples/cpi.c  on  mlscub1-desktop:
> [Errno 13] Permission denied
> problem with execution of
> /home/mlscub1/libraries/mpich2-1.2.1/examples/cpi.c  on  mlscub1-desktop:
> [Errno 13] Permission denied

Why are you trying to execute cpi.c? Did you mean to compile it and
execute cpi?

% mpicc examples/cpi.c -o examples/cpi

% mpiexec -n 5 ./examples/cpi
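The two kinds of error in the quoted output fit this explanation. `[Errno 13] Permission denied` is what the OS returns when asked to execute a file that lacks execute permission, such as a `.c` source file, and `[Errno 2] No such file or directory` on mlscub2-desktop suggests the path does not exist on that machine, so the compiled program probably has to be present at the same path on every node (or on a shared filesystem). A hedged Python sketch of the compile-then-launch sequence follows; the helper names are hypothetical, and launching the absolute path of the binary is an assumption made so that every host is asked for the same path.

```python
import os
import shutil
import subprocess
from typing import List


def build_commands(source: str, nprocs: int) -> List[List[str]]:
    """Return the compile and launch commands for an MPI C source file."""
    if not source.endswith(".c"):
        raise ValueError("expected a C source file")
    exe = os.path.splitext(source)[0]  # examples/cpi.c -> examples/cpi
    return [
        ["mpicc", source, "-o", exe],                         # compile and link
        ["mpiexec", "-n", str(nprocs), os.path.abspath(exe)],  # run the binary, not the .c
    ]


def compile_and_run(source: str, nprocs: int) -> None:
    """Compile, then launch; requires mpicc and mpiexec on PATH."""
    for tool in ("mpicc", "mpiexec"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    for cmd in build_commands(source, nprocs):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, `compile_and_run("examples/cpi.c", 5)` performs the same two steps as the commands above; the binary still has to exist at that path on mlscub2-desktop for the remote processes to start.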

 -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

