[mpich-discuss] mpiexec woes

Janzen Brewer janzen.brewer at gtri.gatech.edu
Mon Aug 31 13:39:06 CDT 2009


I see what you mean. I think the problem with my setup might be related 
to the fact that the master node is on both the private network (for 
compute nodes only) and the public network and has different hostnames 
for each. Publicly its hostname is 'tesla' and privately it's 'node0'. 
I've poked around with settings, but can't seem to figure out how to 
rectify this. I have been able to run basic mpiexec commands 
successfully when I use a compute node (specifically NOT tesla) as the 
master and any other compute nodes (again, NOT tesla) as slaves. Ideas?

Thanks!
Janzen

In addition, here is the output of similar commands in Python, first on tesla:

[root@tesla ~]# python
Python 2.4.3 (#1, Sep 17 2008, 16:07:08)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-41)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import socket
 >>> socket.gethostname()
'tesla.stl.gtri.gatech.edu'
 >>> socket.gethostbyname_ex('tesla')
('tesla.stl.gtri.gatech.edu', [], ['130.207.197.196'])
 >>> socket.gethostbyname_ex('node0')
('node0', [], ['192.168.1.100'])
 >>> socket.gethostbyname_ex('node1')
('node1', [], ['192.168.1.1'])
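
On tesla, gethostname() returns the public FQDN, and 'tesla' resolves only to the public address 130.207.197.196, while node0 and node1 resolve to 192.168.1.x addresses. Below is a minimal sketch that flags which of the recorded lookups fall on the private network. It assumes that network is 192.168.1.0/24, which is inferred from the node0 and node1 addresses and not stated anywhere. The helper name addresses_on_network is hypothetical. The sketch is Python 3, because the ipaddress module does not exist in the Python 2.4 shown above.

```python
import ipaddress

# Assumption: the private compute network is 192.168.1.0/24, inferred from
# the node0 (192.168.1.100) and node1 (192.168.1.1) addresses.
PRIVATE_NET = ipaddress.ip_network("192.168.1.0/24")


def addresses_on_network(lookup_result, network=PRIVATE_NET):
    """Return the addresses in a gethostbyname_ex()-style tuple that lie in network.

    lookup_result has the shape (canonical_name, aliases, addresses),
    which is what socket.gethostbyname_ex() returns.
    """
    _name, _aliases, addresses = lookup_result
    return [a for a in addresses if ipaddress.ip_address(a) in network]


# Lookups as recorded on tesla in the session above.
tesla_view = {
    "tesla": ("tesla.stl.gtri.gatech.edu", [], ["130.207.197.196"]),
    "node0": ("node0", [], ["192.168.1.100"]),
    "node1": ("node1", [], ["192.168.1.1"]),
}

for name, result in tesla_view.items():
    on_private = addresses_on_network(result)
    print(name, "->", result[2], "private" if on_private else "NOT on private net")
```

With the recorded data, only 'tesla' is reported as not being on the private network. That is consistent with the suspicion that tesla identifies itself by an address the compute nodes may not use.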

And on node1:

[root@node1 ~]# python
Python 2.4.3 (#1, Sep 17 2008, 16:07:08)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-41)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import socket
 >>> socket.gethostname()
'node1'
 >>> socket.gethostbyname_ex('tesla')
('tesla', ['node0', 'yum'], ['192.168.1.100'])
 >>> socket.gethostbyname_ex('node0')
('tesla', ['node0', 'yum'], ['192.168.1.100'])
 >>> socket.gethostbyname_ex('node1')
('node1', [], ['192.168.1.1'])
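
Putting the two sessions side by side shows that the same name can resolve differently depending on which host does the lookup. A small sketch can make that comparison mechanical. It takes the recorded gethostbyname_ex() results from each host and reports names whose address sets disagree. The function name inconsistent_names is hypothetical, and the data is copied from the two sessions above.

```python
def inconsistent_names(views):
    """Find queried names whose resolved address sets differ between hosts.

    views maps the host where lookups were run to a dict of
    {queried_name: (canonical_name, aliases, addresses)}.
    Returns {queried_name: {host: sorted addresses}} for every name that
    does not resolve to the same set of addresses on all hosts.
    """
    per_name = {}
    for host, lookups in views.items():
        for name, (_canon, _aliases, addrs) in lookups.items():
            per_name.setdefault(name, {})[host] = sorted(addrs)
    return {
        name: seen
        for name, seen in per_name.items()
        if len({tuple(addrs) for addrs in seen.values()}) > 1
    }


views = {
    "tesla": {
        "tesla": ("tesla.stl.gtri.gatech.edu", [], ["130.207.197.196"]),
        "node0": ("node0", [], ["192.168.1.100"]),
        "node1": ("node1", [], ["192.168.1.1"]),
    },
    "node1": {
        "tesla": ("tesla", ["node0", "yum"], ["192.168.1.100"]),
        "node0": ("tesla", ["node0", "yum"], ["192.168.1.100"]),
        "node1": ("node1", [], ["192.168.1.1"]),
    },
}

for name, seen in inconsistent_names(views).items():
    print(name, seen)
```

Only 'tesla' is reported. It is 130.207.197.196 when looked up on tesla and 192.168.1.100 when looked up on node1. The comparison deliberately uses addresses rather than canonical names, because canonical names can legitimately differ between hosts. In Ralph's example, b01 resolves to the canonical name b01.cs.mtsu.edu and keeps b01 as an alias.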


Ralph Butler wrote:
> One mpd is failing to obtain correct info about its own host or the
> other one.
> I might guess that the hostname you provide on the cmd line when
> running mpdcheck and the hostname that the system identifies itself
> by are different. I would have thought running mpdcheck like this:
>          mpdcheck -v
> might shed some light on that, however.
>
> Some of the kinds of things mpdcheck does, you can try by hand. For
> example, I have 2 hosts named b01 and b02. I can do some quick,
> non-exhaustive verification that they correctly identify each other:
>
> First, on b01:
> (b01:51)% python
> Python 2.5.2 (r252:60911, Jan  4 2009, 17:40:26)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import socket
>  >>> socket.gethostname()
> 'b01'
>  >>> socket.gethostbyname_ex('b01')
> ('b01.cs.mtsu.edu', ['b01'], ['161.45.166.1'])
>  >>> socket.gethostbyname_ex('b02')
> ('b02.cs.mtsu.edu', [], ['161.45.166.2'])
>
> Then, on b02:
> (b02:51)% python
> Python 2.5.2 (r252:60911, Jan  4 2009, 17:40:26)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import socket
>  >>> socket.gethostname()
> 'b02'
>  >>> socket.gethostbyname_ex('b01')
> ('b01.cs.mtsu.edu', [], ['161.45.166.1'])
>  >>> socket.gethostbyname_ex('b02')
> ('b02.cs.mtsu.edu', ['b02'], ['161.45.166.2'])
>   
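
The by-hand check in the quoted reply can be wrapped in a function so that the same lookups run identically on every node, including tesla under both of its names. This is a minimal Python 3 sketch. resolution_report is a hypothetical helper and is not part of mpdcheck. It only gathers what the manual session shows (the host's own name and how it resolves itself and its peers), not everything mpdcheck tests. The resolver and hostname parameters exist so the function can be exercised with recorded data.

```python
import socket


def resolution_report(names, resolver=socket.gethostbyname_ex, hostname=None):
    """Collect this host's own name and how it resolves itself and each peer.

    Mirrors the manual session: gethostname(), then gethostbyname_ex()
    on the host's own name and on every name in names. Lookup failures
    are recorded as strings instead of being raised.
    """
    if hostname is None:
        hostname = socket.gethostname()
    report = {"hostname": hostname, "lookups": {}}
    for name in [hostname] + [n for n in names if n != hostname]:
        try:
            report["lookups"][name] = resolver(name)
        except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
            report["lookups"][name] = "lookup failed: %s" % exc
    return report
```

For example, running resolution_report(["tesla", "node0", "node1"]) on each machine and comparing the printed dictionaries should surface the same tesla/node0 discrepancy visible in the sessions above.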

