[mpich-discuss] mpiexec woes
Ralph Butler
rbutler at mtsu.edu
Mon Aug 31 15:00:23 CDT 2009
Your node0 is really multi-homed, i.e. it has 2 or more network interfaces.
Typically, this implies that you may have to specify the --ifhn option
when starting up an mpd.
There is a section of the manual devoted to multi-homed setups and
related issues.
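
To make the mismatch concrete, here is a minimal sketch that compares how
each host resolves the same names, using the lookup results from the two
interpreter sessions quoted below. find_mismatches is a hypothetical helper
written for illustration; it is not part of mpd or mpdcheck, and it assumes
Python 3 (the sessions below use 2.4).

```python
def find_mismatches(views):
    """Return {name: {host: addrs}} for names whose addresses differ by host.

    `views` maps an observing host to {queried_name: (canonical, aliases, addrs)},
    i.e. the tuples returned by socket.gethostbyname_ex on that host.
    """
    by_name = {}
    for host, lookups in views.items():
        for name, (_canonical, _aliases, addrs) in lookups.items():
            by_name.setdefault(name, {})[host] = tuple(sorted(addrs))
    # Keep only names that at least two hosts resolve differently.
    return {
        name: seen
        for name, seen in by_name.items()
        if len(set(seen.values())) > 1
    }


# Results copied from the interpreter sessions on tesla and node1.
views = {
    "tesla": {
        "tesla": ("tesla.stl.gtri.gatech.edu", [], ["130.207.197.196"]),
        "node0": ("node0", [], ["192.168.1.100"]),
        "node1": ("node1", [], ["192.168.1.1"]),
    },
    "node1": {
        "tesla": ("tesla", ["node0", "yum"], ["192.168.1.100"]),
        "node0": ("tesla", ["node0", "yum"], ["192.168.1.100"]),
        "node1": ("node1", [], ["192.168.1.1"]),
    },
}

mismatches = find_mismatches(views)
for name, seen in sorted(mismatches.items()):
    print(name, seen)
```

Only 'tesla' is reported: the master resolves it to its public address while
node1 resolves it to the private one. 'node0' resolves to 192.168.1.100 on
both hosts, so it is probably the name (or address) to try with --ifhn, though
that is an inference from these lookups and not something I have tested on
your cluster.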
On Mon Aug 31, at 1:39 PM, Janzen Brewer wrote:
> I see what you mean. I think the problem with my setup might be
> related to the fact that the master node is on both the private
> network (for compute nodes only) and the public network and has
> different hostnames for each. Publicly its hostname is 'tesla' and
> privately it's 'node0'. I've poked around with settings, but can't
> seem to figure out how to rectify this. I have been able to run
> basic mpiexec commands successfully when I use a compute node
> (specifically NOT tesla) as the master and any other compute nodes
> (again, NOT tesla) as slaves. Ideas?
>
> Thanks!
> Janzen
>
> In addition, here is the output of similar commands in python:
>
> [root@tesla ~]# python
> Python 2.4.3 (#1, Sep 17 2008, 16:07:08)
> [GCC 4.1.2 20071124 (Red Hat 4.1.2-41)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import socket
> >>> socket.gethostname()
> 'tesla.stl.gtri.gatech.edu'
> >>> socket.gethostbyname_ex('tesla')
> ('tesla.stl.gtri.gatech.edu', [], ['130.207.197.196'])
> >>> socket.gethostbyname_ex('node0')
> ('node0', [], ['192.168.1.100'])
> >>> socket.gethostbyname_ex('node1')
> ('node1', [], ['192.168.1.1'])
>
> And:
>
> [root@node1 ~]# python
> Python 2.4.3 (#1, Sep 17 2008, 16:07:08)
> [GCC 4.1.2 20071124 (Red Hat 4.1.2-41)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import socket
> >>> socket.gethostname()
> 'node1'
> >>> socket.gethostbyname_ex('tesla')
> ('tesla', ['node0', 'yum'], ['192.168.1.100'])
> >>> socket.gethostbyname_ex('node0')
> ('tesla', ['node0', 'yum'], ['192.168.1.100'])
> >>> socket.gethostbyname_ex('node1')
> ('node1', [], ['192.168.1.1'])
>
>
> Ralph Butler wrote:
>> One mpd is failing to obtain correct info about its own host or
>> the other one.
>> I might guess that the hostname you provide on the cmd line when
>> running mpdcheck and the hostname
>> that the system identifies itself by are different. I would have
>> thought running mpdcheck like this:
>> mpdcheck -v
>> might shed some light on that however.
>>
>> Some of the kinds of things mpdcheck does, you can try by hand.
>> For example, I have 2 hosts named
>> b01 and b02. I can do some quick, non-exhaustive verification
>> that they correctly identify each other:
>>
>> First, on b01:
>> (b01:51)% python
>> Python 2.5.2 (r252:60911, Jan 4 2009, 17:40:26)
>> [GCC 4.3.2] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import socket
>> >>> socket.gethostname()
>> 'b01'
>> >>> socket.gethostbyname_ex('b01')
>> ('b01.cs.mtsu.edu', ['b01'], ['161.45.166.1'])
>> >>> socket.gethostbyname_ex('b02')
>> ('b02.cs.mtsu.edu', [], ['161.45.166.2'])
>>
>> Then, on b02:
>> (b02:51)% python
>> Python 2.5.2 (r252:60911, Jan 4 2009, 17:40:26)
>> [GCC 4.3.2] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import socket
>> >>> socket.gethostname()
>> 'b02'
>> >>> socket.gethostbyname_ex('b01')
>> ('b01.cs.mtsu.edu', [], ['161.45.166.1'])
>> >>> socket.gethostbyname_ex('b02')
>> ('b02.cs.mtsu.edu', ['b02'], ['161.45.166.2'])
>>
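
The by-hand checks quoted above can be collected into a small script and run
on each host in turn. This is only a sketch under assumptions:
resolution_report is a hypothetical helper, not something mpdcheck provides,
and the "except ... as" syntax needs Python 2.6 or later (or Python 3).

```python
import socket


def resolution_report(names, resolver=socket.gethostbyname_ex):
    """Return one line per name showing how this host resolves it."""
    lines = []
    for name in names:
        try:
            canonical, aliases, addrs = resolver(name)
            lines.append("%s -> %s aliases=%s addrs=%s"
                         % (name, canonical, aliases, addrs))
        except socket.error as exc:
            # socket.gaierror and socket.herror are subclasses of socket.error
            lines.append("%s -> lookup failed: %s" % (name, exc))
    return lines


# Example use on a host, with the names from the sessions above:
#     print(socket.gethostname())
#     for line in resolution_report(["b01", "b02"]):
#         print(line)
```

Running it on every host with the same list of names and comparing the
output side by side gives the same information as the interactive sessions.
The resolver argument exists only so the function can be exercised without
touching DNS or /etc/hosts.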