[mpich2-dev] HYDRA: Using Multiple Ethernet Interfaces
Cody R. Brown
cody at cs.ubc.ca
Tue Aug 31 17:15:46 CDT 2010
Just a small follow-up.
I am using PBS (Torque). A possible reason is that it always uses the PBS
nodes that were requested, even if I tell it to use the other eth2 IPs or
unset the HYDRA_RMK environment variable. Looking at it a bit more, the
-rmk flag defaults to "pbs", so it may still be using PBS. The help
(mpiexec -h) says "-rmk dummy" is a valid option, but when I use it,
mpiexec errors out saying it is not a valid option. I can't seem to change
-rmk to anything other than pbs. Could this parameter be taking precedence
over all the others (-hosts, -f, -iface), so that it always uses the node
IPs PBS gave me (the 10GigE network in this case)?
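Independent of the -rmk question, the kernel's own routing choice can be checked directly with `ip route get`, which prints the device and source address that would be used to reach a given peer IP. This is a generic Linux diagnostic, not an MPICH feature; 172.20.101.2 is the second node's eth2 address from the commands quoted below, while 192.168.20.2 is an assumed eth0 address for that node.

```shell
# Ask the routing table which device would carry traffic to each peer;
# the "dev" field in the output names the interface the kernel picked.
ip route get 172.20.101.2   # should go out dev eth2 (the GigE subnet)
ip route get 192.168.20.2   # should go out dev eth0 (the 10GigE subnet)
```

If both destinations route out the same device, the bandwidth numbers would look identical regardless of which host file is passed to mpiexec.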
~cody
On Tue, Aug 31, 2010 at 2:11 PM, Cody R. Brown <cody at cs.ubc.ca> wrote:
> Hello;
>
> I have a system with 2 Ethernets (eth0 and eth2). eth0 is connected to a
> 10GigE switch, and eth2 is connected to a separate GigE switch.
>
> Using HYDRA in version 1.2.1p1, I get my desired results when I want to
> use a different interface. The file "hostsGigE" has 2 host names with the
> GigE interface IPs, while "hosts10GigE" has 2 10GigE IPs. The following
> commands work:
> # mpiexec -f hostsGigE -n 2 `pwd`/osu_bw      --- shows bandwidth around 117 MB/s
> # mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw    --- shows bandwidth around 900 MB/s
>
> When using the latest MPICH2 (1.3a2 and 1.3b1), it seems to always use
> the 10GigE network:
> # mpiexec -f hostsGigE -n 2 `pwd`/osu_bw      --- shows bandwidth around 900 MB/s
> # mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw    --- shows bandwidth around 900 MB/s
>
> These commands also show bandwidth around 900 MB/s (including using the
> IP addresses instead of hostnames, i.e. using the -iface, -hosts, and -f
> flags):
> # mpiexec -f hosts10GigE -n 2 -iface eth2 `pwd`/osu_bw
> # mpiexec -hosts node01-eth2,node02-eth2 -iface eth2 -n 2 `pwd`/osu_bw
> # mpiexec -hosts 172.20.101.1,172.20.101.2 -n 2 `pwd`/osu_bw
>
>
> Does anyone know what I am doing wrong, and why it works as expected in
> HYDRA 1.2.1p1 but not in the latest 1.3b1? I am a little confused about
> how it even knows about the 10GigE network when I only gave it GigE
> hostnames. Perhaps my system is routing the traffic out on the 10GigE
> network, but then why does it work fine in 1.2.1p1?
>
> The system I am running on is Linux (CentOS 5.5). It is a cluster running
> with PBS (Torque). I do have HYDRA_RMK set to "pbs", but I also tried it
> with this environment variable unset; it seems the command line parameters
> take precedence. The info at
> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
> suggests that what I am doing should work. My "ifconfig" output is below.
>
> Any help would be appreciated.
>
> ~cody
>
>
> eth0      Link encap:Ethernet  HWaddr 00:1B:21:69:79:A0
>           inet addr:192.168.20.1  Bcast:192.168.20.255  Mask:255.255.255.0
>           inet6 addr: fe80::21b:21ff:fe69:79a0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:7454839 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:149930410 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:45436437528 (42.3 GiB)  TX bytes:221935890089 (206.6 GiB)
>
> eth2      Link encap:Ethernet  HWaddr E4:1F:13:4D:13:0E
>           inet addr:172.20.101.1  Bcast:172.20.101.255  Mask:255.255.255.0
>           inet6 addr: fe80::e61f:13ff:fe4d:130e/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:556581 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:8745499 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:39219489 (37.4 MiB)  TX bytes:12766433186 (11.8 GiB)
>           Memory:92b60000-92b80000
>
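A practical way to confirm which NIC actually carried the benchmark traffic is to watch the per-interface TX byte counters in sysfs during an osu_bw run. A minimal sketch (a generic diagnostic, not part of MPICH; it defaults to lo so it runs anywhere, and you would pass eth0 or eth2 on the cluster nodes):

```shell
#!/bin/sh
# Sample an interface's TX byte counter over a one-second window.
# Run on a node while osu_bw is going: the interface whose counter
# jumps by hundreds of MB is the one actually carrying the traffic.
IFACE=${1:-lo}   # pass eth0 or eth2 on the cluster; lo is a safe default
BEFORE=$(cat /sys/class/net/"$IFACE"/statistics/tx_bytes)
sleep 1
AFTER=$(cat /sys/class/net/"$IFACE"/statistics/tx_bytes)
echo "$IFACE sent $((AFTER - BEFORE)) bytes in 1s"
```

Comparing the deltas on eth0 and eth2 side by side would show directly whether the -iface/-f settings are being honored, independent of what the bandwidth number suggests.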