[mpich2-dev] HYDRA: Using Multiple Ethernet Interfaces
Cody R. Brown
cody at cs.ubc.ca
Tue Aug 31 16:11:55 CDT 2010
Hello,
I have a system with two Ethernet interfaces (eth0 and eth2). eth0 is connected
to a 10GigE switch, and eth2 is connected to a separate GigE switch.
Using HYDRA in version 1.2.1p1, I get the results I expect when I select a
different interface. The file "hostsGigE" contains two host names that resolve
to the GigE interface IPs, while "hosts10GigE" contains the two 10GigE IPs. The
following commands work:
# mpiexec -f hostsGigE -n 2 `pwd`/osu_bw      --- shows bandwidth around 117 MB/s
# mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw    --- shows bandwidth around 900 MB/s
With the latest MPICH2 releases (1.3a2 and 1.3b1), it always seems to use the
10GigE network:
# mpiexec -f hostsGigE -n 2 `pwd`/osu_bw      --- shows bandwidth around 900 MB/s
# mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw    --- shows bandwidth around 900 MB/s
The following commands, which use the -iface, -hosts and -f flags (and, in the
last case, IP addresses instead of hostnames), also show bandwidth around
900 MB/s:
# mpiexec -f hosts10GigE -n 2 -iface eth2 `pwd`/osu_bw
# mpiexec -hosts node01-eth2,node02-eth2 -iface eth2 -n 2 `pwd`/osu_bw
# mpiexec -hosts 172.20.101.1,172.20.101.2 -n 2 `pwd`/osu_bw
Does anyone know what I am doing wrong, and why this works as expected with
HYDRA in 1.2.1p1 but not in the latest 1.3b1? I am a little confused about how
it even knows about the 10GigE network when I only gave it GigE hostnames.
Perhaps my system is sending the traffic out on the 10GigE network, but then
why does it work fine in 1.2.1p1?
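One way to check which interface actually carries the benchmark traffic is to compare the kernel's per-interface transmit counters before and after a run. The following is a minimal sketch, assuming a Linux node that exposes these counters under /sys/class/net; the function names and the SYSFS_NET variable are hypothetical helpers introduced here and are not part of Hydra or MPICH2.

```shell
#!/bin/sh
# Sketch: report how many bytes an interface transmitted while a command ran.
# SYSFS_NET is a hypothetical override (not a Hydra or kernel setting) so the
# functions can be pointed at another directory; by default they read the
# standard Linux sysfs counters.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the cumulative TX byte counter for interface $1.
tx_bytes() {
    cat "$SYSFS_NET/$1/statistics/tx_bytes"
}

# Run a command and print the TX byte delta for interface $1.
# Usage: tx_delta IFACE COMMAND [ARGS...]
tx_delta() {
    iface=$1
    shift
    before=$(tx_bytes "$iface")
    "$@" >/dev/null          # discard the command's own output
    after=$(tx_bytes "$iface")
    echo "$iface: $((after - before)) bytes sent"
}
```

For example, after sourcing these functions, "tx_delta eth2 mpiexec -f hostsGigE -n 2 ./osu_bw" followed by the same command with eth0 would show which local interface moved the data, provided the launching node is one of the two hosts and no other heavy traffic is running. Separately, "ip route get 172.20.101.2" shows which interface the kernel itself would pick for that destination address.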
The system is a Linux cluster running CentOS 5.5 with PBS (Torque). I have
HYDRA_RMK set to "pbs", but I also tried with this environment variable unset.
It seems the command-line parameters take precedence. The information at
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
suggests that what I am doing should work. My "ifconfig" output is below.
Any help would be appreciated.
~cody
eth0      Link encap:Ethernet  HWaddr 00:1B:21:69:79:A0
          inet addr:192.168.20.1  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe69:79a0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7454839 errors:0 dropped:0 overruns:0 frame:0
          TX packets:149930410 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45436437528 (42.3 GiB)  TX bytes:221935890089 (206.6 GiB)

eth2      Link encap:Ethernet  HWaddr E4:1F:13:4D:13:0E
          inet addr:172.20.101.1  Bcast:172.20.101.255  Mask:255.255.255.0
          inet6 addr: fe80::e61f:13ff:fe4d:130e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:556581 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8745499 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:39219489 (37.4 MiB)  TX bytes:12766433186 (11.8 GiB)
          Memory:92b60000-92b80000