[mpich2-dev] HYDRA: Using Multiple Ethernet Interfaces

Cody R. Brown cody at cs.ubc.ca
Tue Aug 31 16:11:55 CDT 2010


Hello,

I have a system with two Ethernet interfaces (eth0 and eth2): eth0 is
connected to a 10GigE switch, and eth2 is connected to a separate GigE switch.

Using HYDRA in version 1.2.1p1, selecting a different interface gives the
expected results. The file "hostsGigE" contains two hostnames that map to the
GigE interface IPs, while "hosts10GigE" contains two that map to the 10GigE
IPs (the file contents are shown after these commands). The following
commands work:
     # mpiexec -f hostsGigE -n 2 `pwd`/osu_bw      --- shows bandwidth around 117 MB/s
     # mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw    --- shows bandwidth around 900 MB/s
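
For reference, the host files just list the per-interface hostnames, one per
line. The GigE file looks like this (shown with the same names used with
-hosts further down); hosts10GigE is analogous with the 10GigE names:
     # cat hostsGigE
     node01-eth2
     node02-eth2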

When using the latest MPICH2 (1.3a2 and 1.3b1), it seems to always use the
10GigE network:
     # mpiexec -f hostsGigE -n 2 `pwd`/osu_bw      --- shows bandwidth around 900 MB/s
     # mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw    --- shows bandwidth around 900 MB/s

These commands also all show bandwidth around 900 MB/s, whether I use the -f,
-hosts, or -iface flags, and whether I give IPs instead of hostnames:
     # mpiexec -f hosts10GigE -n 2 -iface eth2 `pwd`/osu_bw
     # mpiexec -hosts node01-eth2,node02-eth2 -iface eth2 -n 2 `pwd`/osu_bw
     # mpiexec -hosts 172.20.101.1,172.20.101.2 -n 2 `pwd`/osu_bw
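
Independent of the bandwidth numbers, the per-interface byte counters give a
direct way to see which link the traffic actually takes: compare ifconfig's
TX counters before and after a run, and whichever interface's counter grows
by roughly the transferred volume is the one carrying the data.
     # ifconfig eth0 | grep "TX bytes"; ifconfig eth2 | grep "TX bytes"
     # mpiexec -f hostsGigE -n 2 `pwd`/osu_bw
     # ifconfig eth0 | grep "TX bytes"; ifconfig eth2 | grep "TX bytes"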


Does anyone know what I am doing wrong, and why this works as expected in
HYDRA 1.2.1p1 but not in the latest 1.3b1? I am also confused about how it
even knows about the 10GigE network when I only gave it GigE hostnames.
Perhaps my system is sending the traffic out on the 10GigE network anyway,
but then why does it work fine in 1.2.1p1?
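
In case the kernel's routing table, rather than HYDRA, is what pushes the
traffic onto eth0, the route lookup for the other node's eth2 address can be
checked directly (addresses as in the ifconfig output at the bottom):
     # route -n
     # ip route get 172.20.101.2
If the second command reports "dev eth0", the GigE subnet is being routed
over the 10GigE link regardless of which hostnames mpiexec is given.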

The system I am running on is Linux (CentOS 5.5); it is a cluster managed by
PBS (Torque). I do have HYDRA_RMK set to "pbs", but I also tried with this
environment variable unset, and the command-line parameters seem to take
precedence either way. According to
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
what I am doing should work. My "ifconfig" output is included at the bottom
of this message.
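
One more thing worth ruling out is name resolution, i.e., that the -eth2
hostnames really map to the 172.20.101.x addresses on both nodes:
     # getent hosts node01-eth2 node02-eth2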

Any help would be appreciated.

~cody


eth0      Link encap:Ethernet  HWaddr 00:1B:21:69:79:A0
          inet addr:192.168.20.1  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe69:79a0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7454839 errors:0 dropped:0 overruns:0 frame:0
          TX packets:149930410 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45436437528 (42.3 GiB)  TX bytes:221935890089 (206.6 GiB)
eth2      Link encap:Ethernet  HWaddr E4:1F:13:4D:13:0E
          inet addr:172.20.101.1  Bcast:172.20.101.255  Mask:255.255.255.0
          inet6 addr: fe80::e61f:13ff:fe4d:130e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:556581 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8745499 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:39219489 (37.4 MiB)  TX bytes:12766433186 (11.8 GiB)
          Memory:92b60000-92b80000