Just a small follow-up.<div><br></div><div>I am using PBS (Torque). One possible reason is that mpiexec always uses the nodes PBS allocated, even when I tell it to use the other eth2 IPs or unset the HYDRA_RMK environment variable.</div>
<div><br></div><div>Looking at it a bit more, the -rmk flag defaults to "pbs", so it may still be using PBS. The help (mpiexec -h) lists "-rmk dummy" as a valid option, but when I use it, mpiexec errors out saying it is not a valid option. I can't seem to change -rmk to anything other than pbs. Could this parameter be taking precedence over all the others (-hosts, -f, -iface), so that mpiexec always uses the node IPs PBS gave me (the 10GigE network in this case)?</div>
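<div><br></div><div>One way to test this guess, as a rough sketch rather than anything Hydra itself provides, is to classify the addresses in a host file (or in the node file Torque exposes through $PBS_NODEFILE) by the subnets shown in the ifconfig output quoted below. The names interface_for and read_hosts are made up for illustration, the ":N" handling assumes host-file entries may carry a process count, and host names would first need to be resolved to IPs (for example with socket.gethostbyname) before being passed to interface_for.</div>

```python
import ipaddress

# Subnets taken from the ifconfig output in the quoted message
# (both interfaces use a 255.255.255.0 netmask).
SUBNETS = {
    "eth0 (10GigE)": ipaddress.ip_network("192.168.20.0/24"),
    "eth2 (GigE)": ipaddress.ip_network("172.20.101.0/24"),
}


def interface_for(address, subnets=SUBNETS):
    """Return the label of the subnet containing an IP address, or None."""
    ip = ipaddress.ip_address(address)
    for label, network in subnets.items():
        if ip in network:
            return label
    return None


def read_hosts(path):
    """Read one host entry per line, skipping blank lines and '#' comments."""
    hosts = []
    with open(path) as handle:
        for line in handle:
            entry = line.split("#", 1)[0].strip()
            if entry:
                # Assumption: an entry may look like "host:N"; keep only the host.
                hosts.append(entry.split(":", 1)[0])
    return hosts


if __name__ == "__main__":
    # The two eth2 addresses used with -hosts in the quoted message.
    for addr in ("172.20.101.1", "172.20.101.2"):
        print(addr, "->", interface_for(addr))
```

<div>Comparing the result for hostsGigE against the result for the PBS node file would show whether the two lists really point at different networks.</div>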
<div><br></div><div>~cody</div><div><br><div class="gmail_quote">On Tue, Aug 31, 2010 at 2:11 PM, Cody R. Brown <span dir="ltr"><<a href="mailto:cody@cs.ubc.ca">cody@cs.ubc.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hello;
<div><br></div><div>I have a system with two Ethernet interfaces (eth0 and eth2). eth0 is connected to a 10GigE switch, and eth2 is connected to a separate GigE switch.</div><div><br></div><div>With HYDRA in version 1.2.1p1, selecting a different interface gives me the results I expect. The file "hostsGigE" has two host names with the GigE interface IPs, while "hosts10GigE" has the two 10GigE IPs. The following commands work:</div>
<div> # mpiexec -f hostsGigE -n 2 `pwd`/osu_bw ---Shows bandwidth around 117MB/s</div><div> # mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw ---Shows bandwidth around 900MB/s</div><div><br></div><div>When using the latest MPICH2 (1.3a2 and 1.3b1), it seems to always be using the 10GigE network</div>
<div> # mpiexec -f hostsGigE -n 2 `pwd`/osu_bw --Shows bandwidth around 900MB/s</div><div> # mpiexec -f hosts10GigE -n 2 `pwd`/osu_bw --Shows bandwidth around 900MB/s</div>
<div><br></div><div>The following commands also show bandwidth around 900MB/s, whether I use IPs or host names (i.e., using the -iface, -hosts, and -f flags):</div><div>
# mpiexec -f hosts10GigE -n 2 -iface eth2 `pwd`/osu_bw </div><div> # mpiexec -hosts node01-eth2,node02-eth2 -iface eth2 -n 2 `pwd`/osu_bw</div><div> # mpiexec -hosts 172.20.101.1,172.20.101.2 -n 2 `pwd`/osu_bw</div>
<div><br></div><div><br></div><div>Does anyone know what I am doing wrong, and why it works as expected with HYDRA in 1.2.1p1 but not in the latest 1.3b1? I am a little confused about how it even knows about the 10GigE network when I only gave it GigE host names. Perhaps my system is sending the traffic out on the 10GigE network, but then why does it work fine in 1.2.1p1?</div>
<div><br></div><div>The system I am running on is Linux (CentOS 5.5). It is a cluster running PBS (Torque). I do have HYDRA_RMK set to "pbs", but I also tried with this environment variable unset. It seems the command-line parameters take precedence. The info here "<a href="http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks" target="_blank">http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks</a>" suggests that what I am doing should work. My "ifconfig" output is below.</div>
<div><br></div><div>Any help would be appreciated.</div><div><br></div><div>~cody</div><div><br></div><div><br></div><div><div>eth0 Link encap:Ethernet HWaddr 00:1B:21:69:79:A0 </div>
<div> inet addr:192.168.20.1 Bcast:192.168.20.255 Mask:255.255.255.0</div><div> inet6 addr: fe80::21b:21ff:fe69:79a0/64 Scope:Link</div><div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div>
<div> RX packets:7454839 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:149930410 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:1000 </div><div> RX bytes:45436437528 (42.3 GiB) TX bytes:221935890089 (206.6 GiB)</div>
</div><div><div>eth2 Link encap:Ethernet HWaddr E4:1F:13:4D:13:0E </div><div> inet addr:172.20.101.1 Bcast:172.20.101.255 Mask:255.255.255.0</div><div> inet6 addr: fe80::e61f:13ff:fe4d:130e/64 Scope:Link</div>
<div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div><div> RX packets:556581 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:8745499 errors:0 dropped:0 overruns:0 carrier:0</div>
<div> collisions:0 txqueuelen:1000 </div><div> RX bytes:39219489 (37.4 MiB) TX bytes:12766433186 (11.8 GiB)</div><div> Memory:92b60000-92b80000</div></div>
</blockquote></div><br></div>