<span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">Gus,</font></span><div><br></div><div>Information sharing is truly the point of the mailing list. Useful messages should ask questions or provide answers! :)</div>

<div><br></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">Someone mentioned STREAM benchmarks (memory BW benchmarks) a little while back. I did these when our new system came in a while ago, so I dug them back out.</font></span><div>

<span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">STREAM can be compiled to use MPI, but MPI serves only as a synchronization tool; the benchmark is still a memory bus test (each task is streaming through memory; this is not an MPI communication test).</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">My results on a dual&nbsp;E5472 machine (two quad-core 3&nbsp;GHz packages; 1600&nbsp;MHz bus; 8 cores total):</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">Results (each set lists [1..8] processes, in order); double-precision array size =&nbsp;20,000,000 elements, 10 iterations per run.</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">Function &nbsp; &nbsp; Rate (MB/s) &nbsp;Avg time &nbsp; Min time &nbsp;Max time</font></span></div>

<div><div>Copy: &nbsp; &nbsp; &nbsp; 2962.6937 &nbsp; &nbsp; &nbsp;0.1081 &nbsp; &nbsp; &nbsp;0.1080 &nbsp; &nbsp; &nbsp;0.1081</div><div>Copy: &nbsp; &nbsp; &nbsp; 5685.3008 &nbsp; &nbsp; &nbsp;0.1126 &nbsp; &nbsp; &nbsp;0.1126 &nbsp; &nbsp; &nbsp;0.1128</div><div>Copy: &nbsp; &nbsp; &nbsp; 5484.6846 &nbsp; &nbsp; &nbsp;0.1751 &nbsp; &nbsp; &nbsp;0.1750 &nbsp; &nbsp; &nbsp;0.1751</div><div>Copy: &nbsp; &nbsp; &nbsp; 7085.7959 &nbsp; &nbsp; &nbsp;0.1809 &nbsp; &nbsp; &nbsp;0.1806 &nbsp; &nbsp; &nbsp;0.1817</div>

<div>Copy: &nbsp; &nbsp; &nbsp; 5981.6033 &nbsp; &nbsp; &nbsp;0.2676 &nbsp; &nbsp; &nbsp;0.2675 &nbsp; &nbsp; &nbsp;0.2676</div><div>Copy: &nbsp; &nbsp; &nbsp; 7071.2490 &nbsp; &nbsp; &nbsp;0.2718 &nbsp; &nbsp; &nbsp;0.2715 &nbsp; &nbsp; &nbsp;0.2722</div><div>Copy: &nbsp; &nbsp; &nbsp; 6537.4934 &nbsp; &nbsp; &nbsp;0.3427 &nbsp; &nbsp; &nbsp;0.3426 &nbsp; &nbsp; &nbsp;0.3428</div><div>Copy: &nbsp; &nbsp; &nbsp; 7423.4545 &nbsp; &nbsp; &nbsp;0.3451 &nbsp; &nbsp; &nbsp;0.3449 &nbsp; &nbsp; &nbsp;0.3455</div>

<div><br></div></div></div><div><div><div>Scale: &nbsp; &nbsp; &nbsp;3011.8445 &nbsp; &nbsp; &nbsp;0.1063 &nbsp; &nbsp; &nbsp;0.1062 &nbsp; &nbsp; &nbsp;0.1063</div><div>Scale: &nbsp; &nbsp; &nbsp;5675.8162 &nbsp; &nbsp; &nbsp;0.1128 &nbsp; &nbsp; &nbsp;0.1128 &nbsp; &nbsp; &nbsp;0.1129</div><div>Scale: &nbsp; &nbsp; &nbsp;5474.8854 &nbsp; &nbsp; &nbsp;0.1754 &nbsp; &nbsp; &nbsp;0.1753 &nbsp; &nbsp; &nbsp;0.1754</div>

<div>Scale: &nbsp; &nbsp; &nbsp;7068.6204 &nbsp; &nbsp; &nbsp;0.1814 &nbsp; &nbsp; &nbsp;0.1811 &nbsp; &nbsp; &nbsp;0.1819</div><div>Scale: &nbsp; &nbsp; &nbsp;5974.6112 &nbsp; &nbsp; &nbsp;0.2679 &nbsp; &nbsp; &nbsp;0.2678 &nbsp; &nbsp; &nbsp;0.2680</div><div>Scale: &nbsp; &nbsp; &nbsp;7063.8307 &nbsp; &nbsp; &nbsp;0.2721 &nbsp; &nbsp; &nbsp;0.2718 &nbsp; &nbsp; &nbsp;0.2725</div><div>Scale: &nbsp; &nbsp; &nbsp;6533.4473 &nbsp; &nbsp; &nbsp;0.3430 &nbsp; &nbsp; &nbsp;0.3429 &nbsp; &nbsp; &nbsp;0.3431</div>

<div>Scale: &nbsp; &nbsp; &nbsp;7418.6128 &nbsp; &nbsp; &nbsp;0.3453 &nbsp; &nbsp; &nbsp;0.3451 &nbsp; &nbsp; &nbsp;0.3456</div></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div>

<div>Add: &nbsp; &nbsp; &nbsp; &nbsp;3184.3129 &nbsp; &nbsp; &nbsp;0.1508 &nbsp; &nbsp; &nbsp;0.1507 &nbsp; &nbsp; &nbsp;0.1508</div><div>Add: &nbsp; &nbsp; &nbsp; &nbsp;5892.1781 &nbsp; &nbsp; &nbsp;0.1631 &nbsp; &nbsp; &nbsp;0.1629 &nbsp; &nbsp; &nbsp;0.1633</div><div>Add: &nbsp; &nbsp; &nbsp; &nbsp;5588.0229 &nbsp; &nbsp; &nbsp;0.2577 &nbsp; &nbsp; &nbsp;0.2577 &nbsp; &nbsp; &nbsp;0.2578</div><div>Add: &nbsp; &nbsp; &nbsp; &nbsp;7275.0745 &nbsp; &nbsp; &nbsp;0.2642 &nbsp; &nbsp; &nbsp;0.2639 &nbsp; &nbsp; &nbsp;0.2646</div>

<div>Add: &nbsp; &nbsp; &nbsp; &nbsp;6175.7646 &nbsp; &nbsp; &nbsp;0.3887 &nbsp; &nbsp; &nbsp;0.3886 &nbsp; &nbsp; &nbsp;0.3889</div><div>Add: &nbsp; &nbsp; &nbsp; &nbsp;7262.7112 &nbsp; &nbsp; &nbsp;0.3970 &nbsp; &nbsp; &nbsp;0.3965 &nbsp; &nbsp; &nbsp;0.3976</div><div>Add: &nbsp; &nbsp; &nbsp; &nbsp;6687.7658 &nbsp; &nbsp; &nbsp;0.5025 &nbsp; &nbsp; &nbsp;0.5024 &nbsp; &nbsp; &nbsp;0.5026</div><div>Add: &nbsp; &nbsp; &nbsp; &nbsp;7599.2516 &nbsp; &nbsp; &nbsp;0.5057 &nbsp; &nbsp; &nbsp;0.5053 &nbsp; &nbsp; &nbsp;0.5062</div>

</div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div><div><div>Triad: &nbsp; &nbsp; &nbsp;3224.7856 &nbsp; &nbsp; &nbsp;0.1489 &nbsp; &nbsp; &nbsp;0.1488 &nbsp; &nbsp; &nbsp;0.1489</div>

<div>Triad: &nbsp; &nbsp; &nbsp;6021.2613 &nbsp; &nbsp; &nbsp;0.1596 &nbsp; &nbsp; &nbsp;0.1594 &nbsp; &nbsp; &nbsp;0.1598</div><div>Triad: &nbsp; &nbsp; &nbsp;5609.9260 &nbsp; &nbsp; &nbsp;0.2567 &nbsp; &nbsp; &nbsp;0.2567 &nbsp; &nbsp; &nbsp;0.2568</div><div>Triad: &nbsp; &nbsp; &nbsp;7293.2790 &nbsp; &nbsp; &nbsp;0.2637 &nbsp; &nbsp; &nbsp;0.2633 &nbsp; &nbsp; &nbsp;0.2641</div><div>Triad: &nbsp; &nbsp; &nbsp;6185.4376 &nbsp; &nbsp; &nbsp;0.3881 &nbsp; &nbsp; &nbsp;0.3880 &nbsp; &nbsp; &nbsp;0.3881</div>

<div>Triad: &nbsp; &nbsp; &nbsp;7279.1231 &nbsp; &nbsp; &nbsp;0.3958 &nbsp; &nbsp; &nbsp;0.3957 &nbsp; &nbsp; &nbsp;0.3961</div><div>Triad: &nbsp; &nbsp; &nbsp;6691.8560 &nbsp; &nbsp; &nbsp;0.5022 &nbsp; &nbsp; &nbsp;0.5021 &nbsp; &nbsp; &nbsp;0.5022</div><div>Triad: &nbsp; &nbsp; &nbsp;7604.1238 &nbsp; &nbsp; &nbsp;0.5052 &nbsp; &nbsp; &nbsp;0.5050 &nbsp; &nbsp; &nbsp;0.5057</div></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif"><br>

</font></span></div><div>These work out to (~):</div><div>1x</div><div>1.9x</div><div>1.8x</div><div>2.3x</div><div>1.9x</div><div>2.2x</div><div>2.1x</div><div>2.4x</div><div>&nbsp;</div><div>for [1..8] cores.</div><div><br></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">As you can see, it doesn&#39;t take eight cores to saturate the bus, even at 1600&nbsp;MHz. Running four of the eight cores does the trick.</font></span></div>
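<div><br></div><div>As a rough cross-check of the ratios above, here is a minimal Python sketch using the Triad column. The function names are mine, and the rate formula assumes that every process streams through its own three 20,000,000-element arrays of 8-byte doubles and that the reported rate is the aggregate over all processes. The table is consistent with that, but it is an inference rather than something the output states.</div>
<pre>
```python
# Aggregate Triad rates (MB/s) for 1..8 processes, copied from the table above.
TRIAD_MB_S = [3224.7856, 6021.2613, 5609.9260, 7293.2790,
              6185.4376, 7279.1231, 6691.8560, 7604.1238]


def speedups(rates):
    """Return each rate divided by the single-process rate."""
    base = rates[0]
    return [rate / base for rate in rates]


def triad_rate_mb_s(n_elements, n_procs, min_time_s):
    """Rebuild a Triad rate from the array size and the best time.

    Assumption: three arrays of 8-byte doubles per process, summed over
    all processes, with 1 MB taken as 10**6 bytes.
    """
    bytes_moved = 3 * 8 * n_elements * n_procs
    return bytes_moved / min_time_s / 1.0e6


if __name__ == "__main__":
    for n_procs, s in enumerate(speedups(TRIAD_MB_S), start=1):
        print(f"{n_procs} processes: {s:.2f}x")
    # Single-process cross-check against the reported 3224.7856 MB/s.
    print(f"{triad_rate_mb_s(20_000_000, 1, 0.1488):.0f} MB/s")
```
</pre>
<div>Triad alone gives ratios within about 0.1 of the rounded list above.</div>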

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">With all that said, there are still advantages to be had from multicore chips, but only if you&#39;re not blowing full tilt through memory. If the problem allows it, do more work inside a single loop rather than running multiple loops over the same memory.</font></span></div>
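<div><br></div><div>To illustrate the loop advice, a small sketch (the function names are hypothetical): both versions compute scale * x + offset, but the first walks the data twice and stores an intermediate array, while the second touches each element once. In pure Python the interpreter overhead dominates, so this shows only the shape of the change; the memory-traffic benefit applies to compiled loops.</div>
<pre>
```python
def scale_then_add_two_passes(a, scale, offset):
    """Two loops: the scaled intermediate is written out, then read back."""
    scaled = [scale * x for x in a]        # pass 1 over memory
    return [x + offset for x in scaled]    # pass 2 over memory


def scale_then_add_fused(a, scale, offset):
    """One loop: each element is loaded once and both operations applied."""
    return [scale * x + offset for x in a]


if __name__ == "__main__":
    data = [float(i) for i in range(8)]
    # Both formulations must agree; only the number of passes differs.
    assert scale_then_add_two_passes(data, 3.0, 1.0) == scale_then_add_fused(data, 3.0, 1.0)
    print(scale_then_add_fused(data, 3.0, 1.0))
```
</pre>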

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">For reference, here&#39;s what the&nbsp;osu_mbw_mr test (from MVAPICH2 1.0.2; I also have a cluster running nearby :) ), compiled against MPICH2 (1.0.7rc1 with nemesis), reports for one/two/four pairs (2/4/8 processes) of producers/consumers:</font></span></div>

<div><br></div></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><div># OSU MPI Multi BW / Message Rate Test (Version 1.0)</div><div># [ pairs: 1 ] [ window size: 64 ]</div>

<div><br></div><div># &nbsp;Size &nbsp; &nbsp;MB/sec &nbsp; &nbsp;Messages/sec</div><div>&nbsp;&nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp;1.08 &nbsp; 1076540.83</div><div>&nbsp;&nbsp; &nbsp; &nbsp;2 &nbsp; &nbsp; &nbsp;2.14 &nbsp; 1068102.24</div><div>&nbsp;&nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp;3.99 &nbsp; &nbsp;997382.24</div><div>&nbsp;&nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp;7.97 &nbsp; &nbsp;996419.66</div>

<div>&nbsp;&nbsp; &nbsp; 16 &nbsp; &nbsp; 15.95 &nbsp; &nbsp;996567.63</div><div>&nbsp;&nbsp; &nbsp; 32 &nbsp; &nbsp; 31.67 &nbsp; &nbsp;989660.29</div><div>&nbsp;&nbsp; &nbsp; 64 &nbsp; &nbsp; 62.73 &nbsp; &nbsp;980084.91</div><div>&nbsp;&nbsp; &nbsp;128 &nbsp; &nbsp;124.12 &nbsp; &nbsp;969676.18</div><div>&nbsp;&nbsp; &nbsp;256 &nbsp; &nbsp;243.59 &nbsp; &nbsp;951527.62</div><div>&nbsp;&nbsp; &nbsp;512 &nbsp; &nbsp;445.52 &nbsp; &nbsp;870159.34</div>

<div>&nbsp;&nbsp; 1024 &nbsp; &nbsp;810.28 &nbsp; &nbsp;791284.80</div><div>&nbsp;&nbsp; 2048 &nbsp; 1357.25 &nbsp; &nbsp;662721.78</div><div>&nbsp;&nbsp; 4096 &nbsp; 1935.08 &nbsp; &nbsp;472431.28</div><div>&nbsp;&nbsp; 8192 &nbsp; 2454.29 &nbsp; &nbsp;299596.49</div><div>&nbsp;&nbsp;16384 &nbsp; 2717.61 &nbsp; &nbsp;165869.84</div><div>&nbsp;&nbsp;32768 &nbsp; 2900.23 &nbsp; &nbsp; 88507.85</div>

<div>&nbsp;&nbsp;65536 &nbsp; 2279.71 &nbsp; &nbsp; 34785.63</div><div>&nbsp;131072 &nbsp; 2540.51 &nbsp; &nbsp; 19382.53</div><div>&nbsp;262144 &nbsp; 1335.16 &nbsp; &nbsp; &nbsp;5093.21</div><div>&nbsp;524288 &nbsp; 1364.05 &nbsp; &nbsp; &nbsp;2601.72</div><div>1048576 &nbsp; 1378.39 &nbsp; &nbsp; &nbsp;1314.53</div><div>2097152 &nbsp; 1380.78 &nbsp; &nbsp; &nbsp; 658.41</div>

<div>4194304 &nbsp; 1343.48 &nbsp; &nbsp; &nbsp; 320.31</div><div><br></div></font></span></div><div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"># OSU MPI Multi BW / Message Rate Test (Version 1.0)</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"># [ pairs: 2 ] [ window size: 64 ]</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br>

</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"># &nbsp;Size &nbsp; &nbsp;MB/sec &nbsp; &nbsp;Messages/sec</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp;2.15 &nbsp; 2150580.48</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;2 &nbsp; &nbsp; &nbsp;4.22 &nbsp; 2109761.12</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp;7.84 &nbsp; 1960742.53</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; 15.80 &nbsp; 1974733.92</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; 16 &nbsp; &nbsp; 31.38 &nbsp; 1961100.64</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; 32 &nbsp; &nbsp; 62.32 &nbsp; 1947654.32</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; 64 &nbsp; &nbsp;123.39 &nbsp; 1928000.11</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp;128 &nbsp; &nbsp;243.19 &nbsp; 1899957.22</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp;256 &nbsp; &nbsp;475.32 &nbsp; 1856721.12</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp;512 &nbsp; &nbsp;856.90 &nbsp; 1673642.10</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 1024 &nbsp; 1513.19 &nbsp; 1477721.26</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 2048 &nbsp; 2312.91 &nbsp; 1129351.07</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 4096 &nbsp; 2891.21 &nbsp; &nbsp;705861.12</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 8192 &nbsp; 3267.49 &nbsp; &nbsp;398863.98</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp;16384 &nbsp; 3400.64 &nbsp; &nbsp;207558.54</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp;32768 &nbsp; 3519.74 &nbsp; &nbsp;107413.93</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp;65536 &nbsp; 3141.80 &nbsp; &nbsp; 47940.04</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;131072 &nbsp; 3368.65 &nbsp; &nbsp; 25700.76</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;262144 &nbsp; 2211.53 &nbsp; &nbsp; &nbsp;8436.31</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">&nbsp;524288 &nbsp; 2264.90 &nbsp; &nbsp; &nbsp;4319.95</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">1048576 &nbsp; 2282.69 &nbsp; &nbsp; &nbsp;2176.94</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">2097152 &nbsp; 2250.72 &nbsp; &nbsp; &nbsp;1073.23</font></span></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">4194304 &nbsp; 2087.00 &nbsp; &nbsp; &nbsp; 497.58</font></span></div>

<div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br></font></span></div></div><div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><div>

<div><div><font class="Apple-style-span" face="arial, sans-serif"># OSU MPI Multi BW / Message Rate Test (Version 1.0)</font></div><div><font class="Apple-style-span" face="arial, sans-serif"># [ pairs: 4 ] [ window size: 64 ]</font></div>

<div><font class="Apple-style-span" face="arial, sans-serif"><br></font></div><div><font class="Apple-style-span" face="arial, sans-serif"># &nbsp;Size &nbsp; &nbsp;MB/sec &nbsp; &nbsp;Messages/sec</font></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp;3.65 &nbsp; 3651934.64</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;2 &nbsp; &nbsp; &nbsp;8.16 &nbsp; 4080341.34</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; 15.66 &nbsp; 3914908.02</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; 31.32 &nbsp; 3915621.85</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; 16 &nbsp; &nbsp; 62.67 &nbsp; 3916764.51</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; 32 &nbsp; &nbsp;124.37 &nbsp; 3886426.18</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp; 64 &nbsp; &nbsp;246.38 &nbsp; 3849640.84</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp;128 &nbsp; &nbsp;486.39 &nbsp; 3799914.44</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp;256 &nbsp; &nbsp;942.40 &nbsp; 3681232.25</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; &nbsp;512 &nbsp; 1664.21 &nbsp; 3250414.19</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 1024 &nbsp; 2756.50 &nbsp; 2691891.86</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 2048 &nbsp; 3829.45 &nbsp; 1869848.54</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 4096 &nbsp; 4465.25 &nbsp; 1090148.56</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp; 8192 &nbsp; 4777.45 &nbsp; &nbsp;583184.51</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp;16384 &nbsp; 4822.75 &nbsp; &nbsp;294357.30</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp;32768 &nbsp; 4829.77 &nbsp; &nbsp;147392.80</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;&nbsp;65536 &nbsp; 4556.93 &nbsp; &nbsp; 69533.18</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;131072 &nbsp; 4789.32 &nbsp; &nbsp; 36539.60</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;262144 &nbsp; 3631.68 &nbsp; &nbsp; 13853.75</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">&nbsp;524288 &nbsp; 3679.31 &nbsp; &nbsp; &nbsp;7017.72</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">1048576 &nbsp; 3553.61 &nbsp; &nbsp; &nbsp;3388.99</font></span></div>

<div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">2097152 &nbsp; 3113.12 &nbsp; &nbsp; &nbsp;1484.45</font></span></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">4194304 &nbsp; 2452.69 &nbsp; &nbsp; &nbsp; 584.77</font></span></div>

</div></div><div><br></div><div>So from a messaging standpoint, you can see that you squeeze more data through with more processes. I&#39;d guess that this is because there&#39;s processing to be done within MPI to move the data, and a lot of the bookkeeping steps probably cache well (updating the same status structure multiple times during a communication; perhaps reusing the structure for subsequent transfers and finding it still in cache), so the performance scaling is not completely FSB-bound.</div>
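<div><br></div><div>To put numbers on that scaling, a short sketch that divides the aggregate MB/sec at a few message sizes (values copied from the three tables above) by the one-pair figure. The choice of sizes is arbitrary and the function name is mine.</div>
<pre>
```python
# Aggregate MB/sec for 1, 2 and 4 pairs, copied from the three tables above.
OSU_MB_S = {
    1024:    (810.28, 1513.19, 2756.50),
    32768:   (2900.23, 3519.74, 4829.77),
    4194304: (1343.48, 2087.00, 2452.69),
}


def scaling(rates):
    """Return each rate relative to the one-pair rate."""
    base = rates[0]
    return tuple(r / base for r in rates)


if __name__ == "__main__":
    for size, rates in OSU_MB_S.items():
        one, two, four = scaling(rates)
        print(f"{size:8d} bytes: {one:.2f}x {two:.2f}x {four:.2f}x")
```
</pre>
<div>At 1024 bytes the four-pair figure is about 3.4 times the one-pair figure, whereas at 4 MB it is about 1.8 times.</div>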

<div><br></div><div>I&#39;m sure there are plenty of additional things that could be done here to test different CPU-to-process layouts, etc., but in testing my own real-world code I&#39;ve found that, unfortunately, &quot;it depends.&quot; I have some code that scales nearly linearly (multiple computationally expensive operations inside the innermost loop) and some that scales like the STREAM results above (&quot;add one to the next 20 million points&quot;) ...</div>

<div><br></div><div>As always, your mileage may vary. If your speedup looks like the STREAM numbers above, you&#39;re likely memory bound. Try to reformulate your problem to go through memory more slowly but with more done on each pass, or invest in a cluster. At some point -- for some problems -- you can&#39;t beat more memory buses!</div>

</font></span></div><div><br></div><div>Cheers,</div><div>&nbsp;Eric Borisch</div><div><br></div><div>--</div><div>&nbsp;<a href="mailto:borisch.eric@mayo.edu">borisch.eric@mayo.edu</a></div><div>&nbsp;MRI Research</div><div>&nbsp;Mayo Clinic</div>

<div><br></div><div><span class="Apple-style-span" style="font-size: small; "><font class="Apple-style-span" face="arial, sans-serif">On Mon, Jul 14, 2008 at 9:48 PM, Gus Correa &lt;<a href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>&gt; wrote:</font></span></div>

<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">Hello Sami and list<br>

<br>

Oh, well, as you see, an expert who claims to know the answers to these problems<br>

seems not to be willing to share these answers with less knowledgeable MPI users like us.<br>

So, maybe we can find the answers ourselves, not by individual &quot;homework&quot; brainstorming,<br>

but through community collaboration and generous information sharing,<br>

which is the hallmark of this mailing list.<br><br>

I Googled around today to find out how to assign MPI processes to specific processors,<br>

and I found some interesting information on how to do it.<br><br>

Below is a link to a posting from the computational fluid dynamics (CFD) community that may be of interest.<br>

Not surprisingly, they are struggling with the same type of problems all of us have,<br>

including how to tie MPI processes to specific processors:<br><br></font></span>


<a href="http://openfoam.cfd-online.com/cgi-bin/forum/board-auth.cgi?file=/1/5949.html#POST18006" target="_blank"><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">http://openfoam.cfd-online.</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">com/cgi-bin/forum/board-auth.</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">cgi?file=/1/5949.html#</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">POST18006</font></span></a><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br>

<br>

I would summarize these problems as related to three types of bottleneck:<br><br>

1) Multicore processor bottlenecks (standalone machines and clusters)<br>

2) Network fabric bottlenecks (clusters)<br>

3) File system bottlenecks (clusters)<br><br>

All three types of problems are due to contention for some type of system resource<br>

by the MPI processes that take part in a computation/program.<br><br>

Our focus in this thread, started by Zach, has been on problem 1),<br>

although most of us may need to look into problems 2) and 3) sooner or later.<br>

(I have all three of them already!)<br><br>

The CFD folks use MPI as we do.<br>

They seem to use another MPI flavor, but the same problems are there.<br>

The problems are not caused by MPI itself, but they become apparent when you run MPI programs.<br>

That has been my experience too.<br><br>

As for how to map the MPI processes to specific processors (or cores),<br>

the key command seems to be &quot;taskset&quot;, as my googling afternoon showed.<br>

Try &quot;man taskset&quot; for more info.<br><br>

For a standalone machine like yours, something like the command line below should work to<br>

force execution on &quot;processors&quot; 0 and 2 (which in my case are two different physical CPUs):<br><br>

mpiexec -n 2 taskset -c 0,2 &nbsp;my_mpi_program<br><br>

You need to check on your computer (&quot;more /proc/cpuinfo&quot;)<br>

which exact &quot;processor&quot; numbers correspond to separate physical CPUs. Most likely they are the even-numbered processors only, or the odd-numbered only,<br>

since you have dual-core CPUs (integers modulo 2), with &quot;processors&quot; 0,1 being the two<br>

cores of the first physical CPU, &quot;processors&quot; 2,3 the cores of the second physical CPU, and so on.<br>

At least, this is what I see on my dual-core dual-processor machine.<br>

I would say for quad-cores the separate physical CPUs would be processors 0,4,8, etc,<br>

or 1,5,9, etc, and so on (integers modulo 4), with &quot;processors&quot; 0,1,2,3 being the four cores<br>

in the first physical CPU, and so on. <br>

In /proc/cpuinfo look for the keyword &quot;processor&quot;.<br>

These are the numbers you need to use in &quot;taskset -c&quot;.<br>

However, other helpful information comes in the keywords &quot;physical id&quot;,<br>

&quot;core id&quot;, &quot;siblings&quot;, and &quot;cpu cores&quot;.<br>

They will allow you to map cores and physical CPUs to<br>

the &quot;processor&quot; number.<br><br>
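As a sketch of that mapping in Python (the function name is hypothetical, and it assumes the usual Linux layout of /proc/cpuinfo, with one blank-line-separated record per processor; fields a kernel does not report come back as -1):<br>
<pre>
```python
import os


def parse_cpuinfo(text):
    """Map each "processor" number to its (physical id, core id) pair."""
    mapping = {}
    current = {}
    # The extra empty string flushes the last record if the text does not
    # end with a blank line.
    for line in text.splitlines() + [""]:
        if line.strip() == "":
            # A blank line ends one processor's record.
            if "processor" in current:
                mapping[int(current["processor"])] = (
                    int(current.get("physical id", -1)),
                    int(current.get("core id", -1)),
                )
            current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return mapping


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        for proc, (package, core) in sorted(parse_cpuinfo(f.read()).items()):
            print(f"processor {proc}: physical id {package}, core id {core}")
```
</pre>
Processors that report different &quot;physical id&quot; values sit in different physical CPUs, which is the information needed to pick the numbers for &quot;taskset -c&quot;.<br><br>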

The &quot;taskset&quot; command line above worked on one of my standalone multicore machines,<br>

and I hope a variant of it will work on your machine also.<br>

It works with the &quot;mpiexec&quot; that comes with the MPICH distribution, and also with<br>

the &quot;mpiexec&quot; associated with the Torque/PBS batch system, which is nice for clusters as well.<br><br>

&quot;Taskset&quot; can change the default behavior of the Linux scheduler, which is to allow processes to<br>

be moved from one core/CPU to another during execution.<br>

The scheduler does this to ensure optimal CPU use (i.e. load balance).<br>

With taskset you can force execution to happen on the cores you specify on the command line,<br>

i.e. you can force the so called &quot;CPU affinity&quot; you wish.<br>

Note that the &quot;taskset&quot; man page uses both the terms &quot;CPU&quot; and &quot;processor&quot;, and doesn&#39;t use the term &quot;core&quot;,<br>

which may be &nbsp;a bit confusing. Make no mistake, &quot;processor&quot; and &quot;CPU&quot; there stand for what we&#39;ve been calling &quot;core&quot; here.<br><br>
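The same kind of affinity can also be set from inside a program. A minimal Python sketch, assuming Linux (os.sched_setaffinity is not available on every platform); the helper name pin_to is hypothetical:<br>
<pre>
```python
import os


def pin_to(cpus):
    """Restrict the calling process to the given "processor" numbers.

    Returns the affinity set actually in effect afterwards.
    """
    os.sched_setaffinity(0, cpus)   # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed before:", allowed)
    # Pin to the first allowed processor only, so the call cannot fail
    # on a machine that lacks a particular processor number.
    print("allowed after:", sorted(pin_to({allowed[0]})))
```
</pre>
This restricts only the calling process; &quot;taskset&quot; on the command line remains the option when the program itself cannot be changed.<br><br>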

Other postings that you may find useful on closely related topics are:<br><br></font></span>


<a href="http://www.ibm.com/developerworks/linux/library/l-scheduler/" target="_blank"><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">http://www.ibm.com/</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">developerworks/linux/library/</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">l-scheduler/</font></span></a><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br>

</font></span>

<a href="http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html" target="_blank"><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">http://www.cyberciti.biz/tips/</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">setting-processor-affinity-</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">certain-task-or-process.html</font></span></a><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br>

<br>

I hope this helps,<br><br>

Still, we have a long way to go to sort out how much of the multicore bottleneck can<br>

be ascribed to lack of memory bandwidth, and how much may perhaps be associated with how<br>

memcpy is compiled by different compilers,<br>

or if there are other components of this problem that we don&#39;t see now.<br><br>

Maybe our community won&#39;t find a solution to Zach&#39;s problem: &quot;Why is my quad core slower than cluster?&quot;<br>

However, I hope that through collaboration, and by sharing information,<br>

we may be able to nail down the root of the problem,<br>

and perhaps to find ways to improve the alarmingly bad performance<br>

some of us have reported on multicore machines.</font></span>


<div class="Ih2E3d"><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br><br>

Gus Correa<br><br>

-- <br>

------------------------------</font></span>


<span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">------------------------------</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">---------<br>


Gustavo J. Ponce Correa, PhD - Email: </font></span><a href="mailto:gus@ldeo.columbia.edu" target="_blank"><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">gus@ldeo.columbia.edu</font></span></a><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br>


Lamont-Doherty Earth Observatory - Columbia University<br>

P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA<br>

------------------------------</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">------------------------------</font></span><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif">---------<br>

</font></span></div></blockquote></div><span class="Apple-style-span" style="font-size: small;"><font class="Apple-style-span" face="arial, sans-serif"><br clear="all"><br></font></span></div></div>