<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div> You might look at the notes about MPI binding. Binding may give you somewhat better performance. <a href="https://www.mcs.anl.gov/petsc/documentation/faq.html#computers" class="">https://www.mcs.anl.gov/petsc/documentation/faq.html#computers</a></div><div><br class=""></div><div> The streams benchmark is exactly the DAXPY operation, so its measured speedup is the speedup you should expect for VecAXPY(), which has 2 loads and 1 store per 1 multiply and 1 add.</div><div><br class=""></div><div> VecDot() has 2 loads per 1 multiply and 1 add, but it also needs a global reduction.</div><div><br class=""></div><div> Sparse matrix-vector multiply with AIJ has, for each nonzero, an integer load (the column index) and 2 double loads (the matrix entry and the vector entry) with 1 multiply and 1 add, plus 1 store per row, plus the communication needed for the off-process portion.</div><div><br class=""></div><div> Function evaluations often have higher arithmetic intensity, so they should give a somewhat higher speedup.</div><div><br class=""></div><div> Jacobian evaluations also often have higher arithmetic intensity, but they may call MatSetValues(), which is slow because it has no arithmetic intensity, just memory motion.</div><div><br class=""></div><div> Barry</div><div><br class=""></div><div><br class=""></div><div><br class=""><blockquote type="cite" class=""><div class="">On Jun 9, 2020, at 3:43 PM, Fande Kong &lt;<a href="mailto:fdkong.jd@gmail.com" class="">fdkong.jd@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class="">Hi All,<br class=""><div class=""><br class=""></div><div class=""><div class="">I am trying to interpret the results from "make streams" on two compute nodes, where each node has 48 cores. 
</div><div class=""><br class=""></div><div class="">If my calculations are memory-bandwidth limited (AMG, MatVec, GMRES, etc.), is 16.6938 the best speedup I can get when starting from one core?</div><div class="">Can the speedup for function evaluations and Jacobian evaluations be better than 16.6938?</div></div><div class=""><br class=""></div><div class="">Thanks,</div><div class=""><br class=""></div><div class="">Fande</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class="">Running streams with 'mpiexec ' using 'NPMAX=96' </div><div class="">1 19412.4570 Rate (MB/s)</div><div class="">2 29457.3988 Rate (MB/s) 1.51744 </div><div class="">3 40483.9318 Rate (MB/s) 2.08546 </div><div class="">4 51429.3431 Rate (MB/s) 2.64929 </div><div class="">5 59849.5168 Rate (MB/s) 3.08304 </div><div class="">6 66124.3461 Rate (MB/s) 3.40628 </div><div class="">7 70888.1170 Rate (MB/s) 3.65167 </div><div class="">8 73436.2374 Rate (MB/s) 3.78294 </div><div class="">9 77441.7622 Rate (MB/s) 3.98927 </div><div class="">10 78115.3114 Rate (MB/s) 4.02397 </div><div class="">11 81449.3315 Rate (MB/s) 4.19572 </div><div class="">12 82812.3471 Rate (MB/s) 4.26593 </div><div class="">13 81442.2114 Rate (MB/s) 4.19535 </div><div class="">14 83404.1657 Rate (MB/s) 4.29642 </div><div class="">15 84165.8536 Rate (MB/s) 4.33565 </div><div class="">16 83739.2910 Rate (MB/s) 4.31368 </div><div class="">17 83724.8109 Rate (MB/s) 4.31293 </div><div class="">18 83225.0743 Rate (MB/s) 4.28719 </div><div class="">19 81668.2002 Rate (MB/s) 4.20699 </div><div class="">20 83678.8007 Rate (MB/s) 4.31056 </div><div class="">21 81400.4590 Rate (MB/s) 4.1932 </div><div class="">22 81944.8975 Rate (MB/s) 4.22124 </div><div class="">23 81359.8615 Rate (MB/s) 4.19111 </div><div class="">24 80674.5064 Rate (MB/s) 4.1558 </div><div class="">25 83761.3316 Rate (MB/s) 4.31481 </div><div class="">26 87567.4876 Rate (MB/s) 
4.51088 </div><div class="">27 89605.4435 Rate (MB/s) 4.61586 </div><div class="">28 94984.9755 Rate (MB/s) 4.89298 </div><div class="">29 98260.5283 Rate (MB/s) 5.06171 </div><div class="">30 99852.8790 Rate (MB/s) 5.14374 </div><div class="">31 102736.3576 Rate (MB/s) 5.29228 </div><div class="">32 108638.7488 Rate (MB/s) 5.59633 </div><div class="">33 110431.2938 Rate (MB/s) 5.68867 </div><div class="">34 112824.2031 Rate (MB/s) 5.81194 </div><div class="">35 116908.3009 Rate (MB/s) 6.02232 </div><div class="">36 121312.6574 Rate (MB/s) 6.2492 </div><div class="">37 122507.3172 Rate (MB/s) 6.31074 </div><div class="">38 127456.2504 Rate (MB/s) 6.56568 </div><div class="">39 130098.7056 Rate (MB/s) 6.7018 </div><div class="">40 134956.4461 Rate (MB/s) 6.95204 </div><div class="">41 138309.2465 Rate (MB/s) 7.12475 </div><div class="">42 141779.7997 Rate (MB/s) 7.30353 </div><div class="">43 145653.3687 Rate (MB/s) 7.50307 </div><div class="">44 149131.2087 Rate (MB/s) 7.68223 </div><div class="">45 151611.6104 Rate (MB/s) 7.81 </div><div class="">46 155554.6394 Rate (MB/s) 8.01312 </div><div class="">47 159033.1938 Rate (MB/s) 8.19231 </div><div class="">48 162216.5600 Rate (MB/s) 8.35629 </div><div class="">49 165034.8116 Rate (MB/s) 8.50147 </div><div class="">50 168001.4823 Rate (MB/s) 8.65429 </div><div class="">51 170899.9045 Rate (MB/s) 8.8036 </div><div class="">52 175687.8033 Rate (MB/s) 9.05024 </div><div class="">53 178203.9203 Rate (MB/s) 9.17985 </div><div class="">54 179973.3914 Rate (MB/s) 9.27101 </div><div class="">55 182207.3495 Rate (MB/s) 9.38608 </div><div class="">56 185712.9643 Rate (MB/s) 9.56667 </div><div class="">57 188805.5696 Rate (MB/s) 9.72598 </div><div class="">58 193360.9158 Rate (MB/s) 9.96064 </div><div class="">59 198160.8016 Rate (MB/s) 10.2079 </div><div class="">60 201297.0129 Rate (MB/s) 10.3695 </div><div class="">61 203618.7672 Rate (MB/s) 10.4891 </div><div class="">62 209599.2783 Rate (MB/s) 10.7971 </div><div 
class="">63 211651.1587 Rate (MB/s) 10.9028 </div><div class="">64 210254.5035 Rate (MB/s) 10.8309 </div><div class="">65 218576.4938 Rate (MB/s) 11.2596 </div><div class="">66 220280.0853 Rate (MB/s) 11.3473 </div><div class="">67 221281.1867 Rate (MB/s) 11.3989 </div><div class="">68 228941.1872 Rate (MB/s) 11.7935 </div><div class="">69 232206.2708 Rate (MB/s) 11.9617 </div><div class="">70 233569.5866 Rate (MB/s) 12.0319 </div><div class="">71 238293.6355 Rate (MB/s) 12.2753 </div><div class="">72 238987.0729 Rate (MB/s) 12.311 </div><div class="">73 246013.4684 Rate (MB/s) 12.6729 </div><div class="">74 248850.8942 Rate (MB/s) 12.8191 </div><div class="">75 249355.6899 Rate (MB/s) 12.8451 </div><div class="">76 252515.6110 Rate (MB/s) 13.0079 </div><div class="">77 257489.4268 Rate (MB/s) 13.2641 </div><div class="">78 260884.2771 Rate (MB/s) 13.439 </div><div class="">79 264341.8661 Rate (MB/s) 13.6171 </div><div class="">80 269329.1376 Rate (MB/s) 13.874 </div><div class="">81 272286.4070 Rate (MB/s) 14.0263 </div><div class="">82 273325.7822 Rate (MB/s) 14.0799 </div><div class="">83 277334.6699 Rate (MB/s) 14.2864 </div><div class="">84 280254.7286 Rate (MB/s) 14.4368 </div><div class="">85 282219.8194 Rate (MB/s) 14.538 </div><div class="">86 289039.2677 Rate (MB/s) 14.8893 </div><div class="">87 291234.4715 Rate (MB/s) 15.0024 </div><div class="">88 295941.1159 Rate (MB/s) 15.2449 </div><div class="">89 298136.3163 Rate (MB/s) 15.358 </div><div class="">90 302820.9080 Rate (MB/s) 15.5993 </div><div class="">91 306387.5008 Rate (MB/s) 15.783 </div><div class="">92 310127.0223 Rate (MB/s) 15.9756 </div><div class="">93 310219.3643 Rate (MB/s) 15.9804 </div><div class="">94 317089.5971 Rate (MB/s) 16.3343 </div><div class="">95 315457.0938 Rate (MB/s) 16.2502 </div><div class="">96 324068.8172 Rate (MB/s) 16.6938 </div></div></div></div></div></div>
</div></blockquote></div><br class=""></body></html>