<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">It is not clear how to interpret these results. The "gpu" flop rates for dot and norm are a good amount higher (the exact placement of the logging calls can affect this), but their overall flop rates are not much better. Scatter is better without GPU MPI. How much of this is noise? We need to see statistics from multiple runs. Certainly not satisfying.</div><div class=""><br class=""></div><div class="">GPU MPI</div><div class=""><br class=""></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">MatMult 400 1.0 8.4784e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 2 55 61 54 0 68 91100100 0 98667 139198 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">KSPSolve 2 1.0 1.2222e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 3 60 61 54 60 100100100100100 75509 122610 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecTDot 802 1.0 1.3863e+00 1.3 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 10 3 0 0 67 19186 48762 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecNorm 402 1.0 9.2933e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 6 1 0 0 33 14345 127332 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecAXPY 800 1.0 8.2405e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 7 3 0 0 0 32195 62486 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span
style="font-style: normal; font-size: 12px;" class="">VecAYPX 398 1.0 8.6891e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 6 1 0 0 0 15190 19019 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecPointwiseMult 402 1.0 3.5227e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 18922 39878 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecScatterBegin 400 1.0 1.1519e+00 2.1 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 7 0100100 0 0 0 0 0.00e+00 0 0.00e+00 0</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecScatterEnd 400 1.0 1.5642e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 10 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class=""><br class=""></span></font></div><div class=""><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class=""><br class=""></span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">MatMult 400 1.0 8.1754e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 2 55 61 54 0 65 91100100 102324 133771 800 4.74e+02 800 4.74e+02 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">KSPSolve 2 1.0 1.2605e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 2 60 61 54 60 100100100100100 73214 113908 800 4.74e+02 800 4.74e+02 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecTDot 802 1.0 2.0607e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 15 3 0 0 67 12907 
25655 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecNorm 402 1.0 9.5100e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 6 1 0 0 33 14018 96704 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecAXPY 800 1.0 7.9864e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 6 3 0 0 0 33219 65843 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecAYPX 398 1.0 8.0719e-01 1.7 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 5 1 0 0 0 16352 21253 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecPointwiseMult 402 1.0 3.7318e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 17862 38464 0 0.00e+00 0 0.00e+00 100</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecScatterBegin 400 1.0 1.4075e+00 1.8 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 9 0100100 0 0 0 0 0.00e+00 800 4.74e+02 0</span></font></div><div class=""><font face="Courier New" class=""><span style="font-style: normal; font-size: 12px;" class="">VecScatterEnd 400 1.0 6.3044e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0 0 800 4.74e+02 0 0.00e+00 0</span></font></div></div><div class=""><br class=""></div><div><br class=""><blockquote type="cite" class=""><div class="">On Jan 24, 2022, at 10:25 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" class="">mfadams@lbl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_quote"><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""></div><div class=""> Mark,</div><div class=""><br class=""></div><div class=""> Can you run both with GPU aware MPI?</div><div class=""><br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">Perlmutter fails with GPU aware MPI. I think there are known problems with this that are being worked on.</div><div class=""><br class=""></div><div class="">And here is Crusher with GPU aware MPI.</div><div class=""> </div></div></div>
<span id="cid:f_kysuceun0"><jac_out_001_kokkos_Crusher_6_1_notpl.txt></span></div></blockquote></div><br class=""></body></html>