<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div> I think you should contact the crusher ECP technical support team and tell them you are getting dismel performance and ask if you should expect better. Don't waste time flogging a dead horse. <br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jan 24, 2022, at 2:16 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com" class="">knepley@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" class="">junchao.zhang@gmail.com</a>> wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class=""><br class=""></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank" class="">mfadams@lbl.gov</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class=""><br class=""></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank" class="">junchao.zhang@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="">Mark, I think you can benchmark individual vector operations, and once we get reasonable profiling results, we can move to solvers etc.</div></blockquote><div class=""><br class=""></div><div class="">Can you suggest a code to run or are you suggesting making a vector benchmark code?</div></div></div></blockquote><div class="">Make a vector benchmark code, testing vector operations that would be used in your solver.</div><div class="">Also, we can run MatMult() to see if the profiling result is reasonable.</div><div class="">Only once we get some solid results on basic operations, it is useful to run big codes.</div></div></div></blockquote><div class=""><br class=""></div><div class="">So we have to make another throw-away code? Why not just look at the vector ops in Mark's actual code?</div><div class=""><br class=""></div><div class=""> Matt</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class="gmail_quote"><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class=""><br class=""><div class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class="">--Junchao Zhang</div></div></div><br class=""></div></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank" class="">mfadams@lbl.gov</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class=""><br class=""></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank" class="">bsmith@petsc.dev</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><br class=""></div> Here except for VecNorm the GPU is used effectively in that most of the time is time is spent doing real work on the GPU<div class=""><br class=""></div><div class=""><div class="">VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100</div><div class=""><br class=""></div><div class="">Even the dots are very effective, only the VecNorm flop rate over the full time is much much lower than the vecdot. Which is somehow due to the use of the GPU or CPU MPI in the allreduce?</div></div></div></blockquote><div class=""><br class=""></div><div class="">The VecNorm GPU rate is relatively high on Crusher and the CPU rate is about the same as the other vec ops. I don't know what to make of that.</div><div class=""><br class=""></div><div class="">But Crusher is clearly not crushing it. </div><div class=""><br class=""></div><div class="">Junchao: Perhaps we should ask Kokkos if they have any experience with Crusher that they can share. They could very well find some low level magic.</div><div class=""><br class=""></div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 24, 2022, at 12:14 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank" class="">mfadams@lbl.gov</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><br class=""></div><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br class="">
Mark, can we compare with Spock?<br class=""></blockquote><div class=""><br class=""></div><div class=""> Looks much better. This puts two processes/GPU because there are only 4.</div></div></div>
</div>
<span id="gmail-m_7034942330692579726gmail-m_557496224917787162gmail-m_6873512096861806548gmail-m_-3802472944769362989gmail-m_2338317291810935402cid:f_kysy6if70" class=""><jac_out_001_kokkos_Spock_6_1_notpl.txt></span></div></blockquote></div><br class=""></div></div></blockquote></div></div>
</blockquote></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div><br clear="all" class=""><div class=""><br class=""></div>-- <br class=""><div dir="ltr" class="gmail_signature"><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class="">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br class="">-- Norbert Wiener</div><div class=""><br class=""></div><div class=""><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank" class="">https://www.cse.buffalo.edu/~knepley/</a><br class=""></div></div></div></div></div></div></div></div>
</div></blockquote></div><br class=""></body></html>