<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 25, 2021 at 8:12 PM Junchao Zhang &lt;<a href="mailto:junchao.zhang@gmail.com">junchao.zhang@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 25, 2021 at 4:45 PM Mark Adams &lt;<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I am testing my Landau code, which is MPI serial, but with many independent MPI processes driving each GPU, in an MPI parallel harness code (Landau ex2).<div><br></div><div>Vector operations with Kokkos Kernels and cuSparse are about the same (KK is faster) and a bit expensive with one process / GPU. That is about the same cost as my Jacobian construction, which is expensive but optimized on the GPU.  (I am using arkimex adaptive TS. I am guessing that it does a lot of vector ops, because there are a lot.)</div><div><br></div><div>With 14 or 15 processes, all doing the same MPI serial problem, cuSparse is about 2.5x more expensive than KK. KK does degrade by about 15% from the one processor case. So KK is doing fine, but something bad is happening with cuSparse.</div></div></blockquote><div>Do AIJKOKKOS and AIJCUSPARSE use different algorithms? I don't know.  To know exactly, the best approach is to consult with Peng@nvidia to profile the code. </div></div></div></blockquote><div><br></div><div>Yeah, I could ask Peng if he has any thoughts. </div><div><br></div><div>I am also now having a problem with the snes/tests/ex13 scaling study (for my ECP report). </div><div>The cuSparse version of GAMG is hanging on an 8-node job with a refinement of 3. 
It works on one node with a refinement of 4 and on 8 nodes with a refinement of 2.</div><div>I recently moved from CUDA-10 to CUDA-11 on Summit because MPS seems to be working with CUDA-11, whereas it was not a while ago.</div><div>I think I will try going back to CUDA-10 and see whether anything changes.</div><div><br></div><div>Thanks,</div><div>Mark</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div><div>Anyone have any thoughts on this?</div><div><br></div><div>Thanks,</div><div>Mark<br><div><br></div></div></div>

</blockquote></div></div>

</blockquote></div></div>