<div dir="ltr"><div dir="ltr">On Tue, Jul 16, 2019 at 9:07 PM Xiangdong via petsc-users &lt;<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello everyone,<div><br></div><div>I am new to PETSc on GPUs and have a simple question. </div><div><br></div><div>When I tried to solve Ax=b, where A is MATAIJCUSPARSE and b and x are VECSEQCUDA, with GMRES (or GCR) and PCNONE, I found that during each Krylov iteration there is one MemCpy(HtoD) call and one MemCpy(DtoH) call. Does that mean the Krylov solve is not 100% on the GPU and still needs some work from the CPU? What are these MemCpys for during each iteration?</div></div></blockquote><div><br></div><div>We have GPU experts on the list, but there is definitely some communication, because we do not do the orthogonalization on the GPU,</div><div>just the BLAS ops. This is a very small amount of data, so it only contributes latency, and I would guess that it is less than the kernel</div><div>launch latency.</div><div><br></div><div>  Thanks,</div><div><br></div><div>    Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Thank you.</div><div><br></div><div>Best,</div><div>Xiangdong</div></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>