<div dir="ltr">Ah yes, the 10 and the 100 (%T and %F) merged.<div>And I am not calling the <b>GPU timers</b>, so the Mflop/s numbers are messed up.</div><div><br></div><div><div>I assume WaitForCUDA() here blocks only on this MPI process's CUDA calls (one stream per process) and does not block on other MPI processes' CUDA calls:</div><div><br></div></div><div> err = WaitForCUDA();CHKERRCUDA(err);<br><b> ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);<br></b> ierr = PetscLogEventEnd(MAT_CUSPARSEGenerateTranspose,A,0,0,0);CHKERRQ(ierr);<br></div><div><br></div><div><br></div><div>Thanks,</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Dec 29, 2020 at 1:12 PM Barry Smith <<a href="mailto:bsmith@petsc.dev">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><br></div> Mark,<div><br></div><div> Aside from formatting, are you sure there is an issue? Could it be 100% of the time and 100% of the flops, and since there is no room for all the digits they end up sitting on top of each other?</div><div><br></div><div> Similarly, could the flop rates be overlapping each other? You could try adding more digits in the print statement to make room for these values.</div><div><b style="font-family:monospace"><br></b></div><div><b style="font-family:monospace"> Barry</b></div><div><b style="font-family:monospace"> </b><div><br><blockquote type="cite"><div>On Dec 29, 2020, at 8:32 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:</div><br><div><div dir="ltr">I am seeing this from a GPU kernel. 
The %F (percent of flops) column is messed up, and the flop rate does not look right:<div><br><div><pre>------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Jac-kernel         13068 1.0 1.6106e+01 1.1 6.13e+13 1.0 0.0e+00 0.0e+00 0.0e+00 <b>10100</b>  0  0  0  <b>10100</b>  0  0  0 <b>136983459</b>     0    0 0.00e+00    0 0.00e+00 100</pre></div></div><div><font face="monospace"><br></font></div><div><font face="monospace">I use this in landau.cu:</font></div><div><font face="monospace"><br></font></div><div> ierr = PetscLogGpuFlops(flops*nip);CHKERRQ(ierr);<font face="monospace"><br></font></div><div><br></div><div>Any idea what is going on here?</div><div><br></div><div>Thanks,</div><div>Mark</div></div>
</div></blockquote></div><br></div></div></blockquote></div>