[petsc-dev] funny GPU timer data

Barry Smith bsmith at petsc.dev
Tue Dec 29 13:22:50 CST 2020



> On Dec 29, 2020, at 1:07 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> Ah yes, the 10 and 100 merged.
> And I am not calling the GPU timers, so the Mflop/s is messed up.
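
A minimal sketch of why a missing GPU timer shows up this way. It assumes the rate definitions printed in the -log_view legend as we understand them (Total Mflop/s = 1e-6 * sum of flop over all ranks / max time over all ranks, and GPU Mflop/s the same using GPU flop and GPU time) and assumes a zero-time fallback of 0. The helper names are hypothetical and are not PETSc functions. Under those assumptions, flop logged with PetscLogGpuFlops() but no GPU time gives GPU Mflop/s = 0 while GPU %F is still 100, which is one possible reading of the Jac-kernel row.

```c
#include <assert.h>

/* Hypothetical helper mirroring the -log_view legend definitions
   (as we understand them):
     Total Mflop/s = 1e-6 * (sum of flop over all ranks) / (max time over all ranks)
     GPU Mflop/s   = 1e-6 * (sum of GPU flop over all ranks) / (max GPU time over all ranks)
   The name and the zero-time fallback are assumptions for illustration. */
static double mflop_rate(double sum_flop, double max_seconds)
{
  assert(sum_flop >= 0.0);
  /* If no time was logged (e.g. the GPU timers were never called),
     the rate is undefined; report 0 instead of dividing by zero. */
  if (max_seconds <= 0.0) return 0.0;
  return 1.0e-6 * sum_flop / max_seconds;
}

/* Share of an event's flop that was logged as GPU flop, in percent. */
static double gpu_flop_percent(double gpu_flop, double total_flop)
{
  assert(gpu_flop >= 0.0 && gpu_flop <= total_flop);
  if (total_flop <= 0.0) return 0.0;
  return 100.0 * gpu_flop / total_flop;
}
```

For the Jac-kernel numbers on one rank, mflop_rate(6.13e13, 16.106) is about 3.8e6 Mflop/s; the larger Total Mflop/s in the log is presumably the sum over all ranks.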

  
> 
> 
> And I assume WaitForCUDA blocks on this MPI process's CUDA calls here (one stream per process), and does not block on other MPI processes' CUDA calls.

  I'm not sure what you mean by this. WaitForCUDA() makes the CPU wait until the device has finished the kernel; it has nothing to do with other MPI processes. It is an attempt to get a reasonably accurate timing for JUST the GPU kernel run, without including other CPU-side work (though of course it does include the extra synchronization time).

  Hong Zhang is working on a better timer based on CUDA events, which will be more accurate and avoids the extra overhead of the device synchronization. Once that is merged it will be the default, and WaitForCUDA() will no longer be needed or used.

  Barry



> 
>   err  = WaitForCUDA();CHKERRCUDA(err);
>   ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>   ierr = PetscLogEventEnd(MAT_CUSPARSEGenerateTranspose,A,0,0,0);CHKERRQ(ierr);
> 
> 
> Thanks,
> 
> On Tue, Dec 29, 2020 at 1:12 PM Barry Smith <bsmith at petsc.dev> wrote:
> 
>   Mark,
> 
>     Aside from the formatting, are you sure there is an issue? Could it be 10% of the time and 100% of the flops, and since there is not room for all the digits they end up running together with no space between them?
> 
>    Similarly, could the flop rates be running together? You could try widening the fields in the print statement to make room for these values.
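
A small illustration of the "digits running together" effect. It assumes, hypothetically, that two adjacent percentage columns are printed with field widths 2 and 3 and no separating space; these format strings are stand-ins, not the ones PETSc actually uses. A value that exactly fills its field gets no leading blank, so 10 and 100 print as "10100", just like the %T %F columns in the Jac-kernel row. Widening the fields restores the gap.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical narrow layout: widths 2 and 3, no separator.
   A 3-digit value fills its field, so it abuts the previous value. */
static int format_narrow(char *buf, size_t len, double pct_time, double pct_flop)
{
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, "%2.0f%3.0f", pct_time, pct_flop);
}

/* Hypothetical wider layout: width 4 leaves at least one blank
   before every value up to 100. */
static int format_wide(char *buf, size_t len, double pct_time, double pct_flop)
{
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, "%4.0f%4.0f", pct_time, pct_flop);
}
```

With (10, 100), format_narrow writes "10100" and format_wide writes "  10 100".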
> 
>    Barry
>  
> 
>> On Dec 29, 2020, at 8:32 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> 
>> I am seeing this from a GPU kernel. The flop percentage is messed up and the flop rate does not look right:
>> 
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Jac-kernel        13068 1.0 1.6106e+01 1.1 6.13e+13 1.0 0.0e+00 0.0e+00 0.0e+00 10100  0  0  0  10100  0  0  0 136983459       0      0 0.00e+00    0 0.00e+00 100
>> 
>> I use this in landau.cu:
>> 
>>  ierr = PetscLogGpuFlops(flops*nip);CHKERRQ(ierr);
>> 
>> Any idea what is going on here?
>> 
>> Thanks,
>> Mark
> 


