[petsc-dev] Kokkos/Crusher performance
Barry Smith
bsmith at petsc.dev
Sun Jan 23 22:59:36 CST 2022
> On Jan 23, 2022, at 11:47 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Barry Smith via petsc-dev <petsc-dev at mcs.anl.gov> writes:
>
>> PetscLogGpuTimeBegin()/End() was written by Hong so that it works with events to get a GPU timing; it is not supposed to include the CPU kernel launch times or the time to move the scalar arguments to the GPU. It may not be perfect, but it is the best we can do to capture the time the GPU is actively doing the numerics, which is what we want.
>
> As we discussed at the time, collecting the results can be asynchronous and this would be useful to reduce the negative impact of profiling on end-to-end performance.
>
> But I think what's proposed here is okay because PetscLogGpuTimeBegin() starts counting when the device reaches that point, not when it's given on the host.
This is how it is supposed to work.
We should make it easy to turn off the logging and synchronizations (from PetscLogGpu) for everything at the Vec level and below, and for everything at the Mat level and below, so that all the synchronizations needed for the low-level timing can be removed. I think we can do that by having PetscLogGpu take a PETSc class id argument.
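
A rough sketch of what I mean, in plain C with no PETSc or GPU calls. Every name here (SketchLevel, SketchLogGpuTimeDisableAtOrBelow, and so on) is made up for illustration and is not existing PETSc API. The ordered levels are also an assumption of the sketch: real PETSc class ids are not ordered from low to high level, so an actual implementation would need some other way to decide what counts as "below". The counter stands in for the device synchronizations that the timing would otherwise trigger.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical levels, ordered from lowest to highest. This ordering is an
   assumption of the sketch; real PETSc class ids are not ordered like this. */
typedef enum {
  SKETCH_NONE = 0,
  SKETCH_VEC  = 1,
  SKETCH_MAT  = 2,
  SKETCH_KSP  = 3
} SketchLevel;

static SketchLevel sketch_off_at_or_below = SKETCH_NONE; /* nothing disabled */
static int         sketch_sync_count      = 0;           /* stand-in for device syncs */

/* Turn off GPU timing for the given level and everything below it. */
void SketchLogGpuTimeDisableAtOrBelow(SketchLevel level)
{
  assert(level >= SKETCH_NONE && level <= SKETCH_KSP);
  sketch_off_at_or_below = level;
}

/* Timing is active only for levels strictly above the disabled threshold. */
bool SketchLogGpuTimeEnabled(SketchLevel level)
{
  return level > sketch_off_at_or_below;
}

void SketchLogGpuTimeBegin(SketchLevel level)
{
  if (!SketchLogGpuTimeEnabled(level)) return;
  /* A real implementation would record a start event on the device stream. */
}

void SketchLogGpuTimeEnd(SketchLevel level)
{
  if (!SketchLogGpuTimeEnabled(level)) return;
  /* A real implementation would record a stop event and wait for it;
     here we only count how many such waits would have happened. */
  sketch_sync_count++;
}

/* Number of simulated synchronizations so far. */
int SketchSyncCount(void)
{
  return sketch_sync_count;
}
```

With this, calling SketchLogGpuTimeDisableAtOrBelow(SKETCH_MAT) makes the Begin/End pairs for the Vec and Mat levels return immediately without synchronizing, while a pair at the KSP level is still timed.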