[petsc-dev] PETSc issue I cannot post combine WaitForCUDA(); inside PetscLogGpuTimeEnd();

Barry Smith bsmith at petsc.dev
Fri Aug 28 11:42:43 CDT 2020



> On Aug 28, 2020, at 10:26 AM, Stefano Zampini <stefano.zampini at gmail.com> wrote:
> 
> 
> 
>> On Aug 28, 2020, at 5:18 PM, Barry Smith <bsmith at petsc.dev> wrote:
>> 
>> 
>> 
>>> On Aug 28, 2020, at 5:35 AM, Karl Rupp <rupp at iue.tuwien.ac.at> wrote:
>>> 
>>> Hi,
>>> 
>>>>  Since we cannot post issues (reported here https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith), here is my issue so I don't forget it.
>>>>  I think
>>>> err  = WaitForCUDA();CHKERRCUDA(err);
>>>> ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>>>> should be changed so that the WaitForCUDA() call (actually WaitForDevice()) is made inside PetscLogGpuTimeEnd().
>>>> Currently the WaitForCUDA() is missing in a few places, resulting in bad timing.
>>>> Also, some _SeqCUDA() routines don't have the PetscLogGpuTimeEnd() and need to be fixed.
>>>> The current model is a maintenance nightmare.
>>>> Does anyone see a problem with making this change?
>>> 
>>> I'm fine with this change, as the maintenance benefits outweigh the performance cost for typical use cases.
>>> 
>>> I propose also adding WaitForDevice() to PetscLogGpuTimeBegin(). This will ensure that no previous GPU kernel executions spill over into the timed section.

  Karl,

   When synchronization is turned on, the previous GPU kernels should always have their own WaitForDevice(), so are you concerned about buggy code that does not include WaitForDevice()?
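
   For concreteness, a minimal sketch of the change proposed at the top of the thread, with the wait folded into the End call so a caller cannot forget it. Everything named Mock* below is a hypothetical stand-in on a simulated device and clock (each pending kernel is assumed to cost 1.0 simulated seconds); it is not the actual body of WaitForDevice() or PetscLogGpuTimeEnd().

```c
#include <assert.h>

/* Hypothetical mock state; none of this is PETSc or CUDA code. */
int    pending_kernels = 0;   /* kernels launched but not yet finished */
double fake_clock      = 0.0; /* simulated wall clock, in seconds */
double gpu_time        = 0.0; /* accumulated logged GPU time */
double t_begin         = 0.0; /* clock value at the last Begin call */

/* Asynchronous launch: returns immediately and leaves work pending. */
void MockLaunchKernel(void) { pending_kernels++; }

/* Blocks until the device is idle; assumes 1.0 simulated seconds per kernel. */
int MockWaitForDevice(void)
{
  fake_clock += 1.0 * pending_kernels;
  pending_kernels = 0;
  return 0;
}

int MockLogGpuTimeBegin(void)
{
  t_begin = fake_clock;
  return 0;
}

/* Proposed shape: End synchronizes itself before reading the clock. */
int MockLogGpuTimeEnd(void)
{
  int err = MockWaitForDevice();
  if (err) return err;
  assert(pending_kernels == 0); /* the device is idle once the wait returns */
  gpu_time += fake_clock - t_begin;
  return 0;
}

/* Times one kernel launch; note the caller makes no explicit wait call. */
double MockTimeOneKernel(void)
{
  gpu_time = 0.0;
  MockLogGpuTimeBegin();
  MockLaunchKernel();
  MockLogGpuTimeEnd();
  return gpu_time;
}
```

   In this mock, leaving the wait out of MockLogGpuTimeEnd() would log zero time for the kernel, which is the "bad timing" symptom described above.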

>> 
>>  Might this incur an extra overhead checking the device? Or will it always be true that if there are no outstanding kernels it will not go to the GPU and the check will return immediately?
> 
> If we want to have a two-barrier model, I propose we log the time spent waiting at the first barrier separately.
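
   A rough sketch of what logging the first barrier separately could look like, again on a simulated device. All Sim* names and the 1.0-second-per-kernel cost are assumptions made for illustration only; this is not how PetscLogGpuTimeBegin() is implemented.

```c
#include <assert.h>

/* Hypothetical mock state; none of this is PETSc or CUDA code. */
int    outstanding   = 0;   /* kernels launched but not yet complete */
double sim_clock     = 0.0; /* simulated wall clock, in seconds */
double timed_section = 0.0; /* time attributed to the timed section */
double barrier_wait  = 0.0; /* time spent draining earlier work at Begin */
double section_start = 0.0; /* clock value when the timed section began */

void SimLaunch(void) { outstanding++; }

/* Drains the device and returns how long the wait took (simulated). */
double SimWaitForDevice(void)
{
  double waited = 1.0 * outstanding;
  sim_clock += waited;
  outstanding = 0;
  return waited;
}

/* First barrier: earlier work is drained and charged to barrier_wait. */
void SimTimeBegin(void)
{
  barrier_wait += SimWaitForDevice();
  section_start = sim_clock;
}

/* Second barrier: only work launched since Begin is charged to the section. */
void SimTimeEnd(void)
{
  (void)SimWaitForDevice();
  assert(outstanding == 0);
  timed_section += sim_clock - section_start;
}

/* Two unsynchronized kernels precede the one kernel we intend to time. */
void SimSpilloverScenario(void)
{
  SimLaunch();
  SimLaunch();
  SimTimeBegin();
  SimLaunch();
  SimTimeEnd();
}
```

   Keeping barrier_wait as its own counter means spillover from earlier unsynchronized code shows up explicitly instead of silently inflating or vanishing from the timed section.
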
>> 
>> Barry
>> 
>>> 
>>> Best regards,
>>> Karli
> 
