[petsc-dev] Memory problem with OpenMP and Fieldsplit sub solvers

Barry Smith bsmith at petsc.dev
Fri Jan 22 14:33:39 CST 2021



> On Jan 22, 2021, at 2:11 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> 
> 
> On Fri, Jan 22, 2021 at 12:57 PM Barry Smith <bsmith at petsc.dev> wrote:
> 
> The library is thread safe and its functions can be called from multiple host threads, even with the same handle. When multiple threads share the same handle,
> extreme care needs to be taken when the handle configuration is changed because that change will affect potentially subsequent cuBLAS calls in all threads. 
>                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> It is even more true for the destruction of the handle. So it is not recommended that multiple thread share the same cuBLAS handle.
> 
> 
> From my reading of this there should be absolutely no issue. The handle configuration is never being changed: it should be set in PetscInitialize() and destroyed in PetscFinalize(), since lazy configuration of cuBLAS is turned off (right?).
> 
> You had me add to init.c:
> 
> #if defined(PETSC_HAVE_THREADSAFETY)
>   ierr = PetscCUPMInitializeCheck();CHKERRQ(ierr);
> #endif
> 
> 
> because getHandle, which does lazy initialization, was not thread safe. But I think it is now.
> 
> I am going to test without it and remove it if that works (I'm sure it will).

No, you can't remove this because it initializes the device etc. You can't have all the threads trying to initialize the device.

  Barry
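
A minimal sketch of the ordering described above, assuming the point is simply that one thread performs device setup before any worker threads start. Everything here is a hypothetical stand-in (device_init_once, worker_solve, run_workers); none of it is a PETSc or CUDA API, and the OpenMP pragma is only one possible way to fan out.

```c
#include <assert.h>

/* Hypothetical stand-ins; not PETSc or CUDA APIs. */
static int device_ready = 0; /* set once, by the main thread only */
static int init_calls   = 0; /* counts how often setup actually ran */

/* Stand-in for one-time device setup. Must be called from a single
   thread, before any parallel region is entered. */
static void device_init_once(void)
{
  if (!device_ready) {
    init_calls++;
    device_ready = 1;
  }
}

/* Stand-in for work done by a thread; it only reads device_ready. */
static int worker_solve(int i)
{
  return device_ready ? i : -1; /* -1 means "device not initialized" */
}

/* Initialize eagerly, then fan out. Returns the number of workers
   that found the device uninitialized (expected to be zero). */
static int run_workers(int n)
{
  int i, failures = 0;

  device_init_once(); /* eager and single-threaded */
  assert(device_ready);

  /* Without OpenMP enabled the pragma is ignored and the loop runs serially. */
  #pragma omp parallel for reduction(+:failures)
  for (i = 0; i < n; i++) {
    if (worker_solve(i) < 0) failures++;
  }
  return failures;
}
```

Calling run_workers() repeatedly leaves init_calls at 1, since setup never runs inside the parallel region. A lazy variant that called device_init_once() from worker_solve() would need its own synchronization.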

> 
> And I don't know what they mean by "when the handle configuration is changed". It makes sense to me that the handle would not be thread safe. It is an object and it has state. The purpose of making an object, rather than having one global object, is to make it thread safe...
> 
> I can see that it is not thread safe. There were clear non-deterministic race conditions, and the results differed by a random amount from a serial run each time. A vector norm sometimes returned 0 even though the first entry != 0.
> 
> Mark
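
A hedged sketch of the alternative the quoted documentation recommends, one handle per thread, assuming the handle carries mutable scratch state. Handle, norm2_squared, and per_task_handles_agree are hypothetical names for illustration and are not the cuBLAS or PETSc interfaces. The sketch returns the squared norm to avoid depending on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle with mutable scratch state; not a cuBLAS type. */
typedef struct {
  double acc; /* accumulator written during every call */
} Handle;

/* Squared 2-norm that accumulates into the handle. Two threads sharing
   one Handle would race on h->acc; separate handles do not. */
static double norm2_squared(Handle *h, const double *x, size_t n)
{
  size_t i;

  assert(h != NULL && x != NULL);
  h->acc = 0.0;
  for (i = 0; i < n; i++) h->acc += x[i] * x[i];
  return h->acc;
}

#define NTASKS 4

/* Give every task its own handle and check that all tasks reproduce
   the expected value. Returns 1 on agreement, 0 otherwise. */
static int per_task_handles_agree(const double *x, size_t n, double expected)
{
  Handle handles[NTASKS];
  double results[NTASKS];
  int    t, ok = 1;

  /* Without OpenMP enabled the pragma is ignored and the loop runs serially. */
  #pragma omp parallel for
  for (t = 0; t < NTASKS; t++) {
    results[t] = norm2_squared(&handles[t], x, n);
  }
  for (t = 0; t < NTASKS; t++) {
    if (results[t] != expected) ok = 0;
  }
  return ok;
}
```

For x = {3.0, 4.0} every task should compute 25.0. If all tasks were handed &handles[0] instead, the writes to acc could interleave, which is the kind of run-to-run variation reported above.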
