[petsc-users] Why PetscDestroy global collective semantics?
Kozdon, Jeremy (CIV)
jekozdon at nps.edu
Mon Oct 25 11:37:51 CDT 2021
In the PETSc.jl stuff I’ve worked on, I punted on the issue and only register a finalizer when there is 1 MPI rank, so something like this when objects are created:
if MPI.Comm_size(comm) == 1
    finalizer(destroy, mat)
end
see: https://github.com/JuliaParallel/PETSc.jl/blob/581f37990b6e54fd31cf2bec8e938d51a73dbc92/src/mat.jl#L210-L212 (warning: this is a WIP branch, but it shouldn’t change much in the next few weeks).
I’ve tried to think about some other ways of handling this, such as some sort of collective cleanup routine that the user could call, or using a thread to handle the destroys, but I have not given it a ton of thought, and every approach could involve collective communication, which I would like to avoid.
I’ll have to dig a little more into petsc4py, discussed elsewhere in the thread, but it seems that even there you cannot completely skip cleanup. I have been wondering whether threading could handle this, but then I think a collective barrier would be needed, which would not be so nice.
Another garbage collection issue I found: if you rely on the garbage collector for serial objects and allow PETSc to be finalized and reinitialized, you can end up with the garbage collector trying to clean up objects from previous runs. To get around this, I introduced a PETSc "age" to my global PETSc object, and each object records the age during which it was created. An object is only destroyed if it was created in the current PETSc age.
function destroy(M::AbstractMat{PetscLib}) where {PetscLib}
    # Only call MatDestroy if PETSc is still initialized, the matrix
    # was created during the current PETSc "age", and it has not
    # already been destroyed.
    if !(finalized(PetscLib)) &&
       M.age == getlib(PetscLib).age &&
       M.ptr != C_NULL
        LibPETSc.MatDestroy(PetscLib, M)
    end
    M.ptr = C_NULL
    return nothing
end
see: https://github.com/JuliaParallel/PETSc.jl/blob/581f37990b6e54fd31cf2bec8e938d51a73dbc92/src/mat.jl#L14-L22
> On Oct 23, 2021, at 11:29 PM, Patrick Sanan <patrick.sanan at gmail.com> wrote:
>
>
>
>
> I think Jeremy (cc'd) has also been thinking about this in the context of PETSc.jl
>
> Stefano Zampini <stefano.zampini at gmail.com> wrote on Sun, 24 Oct 2021 at 07:52:
> Non-deterministic garbage collection is an issue in Python too, and the Firedrake folks are also working on that.
>
> We may consider deferring all calls to MPI_Comm_free made on communicators with a reference count of 1 (i.e., calls that would actually wipe out some internal MPI data) into a collective call that can be run either by the user (on PETSC_COMM_WORLD) or at the PetscFinalize() stage.
> I.e., something like this:
>
> #define MPI_Comm_free(comm) PutCommInAList(comm)
>
> Comm creation is collective by definition, and thus collectiveness of the order of the destruction can be easily enforced.
> I don't see problems with 3rd party libraries using comms, since we always duplicate the comm we pass to them
>
> Lawrence, do you think this may help you?
>
> Thanks
> Stefano
>
> On Sun, 24 Oct 2021 at 05:58, Barry Smith <bsmith at petsc.dev> wrote:
>
> Ahh, this makes perfect sense.
>
> The code for PetscObjectRegisterDestroy() and the actual destruction (called in PetscFinalize()) is very simple and can be found in src/sys/objects/destroy.c: PetscObjectRegisterDestroy() and PetscObjectRegisterDestroyAll().
>
> You could easily maintain a new array like PetscObjectRegisterGCDestroy_Objects[], add objects with PetscObjectRegisterGCDestroy(), and then destroy them with PetscObjectRegisterDestroyGCAll(). The only tricky part is that, in the context of your Julia MPI, you have to make sure that PetscObjectRegisterDestroyGCAll() is called collectively over all the MPI ranks that have registered objects to destroy (that is, it has to be called where all the ranks have made the same progress on MPI communication), generally on PETSC_COMM_WORLD. We would be happy to incorporate such a system into the PETSc source with a merge request.
>
> Barry
>
>> On Oct 23, 2021, at 10:40 PM, Alberto F. Martín <amartin at cimne.upc.edu> wrote:
>>
>> Thanks all for your very insightful answers.
>>
>> We are leveraging PETSc from Julia in a parallel distributed memory context (several MPI tasks running the Julia REPL each).
>>
>> Julia uses Garbage Collection (GC), and we would like to destroy the PETSc objects automatically when the GC decides so along the simulation.
>>
>> In this context, we cannot guarantee deterministic destruction on all MPI tasks, as the GC decisions are local to each task, with no global semantics guaranteed.
>>
>> As far as I understand from your answers, there seems to be the possibility to defer the destruction of objects to points in the parallel program at which collective semantics can be guaranteed, correct? If so, I guess this may occur at any point in the simulation, not necessarily at shutdown via PetscFinalize(), right?
>>
>> Best regards,
>>
>> Alberto.
>>
>>
>>
>> On 24/10/21 1:10 am, Jacob Faibussowitsch wrote:
>>> Depending on the use case you may also find PetscObjectRegisterDestroy() useful. If you can’t guarantee that your PetscObjectDestroy() calls are collective, but you do have some other collective section, you may call it there to punt the destruction of your object to PetscFinalize(), which is guaranteed to be collective.
>>>
>>> https://petsc.org/main/docs/manualpages/Sys/PetscObjectRegisterDestroy.html
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>>> On Oct 22, 2021, at 23:33, Jed Brown <jed at jedbrown.org> wrote:
>>>>
>>>> Junchao Zhang <junchao.zhang at gmail.com> writes:
>>>>
>>>>> On Fri, Oct 22, 2021 at 9:13 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>
>>>>>>
>>>>>> One technical reason is that PetscHeaderDestroy_Private() may call
>>>>>> PetscCommDestroy() which may call MPI_Comm_free() which is defined by the
>>>>>> standard to be collective. Though PETSc tries to limit its use of new MPI
>>>>>> communicators (for example generally many objects shared the same
>>>>>> communicator) if we did not free those we no longer need when destroying
>>>>>> objects we could run out.
>>>>>>
>>>>> PetscCommDestroy() might call MPI_Comm_free(), but it is very unlikely.
>>>>> PETSc uses reference counting on communicators, so in PetscCommDestroy()
>>>>> it likely just decreases the count. In other words, PetscCommDestroy() is
>>>>> cheap and in effect not collective.
>>>>
>>>> Unless it's the last reference to a given communicator, which is a risky/difficult thing for a user to guarantee and the consequences are potentially dire (deadlock being way worse than a crash) when the user's intent is to relax ordering for destruction.
>>>>
>>>> Alberto, what is the use case in which deterministic destruction is problematic? If you relax it for individual objects, is there a place you can be collective to collect any stale communicators?
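[Editor's note: the reference-counting behavior discussed above, and the hazard Jed points out, can be sketched in Python — RefCountedComm and its methods are made-up illustrative names, not petsc4py API.]

```python
class RefCountedComm:
    """Toy model of PETSc's communicator reference counting."""
    def __init__(self):
        self.refcount = 1
        self.freed = False  # stands in for the collective MPI_Comm_free

    def reference(self):
        self.refcount += 1

    def destroy(self):
        # Cheap, purely local decrement in the common case ...
        self.refcount -= 1
        if self.refcount == 0:
            # ... but the LAST reference triggers the collective free;
            # if ranks reach this point in different orders, the
            # program can deadlock rather than merely crash.
            self.freed = True

comm = RefCountedComm()
comm.reference()   # a second object shares the same communicator
comm.destroy()     # first destroy: just a decrement, no MPI call
assert not comm.freed
comm.destroy()     # last reference: this one would be collective
assert comm.freed
```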
>>>
>> --
>> Alberto F. Martín-Huertas
>> Senior Researcher, PhD. Computational Science
>> Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)
>> Parc Mediterrani de la Tecnologia, UPC
>>
>> Esteve Terradas 5, Building C3, Office 215
>> 08860 Castelldefels (Barcelona, Spain)
>> Tel.: (+34) 9341 34223
>>
>> e-mail: amartin at cimne.upc.edu
>>
>>
>> FEMPAR project co-founder
>> web: http://www.fempar.org
>>
>>
>
>
>
> --
> Stefano