[petsc-users] Why PetscDestroy global collective semantics?
Stefano Zampini
stefano.zampini at gmail.com
Sun Oct 24 00:51:51 CDT 2021
Non-deterministic garbage collection is an issue in Python too, and the
Firedrake folks are also working on it.
We may consider deferring every MPI_Comm_free call made on a communicator
whose reference count is 1 (i.e., a call that would actually release internal
MPI data) to a collective call that is either run by the user (on
PETSC_COMM_WORLD) or executed at the PetscFinalize() stage.
I.e., something like this:
#define MPI_Comm_free(comm) PutCommInAList(comm)
Comm creation is collective by definition, and thus collectiveness of the
order of the destruction can be easily enforced.
I don't see problems with 3rd party libraries using comms, since we always
duplicate the comm we pass to them.
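
A minimal sketch of the deferral idea, written in plain C with no MPI dependency so it compiles on its own. Only PutCommInAList comes from the macro above; FreeDeferredComms, the integer creation_id standing in for a communicator, and the fixed-size list are hypothetical. It assumes each communicator gets the same creation id on every rank (creation being collective) and that all ranks have queued the same set by the time the collective flush runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: none of these names are PETSc or MPI API. */
#define MAX_DEFERRED 64

typedef struct {
  int creation_id; /* stand-in for a communicator; assumed equal on all ranks */
} DeferredComm;

static DeferredComm deferred[MAX_DEFERRED];
static int          ndeferred = 0;

/* Called where MPI_Comm_free() would have dropped the last reference.
   Purely local, so a garbage collector may call it at any time. */
int PutCommInAList(int creation_id)
{
  if (ndeferred >= MAX_DEFERRED) return -1; /* list is full */
  deferred[ndeferred++].creation_id = creation_id;
  return 0;
}

static int CompareById(const void *a, const void *b)
{
  int ia = ((const DeferredComm *)a)->creation_id;
  int ib = ((const DeferredComm *)b)->creation_id;
  return (ia > ib) - (ia < ib);
}

/* Intended to be called collectively (by the user or at finalize).
   Sorting by creation id makes every rank process the list in the same
   order. The ids are written to out; the return value is their count. */
int FreeDeferredComms(int *out, int outlen)
{
  int i, n = ndeferred;
  assert(outlen >= n);
  qsort(deferred, (size_t)n, sizeof(DeferredComm), CompareById);
  for (i = 0; i < n; i++) {
    /* A real implementation would call MPI_Comm_free() here. */
    out[i] = deferred[i].creation_id;
  }
  ndeferred = 0;
  return n;
}
```

If ranks can reach the flush with different sets queued, the sketch is not enough: the ranks would first have to agree on which communicators every rank has released.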
Lawrence, do you think this may help you?
Thanks
Stefano
On Sun, Oct 24, 2021 at 05:58, Barry Smith <bsmith at petsc.dev> wrote:
>
> Ahh, this makes perfect sense.
>
> The code for PetscObjectRegisterDestroy() and the actual destruction
> (called in PetscFinalize()) is very simple and can be found in
> src/sys/objects/destroy.c: see PetscObjectRegisterDestroy() and
> PetscObjectRegisterDestroyAll().
>
> You could easily maintain a new array
> like PetscObjectRegisterGCDestroy_Objects[] and add objects
> with PetscObjectRegisterGCDestroy() and then destroy them
> with PetscObjectRegisterDestroyGCAll(). The only tricky part is that you
> have to make sure, in the context of your Julia MPI,
> that PetscObjectRegisterDestroyGCAll() is called collectively over all the
> MPI ranks (that is it has to be called where all the ranks have made the
> same progress on MPI communication) that have registered objects to
> destroy, generally PETSC_COMM_WORLD. We would be happy to incorporate such a
> system into the PETSc source with a merge request.
>
> Barry
>
> On Oct 23, 2021, at 10:40 PM, Alberto F. Martín <amartin at cimne.upc.edu>
> wrote:
>
> Thanks all for your very insightful answers.
>
> We are leveraging PETSc from Julia in a parallel distributed memory
> context (several MPI tasks running the Julia REPL each).
>
> Julia uses garbage collection (GC), and we would like to destroy the PETSc
> objects automatically whenever the GC decides to during the simulation.
>
> In this context, we cannot guarantee deterministic destruction on all MPI
> tasks, because GC decisions are local to each task, with no global semantics
> guaranteed.
>
> As far as I understand from your answers, it seems possible to defer the
> destruction of objects until points in the parallel program at which
> collective semantics can be guaranteed, correct? If so, I guess that this
> may occur at any point in the simulation, not necessarily at shutdown via
> PetscFinalize(), right?
>
> Best regards,
>
> Alberto.
>
>
> On 24/10/21 1:10 am, Jacob Faibussowitsch wrote:
>
> Depending on the use case, you may also find PetscObjectRegisterDestroy()
> useful. If you can’t guarantee that your PetscObjectDestroy() calls are
> collective but have some other collective section, you may call it there to
> punt the destruction of your object to PetscFinalize(), which is guaranteed
> to be collective.
>
> https://petsc.org/main/docs/manualpages/Sys/PetscObjectRegisterDestroy.html
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Oct 22, 2021, at 23:33, Jed Brown <jed at jedbrown.org> wrote:
>
> Junchao Zhang <junchao.zhang at gmail.com> writes:
>
> On Fri, Oct 22, 2021 at 9:13 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>
> One technical reason is that PetscHeaderDestroy_Private() may call
> PetscCommDestroy() which may call MPI_Comm_free() which is defined by the
> standard to be collective. Though PETSc tries to limit its use of new MPI
> communicators (for example, many objects generally share the same
> communicator), if we did not free those we no longer need when destroying
> objects, we could run out.
>
> PetscCommDestroy() might call MPI_Comm_free(), but it is very unlikely.
> PETSc uses reference counting on communicators, so PetscCommDestroy()
> most likely just decrements the count. In other words, PetscCommDestroy() is
> cheap and, in effect, not collective.
>
>
> Unless it's the last reference to a given communicator, which is a
> risky/difficult thing for a user to guarantee and the consequences are
> potentially dire (deadlock being way worse than a crash) when the user's
> intent is to relax ordering for destruction.
>
> Alberto, what is the use case in which deterministic destruction is
> problematic? If you relax it for individual objects, is there a place you
> can be collective to collect any stale communicators?
>
>
> --
> Alberto F. Martín-Huertas
> Senior Researcher, PhD. Computational Science
> Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)
> Parc Mediterrani de la Tecnologia, UPC
> Esteve Terradas 5, Building C3, Office 215,
> 08860 Castelldefels (Barcelona, Spain)
> Tel.: (+34) 9341 34223
> e-mail: amartin at cimne.upc.edu
>
> FEMPAR project co-founder
> web: http://www.fempar.org
>
--
Stefano