[petsc-users] Why PetscDestroy global collective semantics?
Alberto F. Martín
amartin at cimne.upc.edu
Tue Oct 26 06:48:58 CDT 2021
Thanks all for this second round of detailed responses. Highly appreciated!
I think that I have enough material to continue exploring a solution in
our particular context.
Best regards,
Alberto.
On 25/10/21 11:12 pm, Betteridge, Jack D wrote:
> Hi Everyone,
>
> I cannot fault Lawrence's explanation; that is precisely what I'm
> implementing. The only difference is that I was adding most of the
> logic for the "resurrected objects map" to petsc4py rather than PETSc.
> Given that this solution is truly Python-agnostic, I will move what I
> have written to C and merely add an interface to the functionality in
> petsc4py.
>
> Indeed, this works out better for me, as I was not enjoying writing
> all the code in Cython! I'll post an update once there is a working
> prototype in my PETSc fork and the code is ready for testing.
>
> Cheers,
> Jack
>
>
> ------------------------------------------------------------------------
> *From:* Lawrence Mitchell <wence at gmx.li>
> *Sent:* 25 October 2021 12:34
> *To:* Stefano Zampini <stefano.zampini at gmail.com>
> *Cc:* Barry Smith <bsmith at petsc.dev>; "Alberto F. Martín"
> <amartin at cimne.upc.edu>; PETSc users list <petsc-users at mcs.anl.gov>;
> Francesc Verdugo <fverdugo at cimne.upc.edu>; Betteridge, Jack D
> <j.betteridge at imperial.ac.uk>
> *Subject:* Re: [petsc-users] Why PetscDestroy global collective
> semantics?
>
> Hi all,
>
> (I cc Jack who is doing the implementation in the petsc4py setting)
>
> > On 24 Oct 2021, at 06:51, Stefano Zampini <stefano.zampini at gmail.com> wrote:
> >
> > Non-deterministic garbage collection is an issue in Python too, and
> > the Firedrake folks are also working on that.
> >
> > We may consider deferring all calls to MPI_Comm_free made on
> > communicators with a reference count of 1 (i.e., the calls that
> > would actually wipe out internal MPI data), and performing them in a
> > collective call that can be run either by the user (on
> > PETSC_COMM_WORLD) or at the PetscFinalize() stage.
> > I.e., something like this:
> >
> > #define MPI_Comm_free(comm) PutCommInAList(comm)
> >
> > Comm creation is collective by definition, and thus a collective
> > order of destruction can easily be enforced.
> > I don't see problems with third-party libraries using comms, since
> > we always duplicate the comm we pass them.
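> >
> > A minimal sketch of the idea (PutCommInAList and FlushDeferredComms
> > are hypothetical names, not existing PETSc API; it assumes comms
> > are parked in creation order on every rank):
> >
> > #include <mpi.h>
> >
> > #define MAX_DEFERRED_COMMS 1024
> >
> > static MPI_Comm deferred_comms[MAX_DEFERRED_COMMS];
> > static int      n_deferred = 0;
> >
> > /* Called in place of MPI_Comm_free(): park the communicator
> >    instead of freeing it, which needs no communication */
> > static int PutCommInAList(MPI_Comm *comm)
> > {
> >   deferred_comms[n_deferred++] = *comm;
> >   *comm = MPI_COMM_NULL; /* the caller sees it as freed */
> >   return MPI_SUCCESS;
> > }
> >
> > /* Collective, e.g. over PETSC_COMM_WORLD: comm creation is
> >    collective, so every rank holds the same comms in the same order
> >    and the real MPI_Comm_free calls below match up across ranks */
> > static int FlushDeferredComms(void)
> > {
> >   for (int i = 0; i < n_deferred; i++)
> >     (MPI_Comm_free)(&deferred_comms[i]); /* parens bypass the macro */
> >   n_deferred = 0;
> >   return MPI_SUCCESS;
> > }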
>
> > Lawrence, do you think this may help you?
>
> I think that it is not just MPI_Comm_free that is potentially problematic.
>
> Here are some additional areas off the top of my head:
>
> 1. PetscSF with -sf_type window. Destroy (when the refcount drops to
> zero) calls MPI_Win_free (which is collective over the comm).
> 2. Deallocation of MUMPS objects is tremendously collective.
>
> In general, the solution of just punting MPI_Comm_free to
> PetscFinalize (or some user-defined time) is, I think, insufficient,
> since it requires us to audit the collectiveness of all `XXX_Destroy`
> functions (including those in third-party packages).
>
> Barry's suggestion of resurrecting objects in finalisation using
> PetscObjectRegisterDestroy, and then collectively clearing that array
> periodically, is, I think, pretty close to the proposal that we
> cooked up.
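>
> For context, PetscObjectRegisterDestroy defers the destruction of an
> object until PetscFinalize(). A minimal usage sketch (illustrative
> only; the proposal below clears the registered objects earlier, and
> collectively):
>
> #include <petscvec.h>
>
> int main(int argc, char **argv)
> {
>   Vec            v;
>   PetscErrorCode ierr;
>
>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>   ierr = VecCreateMPI(PETSC_COMM_WORLD, 10, PETSC_DETERMINE, &v); CHKERRQ(ierr);
>   /* Hand the Vec to PETSc instead of calling VecDestroy() here; it
>      is destroyed, collectively, inside PetscFinalize() */
>   ierr = PetscObjectRegisterDestroy((PetscObject)v); CHKERRQ(ierr);
>   ierr = PetscFinalize();
>   return ierr;
> }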
>
> Jack can correct any missteps I make in the explanation, but perhaps
> this is helpful for Alberto:
>
> 1. Each PETSc communicator gets two new attributes: "creation_index"
> [an int64] and "resurrected_objects" [a set-like thing].
> 2. PetscHeaderCreate grabs the next creation_index out of the input
> communicator and stashes it on the object. Since object creation is
> collective, this is guaranteed to agree on any given communicator
> across processes.
> 3. When the Python garbage collector tries to destroy PETSc objects,
> we resurrect the _C_ object in finalisation and stash it in
> "resurrected_objects" on the communicator.
> 4. Periodically (as a result of user intervention in the first
> instance), we garbage-collect these resurrected objects collectively:
> we take the set intersection of the creation_indices across the
> communicator's processes, and then call XXXDestroy, in creation_index
> order, on the sorted intersection (see the sketch below).
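>
> To make step 4 concrete, here is a much-simplified sketch
> (GarbageEntry, GarbageCleanup and the fixed MAX_TRACKED bound are
> hypothetical simplifications; in the actual proposal the set lives in
> a communicator attribute):
>
> #include <petscsys.h>
>
> #define MAX_TRACKED 4096 /* illustrative bound on creation indices */
>
> /* One entry per resurrected object, kept sorted by creation_index */
> typedef struct {
>   PetscInt64  creation_index;
>   PetscObject obj;
> } GarbageEntry;
>
> /* Collective over comm: destroy every resurrected object whose
>    creation_index is present on *all* ranks, in increasing order */
> static PetscErrorCode GarbageCleanup(MPI_Comm comm, GarbageEntry *garbage, PetscInt *n)
> {
>   unsigned char  mine[MAX_TRACKED] = {0}, everyone[MAX_TRACKED];
>   PetscInt       i, kept = 0;
>   PetscErrorCode ierr;
>
>   for (i = 0; i < *n; i++) mine[garbage[i].creation_index] = 1;
>   /* Set intersection: an index survives only if every rank has it */
>   ierr = MPI_Allreduce(mine, everyone, MAX_TRACKED, MPI_UNSIGNED_CHAR, MPI_MIN, comm); CHKERRMPI(ierr);
>   for (i = 0; i < *n; i++) {
>     if (everyone[garbage[i].creation_index]) {
>       ierr = PetscObjectDestroy(&garbage[i].obj); CHKERRQ(ierr); /* collective */
>     } else {
>       garbage[kept++] = garbage[i]; /* not yet dead on all ranks */
>     }
>   }
>   *n = kept;
>   return 0;
> }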
>
>
> I think that most of this infrastructure is agnostic of the managed
> language, which is why Jack was doing the implementation in PETSc
> (rather than petsc4py).
>
> This wasn't a perfect solution (I recall that we could still cook up
> situations in which objects would not be collected), but it did seem,
> in theory, to solve any potential deadlock issues.
>
> Lawrence