[petsc-dev] Registration implicitly collective on COMM_WORLD

Matthew Knepley knepley at gmail.com
Mon Feb 4 23:08:24 CST 2013


On Tue, Feb 5, 2013 at 12:01 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   Ok, one decision made. Dynamic Library loads are collective.
>
>    Now we need to make several more decisions.
>
> 1) are PETSc package registrations collective? sys, vec, mat, dm, ksp,
> snes, ts
>      currently as Jed noted they are not except with dynamic loading of
> PETSc libs
>

I vote yes. It's much easier, and I see no real upside to being more
flexible.


> 2) are all PETSc packages registered upfront during PetscInitialize()?
>     currently only with dynamic loading of petsc libs
>

Yes again. I think mirroring the dynamic way makes things easier too.


> 2a) If all PETSc packages are registered up front what is the mechanism to
> turn off registering some? Though Matt disagrees, there is always some
> asshole who says, I don't use TS so I don't want it registered.
>

Tell them it's not loaded and see if they can figure out that it is.

  Matt


> …..
>
> Including where the comms are passed around for all the methods.
>
>   Barry
>
>
> On Feb 4, 2013, at 10:53 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
> > On Mon, Feb 4, 2013 at 11:40 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> > On Mon, Feb 4, 2013 at 10:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    This is currently a mess.
> >
> >    Say one process calls PetscFunctionListAdd() with a function pointer,
> but another calls it with the string name of the function. Now both
> processes call PetscFunctionListFind() with a common comm. The process with
> the function pointer will return immediately with the answer. The one
> without the function pointer will start mucking around with dynamic
> libraries which "sometimes" could be collective on the comm so it would
> block?
> >
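
To make the mismatch above concrete, here is a minimal sketch in plain C. It is not PETSc's implementation: fn_list, fn_list_add, fn_list_find, demo_create and demo_resolve are hypothetical names, and the resolver only stands in for whatever dynamic-library lookup the real code would perform. It assumes an entry is registered with either a function pointer or a symbol name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);
/* Hypothetical stand-in for a dynamic-library lookup (e.g. something like dlsym). */
typedef generic_fn (*resolver_fn)(const char *symbol);

#define FN_LIST_MAX 16

typedef struct {
  const char *name;   /* lookup key */
  const char *symbol; /* symbol to resolve lazily, or NULL */
  generic_fn  fn;     /* already-known pointer, or NULL */
} fn_entry;

typedef struct {
  fn_entry entries[FN_LIST_MAX];
  size_t   count;
  int      slow_lookups; /* how often find had to call the resolver */
} fn_list;

/* Register an entry by pointer, by symbol name, or both. */
int fn_list_add(fn_list *list, const char *name, const char *symbol, generic_fn fn)
{
  assert(symbol != NULL || fn != NULL);
  if (list->count == FN_LIST_MAX) return -1;
  fn_entry *e = &list->entries[list->count++];
  e->name   = name;
  e->symbol = symbol;
  e->fn     = fn;
  return 0;
}

/* Return the function for name. Entries with a pointer return at once;
   entries with only a symbol take the slow path through the resolver. */
generic_fn fn_list_find(fn_list *list, const char *name, resolver_fn resolve)
{
  for (size_t i = 0; i < list->count; i++) {
    fn_entry *e = &list->entries[i];
    if (strcmp(e->name, name) != 0) continue;
    if (e->fn == NULL) {
      list->slow_lookups++;        /* in the real code this could block */
      e->fn = resolve(e->symbol);  /* cache the result for later lookups */
    }
    return e->fn;
  }
  return NULL;
}

/* Hypothetical function to register, and a resolver that knows only it. */
void demo_create(void) {}

generic_fn demo_resolve(const char *symbol)
{
  return strcmp(symbol, "demo_create") == 0 ? demo_create : NULL;
}
```

If one process adds "demo" with the pointer and another adds it with only the string "demo_create", both get the same function back from fn_list_find, but only the second one enters the resolver. If that resolver were collective on the comm, the first process would never join it.
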
> >   These sets of routines evolved organically over time. We need to
> refactor the whole hierarchy of these routines and figure out what
> collectivity is needed.  There are too many potential comms since they were
> kind of shoved in over time.
> >
> >   It may be simplest if we treat accessing the dynamic libraries as
> completely non-collective, this means removing things like
> PetscDLLibraryRetrieve() which, while a way cool concept, has never proven
> to be practical during its 15 years of existence.
> >
> >    So are we able to treat accessing dynamic libraries as completely
> non-collective? Will this lose a valuable feature?
> >
> > Sort of. The problem is that independent access to the file system is
> already so slow on current hardware that shared libraries bring those
> expensive machines to their knees. When we worked on this problem for
> Python (which is _heavily_ dependent on dynamic loading), we patched
> glibc-rtld so we could get hooks into a library I called "collfs"
> (collective file system) that would do a collective open implemented using
> MPI_Bcast. Most of this circus could go away if libc provided "dlopenfd()",
> in which case we could use shm_open() and avoid touching the file system at
> all.
> >
> > From the glibc implementation, I don't think anyone was trying to make
> adding dlopenfd() easy to implement, so we probably have to deal with
> paths. Still, if we have a working shm_open and a communicator, we can
> avoid the libc-rtld hocus pocus with a fast collective load implemented as:
> >
> >   rank 0 mmaps the file
> >   everyone else does shm_open and mmap
> >   MPI_Bcast
> >   dlopen("/dev/shm/thelib.so",)
> >
> > In summary, I think collective loads are useful even without the
> "retrieval" stuff.
> >
> > I agree with Jed here. We should keep the collective semantics for
> loading.
> >
> > Now it's cleaner for modularity to load the entire plugin library
> up-front and let PetscDLLibraryRegister_thelib call MatRegister for
> everything that it provides. It's easy to manage collectivity this way, but
> unfortunately, it eats up startup time and memory. (PETSc's current dynamic
> registration is like Emacs "autoload".)
> >
> > I think this was a hack to begin with (and I did it), so calling the
> library loads up front does not bother me. I don't
> > think anyone in the world is interested in a lightweight, partial PETSc.
> >
> >    Matt
> >
> > At a cost of at least one reduction per library per communicator, we
> could keep track of the scope on which each library has been loaded so that
> all loads are safe. Of course performance would go way down if many callers
> brought in the library on a small object, but that may be unavoidable.
> >
> >
> >    Barry
> >
> >
> > On Feb 4, 2013, at 9:22 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> >
> > > On Sat, Feb 2, 2013 at 3:30 PM, Barry Smith <bsmith at mcs.anl.gov>
> wrote:
> > >
> > >    Yeah I noticed this problem but didn't want to deal with it when I
> changed the code.
> > >
> > > So if we believe the documentation of PetscFunctionListAdd,
> XXInitializePackage() is effectively collective on COMM_WORLD (though not
> documented as such). This means that if
> !defined(PETSC_USE_DYNAMIC_LIBRARIES), the following could deadlock:
> > >
> > > if (!rank) {
> > >   VecCreate(PETSC_COMM_SELF,....);
> > > }
> > >
> > > which would be awfully bad behavior. In reality,
> PetscFunctionListAdd() does not reference comm at all. Why did you add the
> comm argument? "Consistency"?
> > >
> > > Whatever the "next" documentation system is, it should be taught to
> trace the "collective" attribute and complain if a "Not Collective"
> function calls a Collective function with an argument other than COMM_SELF.
> > >
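
A rough sketch of the check suggested above, in plain C. This is not an existing documentation tool: func_doc, call_edge, find_doc and count_violations are hypothetical names, and the sketch assumes the tool has already extracted each function's documented collectivity and the comm argument of each call. It inspects direct calls only and does not follow chains of calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NOT_COLLECTIVE, LOGICALLY_COLLECTIVE, COLLECTIVE } collectivity;
typedef enum { COMM_ARG_SELF, COMM_ARG_OTHER } comm_arg;

/* Documented collectivity of one function (hypothetical record). */
typedef struct {
  const char  *name;
  collectivity level;
} func_doc;

/* One call site: who calls whom, and with what kind of comm. */
typedef struct {
  const char *caller;
  const char *callee;
  comm_arg    comm;
} call_edge;

/* Linear search for a function's documentation record. */
const func_doc *find_doc(const func_doc *docs, size_t ndocs, const char *name)
{
  for (size_t i = 0; i < ndocs; i++)
    if (strcmp(docs[i].name, name) == 0) return &docs[i];
  return NULL;
}

/* Count call sites where a "Not Collective" function calls a Collective
   function with a comm other than COMM_SELF. */
size_t count_violations(const func_doc *docs, size_t ndocs,
                        const call_edge *calls, size_t ncalls)
{
  size_t violations = 0;
  for (size_t i = 0; i < ncalls; i++) {
    const func_doc *caller = find_doc(docs, ndocs, calls[i].caller);
    const func_doc *callee = find_doc(docs, ndocs, calls[i].callee);
    assert(caller != NULL && callee != NULL); /* every function must be documented */
    if (caller->level == NOT_COLLECTIVE && callee->level == COLLECTIVE &&
        calls[i].comm != COMM_ARG_SELF)
      violations++;
  }
  return violations;
}
```

Under these assumptions, a registration routine documented as "Not collective" that reaches a collective library load on a wider comm would be flagged, while the same call made on COMM_SELF would pass.
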
> > >
> > >     Yes we should remove the "Formally Collective", I was drinking
> that week :-)
> > >
> > >    Barry
> > >
> > > On Feb 2, 2013, at 2:54 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> > >
> > > > In [1], PetscFunctionListAdd became implicitly collective on
> COMM_WORLD, but all the XXRegisterDynamic() routines say "Not collective". These
> all have to be updated if this is the case, but I'm not sure it's even a
> good thing. What if we have a big multi-domain simulation in which we
> initialize each of the components on their own subcomm? Those
> sub-components would not be allowed to register methods (or load plugins)
> that they might use because registration was implicitly more global.
> > > >
> > > > The comm is used by PetscLs and others. This is important because
> file systems are terrible at independent access. (Same for loading shared
> libraries; it's potentially much easier to do it by broadcasting the
> library, though portability is tricky.)
> > > >
> > > > Anyway, it would be really bad to PetscDLLibraryAppend() on a
> subcomm and have the registration function in the shared lib call
> PCRegisterDynamic() that promotes itself to COMM_WORLD.
> > > >
> > > > Maybe we need to pass an explicit comm to all the registration
> functions.
> > > >
> > > > [1]
> https://bitbucket.org/petsc/petsc-dev/commits/07f9e01e040feeb4162253a60ca63556436f4135
> > > >
> > > > > What does "Formally collective" mean anyway? Either it's always safe
> to call independently (Not Collective), it's "Logically collective" so that
> there is no performance impact but it still needs to be collective to have
> consistent state, or it's Collective. This falls under Collective because it
> can deadlock if you call it independently.
> > >
> > >
> >
> >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener