[petsc-dev] Registration implicitly collective on COMM_WORLD

Barry Smith bsmith at mcs.anl.gov
Mon Feb 4 23:01:09 CST 2013


  OK, one decision made: dynamic library loads are collective.

   Now we need to make several more decisions. 

1) Are PETSc package registrations (sys, vec, mat, dm, ksp, snes, ts) collective?
     Currently, as Jed noted, they are not, except with dynamic loading of the PETSc libraries.

2) Are all PETSc packages registered up front during PetscInitialize()?
    Currently, only with dynamic loading of the PETSc libraries.
2a) If all PETSc packages are registered up front, what is the mechanism to turn off registering some? Though Matt disagrees, there is always some asshole who says, "I don't use TS, so I don't want it registered."

…..

This includes deciding where the communicators are passed for all of these methods.

  Barry


On Feb 4, 2013, at 10:53 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Mon, Feb 4, 2013 at 11:40 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Mon, Feb 4, 2013 at 10:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    This is currently a mess.
> 
>    Say one process calls PetscFunctionListAdd() with a function pointer, but another calls it with the string name of the function. Now both processes call PetscFunctionListFind() with a common comm. The process with the function pointer returns immediately with the answer. The one without the function pointer starts mucking around with dynamic libraries, which "sometimes" could be collective on the comm, so would it block?
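A toy model of the scenario above, in plain C with hypothetical names (entry_t, lookup_path, example_method, and the symbol string "MyKSPCreate" are illustrations, not PETSc API): whether a lookup has to go to a dynamic library depends only on what that process registered, so two ranks sharing a communicator can take different branches. If the library branch is collective on that communicator, only some ranks enter it and the rest never arrive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one function-list entry: registered either with a
   function pointer or only with the name of a symbol in a dynamic library. */
typedef void (*method_fn)(void);

typedef struct {
  const char *type_name; /* e.g. "gmres" */
  method_fn   fn;        /* non-NULL if registered by pointer */
  const char *symbol;    /* non-NULL if registered by name only */
} entry_t;

typedef enum { RESOLVED_LOCALLY, NEEDS_LIBRARY_LOAD, NOT_FOUND } lookup_path_t;

/* The branch taken depends only on this process's own table. */
lookup_path_t lookup_path(const entry_t *list, size_t n, const char *type_name)
{
  for (size_t i = 0; i < n; i++) {
    if (strcmp(list[i].type_name, type_name) != 0) continue;
    if (list[i].fn) return RESOLVED_LOCALLY;        /* returns immediately */
    if (list[i].symbol) return NEEDS_LIBRARY_LOAD;  /* may enter a collective load */
  }
  return NOT_FOUND;
}

/* Stand-in for a real method implementation. */
static void example_method(void) {}
```

Rank 0 registered by pointer and rank 1 by name, so a `PetscFunctionListFind()`-style call on a shared comm diverges: rank 0 answers locally while rank 1 heads into the (possibly collective) library load.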
> 
>   These sets of routines evolved organically over time. We need to refactor the whole hierarchy of these routines and figure out what collectivity is needed. There are too many potential comms, since they were kind of shoved in over time.
> 
>   It may be simplest if we treat accessing the dynamic libraries as completely non-collective. This means removing things like PetscDLLibraryRetrieve(), which, while a way cool concept, has never proven to be practical during its 15 years of existence.
> 
>    So are we able to treat accessing dynamic libraries as completely non-collective? Will this lose a valuable feature?
> 
> Sort of. The problem is that independent access to the file system is already so slow on current hardware that shared libraries bring those expensive machines to their knees. When we worked on this problem for Python (which is _heavily_ dependent on dynamic loading), we patched glibc-rtld so we could get hooks into a library I called "collfs" (collective file system) that would do a collective open implemented using MPI_Bcast. Most of this circus could go away if libc provided "dlopenfd()", in which case we could use shm_open() and avoid touching the file system at all.
> 
> Judging from the glibc implementation, I don't think anyone was trying to make dlopenfd() easy to add, so we probably have to deal with paths. Still, if we have a working shm_open() and a communicator, we can avoid the libc-rtld hocus pocus with a fast collective load implemented as:
> 
>   rank 0 mmaps the file
>   everyone else does shm_open and mmap
>   MPI_Bcast
>   dlopen("/dev/shm/thelib.so",)
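A partial sketch of the middle of that recipe, assuming Linux (where shm_open("/name") objects appear as /dev/shm/name) and that /dev/shm is not mounted noexec. Once a rank holds the library bytes (in the scheme above, delivered by MPI_Bcast from rank 0), it copies them into a shared-memory object and returns the path that would be passed to dlopen(). MPI is left out so the sketch stands alone; stage_image_in_shm is a hypothetical helper, and older glibc versions need -lrt to link shm_open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: place a library image that this rank already holds in
   memory into a POSIX shared-memory object and write the path that would be
   handed to dlopen(). Assumes Linux, where shm_open("/name") shows up as
   /dev/shm/name. Returns 0 on success, -1 on failure. */
int stage_image_in_shm(const char *name, const void *image, size_t len,
                       char *path, size_t pathlen)
{
  char shmname[256];
  assert(strchr(name, '/') == NULL && len > 0);
  if (snprintf(shmname, sizeof shmname, "/%s", name) >= (int)sizeof shmname) return -1;

  int fd = shm_open(shmname, O_CREAT | O_RDWR | O_TRUNC, 0700);
  if (fd < 0) return -1;
  if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }

  void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (dst == MAP_FAILED) return -1;

  /* In the collective scheme, MPI_Bcast from rank 0 would write into dst
     directly; memcpy stands in for that receive here. */
  memcpy(dst, image, len);
  munmap(dst, len);

  if (snprintf(path, pathlen, "/dev/shm/%s", name) >= (int)pathlen) return -1;
  return 0; /* caller would then dlopen(path, ...) and later shm_unlink() */
}
```

With several ranks per node, a real version would also have to choose one rank per node to create the object (or give each rank a distinct name); that coordination is not shown.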
> 
> In summary, I think collective loads are useful even without the "retrieval" stuff.
> 
> I agree with Jed here. We should keep the collective semantics for loading.
>  
> Now it's cleaner for modularity to load the entire plugin library up-front and let PetscDLLibraryRegister_thelib call MatRegister for everything that it provides. It's easy to manage collectivity this way, but unfortunately, it eats up startup time and memory. (PETSc's current dynamic registration is like Emacs "autoload".)
> 
> I think this was a hack to begin with (and I did it), so calling the library loads up front does not bother me. I don't
> think anyone in the world is interested in a lightweight, partial PETSc.
> 
>    Matt
>  
> At a cost of at least one reduction per library per communicator, we could keep track of the scope on which each library has been loaded so that all loads are safe. Of course performance would go way down if many callers brought in the library on a small object, but that may be unavoidable.
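A minimal model of the scope tracking proposed above, assuming communicators can be represented by integer ids (real code would use MPI_Comm handles, perhaps with the result cached as a communicator attribute). lib_record_t and ensure_loaded are hypothetical names; the reduction is only counted, standing in for something like an MPI_Allreduce that lets every rank agree to enter the collective load together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SCOPES 16

/* Hypothetical per-library record: the communicators on which a collective
   load has already completed, plus a count of agreement reductions paid. */
typedef struct {
  const char *path;
  int    scopes[MAX_SCOPES];
  size_t nscopes;
  int    reductions;
} lib_record_t;

static bool loaded_on(const lib_record_t *lib, int comm)
{
  for (size_t i = 0; i < lib->nscopes; i++)
    if (lib->scopes[i] == comm) return true;
  return false;
}

/* First request on a communicator pays one (simulated) reduction plus the
   collective load; later requests on the same communicator are free. */
void ensure_loaded(lib_record_t *lib, int comm)
{
  if (loaded_on(lib, comm)) return;
  lib->reductions++;
  assert(lib->nscopes < MAX_SCOPES);
  lib->scopes[lib->nscopes++] = comm;
}
```

The performance concern in the quoted text shows up directly: every fresh communicator (for example, one per small object) costs another reduction.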
> 
> 
>    Barry
> 
> 
> On Feb 4, 2013, at 9:22 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> 
> > On Sat, Feb 2, 2013 at 3:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    Yeah, I noticed this problem but didn't want to deal with it when I changed the code.
> >
> > So if we believe the documentation of PetscFunctionListAdd, XXInitializePackage() is effectively collective on COMM_WORLD (though not documented as such). This means that if !defined(PETSC_USE_DYNAMIC_LIBRARIES), the following could deadlock:
> >
> > if (!rank) {
> >   VecCreate(PETSC_COMM_SELF,....);
> > }
> >
> > which would be awfully bad behavior. In reality, PetscFunctionListAdd() does not reference comm at all. Why did you add the comm argument? "Consistency"?
> >
> > Whatever the "next" documentation system is, it should be taught to trace the "collective" attribute and complain if a "Not Collective" function calls a Collective function with an argument other than COMM_SELF.
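A sketch of the rule such a checker might enforce, written over a hand-built table of collectivity attributes and call sites rather than parsed source. All type and function names are hypothetical, and the example call graph in the usage is illustrative only. Following the rule as stated, a call is flagged when a Not Collective caller invokes anything collective (here, Logically Collective counts too) with a communicator other than PETSC_COMM_SELF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NOT_COLLECTIVE, LOGICALLY_COLLECTIVE, COLLECTIVE } collectivity_t;

typedef struct { const char *name; collectivity_t level; } func_attr_t;

/* One call site: `caller` calls `callee`, passing the communicator `comm_arg`. */
typedef struct { const char *caller; const char *callee; const char *comm_arg; } call_site_t;

static collectivity_t level_of(const func_attr_t *attrs, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(attrs[i].name, name) == 0) return attrs[i].level;
  return NOT_COLLECTIVE; /* unknown functions treated as safe; a stricter tool might flag them */
}

/* Count call sites where a Not Collective function calls a collective one
   with a communicator other than PETSC_COMM_SELF. */
size_t count_violations(const func_attr_t *attrs, size_t nattrs,
                        const call_site_t *calls, size_t ncalls)
{
  size_t bad = 0;
  for (size_t i = 0; i < ncalls; i++) {
    if (level_of(attrs, nattrs, calls[i].caller) != NOT_COLLECTIVE) continue;
    if (level_of(attrs, nattrs, calls[i].callee) == NOT_COLLECTIVE) continue;
    if (strcmp(calls[i].comm_arg, "PETSC_COMM_SELF") == 0) continue;
    bad++;
  }
  return bad;
}
```

On a toy graph where a "Not Collective" XXRegisterDynamic() reaches PetscFunctionListAdd() with PETSC_COMM_WORLD, that call site is the one reported; a call made with PETSC_COMM_SELF passes.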
> >
> >
> >     Yes, we should remove the "Formally Collective"; I was drinking that week :-)
> >
> >    Barry
> >
> > On Feb 2, 2013, at 2:54 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> >
> > > In [1], PetscFunctionListAdd became implicitly collective on COMM_WORLD, but all the XXRegisterDynamic() functions are documented as "Not Collective". These all have to be updated if this is the case, but I'm not sure it's even a good thing. What if we have a big multi-domain simulation in which we initialize each of the components on its own subcomm? Those sub-components would not be allowed to register methods (or load plugins) that they might use, because registration was implicitly more global.
> > >
> > > The comm is used by PetscLs and others. This is important because file systems are terrible at independent access. (Same for loading shared libraries; it's potentially much easier to do it by broadcasting the library, though portability is tricky.)
> > >
> > > Anyway, it would be really bad to PetscDLLibraryAppend() on a subcomm and have the registration function in the shared lib call PCRegisterDynamic() that promotes itself to COMM_WORLD.
> > >
> > > Maybe we need to pass an explicit comm to all the registration functions.
> > >
> > > [1] https://bitbucket.org/petsc/petsc-dev/commits/07f9e01e040feeb4162253a60ca63556436f4135
> > >
> > > What does "Formally Collective" mean anyway? Either it's always safe to call independently (Not Collective); or it's "Logically Collective", so there is no performance impact but it still needs to be called collectively to keep state consistent; or it's Collective. This falls under Collective because it can deadlock if you call it independently.
> >
> >
> 
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener



