[petsc-dev] Registration implicitly collective on COMM_WORLD

Barry Smith bsmith at mcs.anl.gov
Mon Feb 4 22:05:21 CST 2013


   This is currently a mess. 

   Say one process calls PetscFunctionListAdd() with a function pointer, but another calls it with the string name of the function. Now both processes call PetscFunctionListFind() with a common comm. The process with the function pointer will return immediately with the answer. The one without the function pointer will start mucking around with dynamic libraries which "sometimes" could be collective on the comm so it would block? 
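
   A minimal single-process sketch of that asymmetry, in plain C with no MPI and no PETSc. Every name here (SketchList, SketchListAdd, SketchListFind, sketch_resolve) is hypothetical and is not the PETSc API; sketch_resolve() merely stands in for the dynamic-library lookup, which in the real code is the step that might touch the file system or be collective. An entry registered with a pointer is returned immediately, while an entry registered only by symbol name takes the slow path on first lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the PETSc function-list API. */
typedef int (*SketchFn)(void);

typedef struct {
  const char *name;   /* key used for lookup */
  const char *symbol; /* symbol to resolve lazily, or NULL */
  SketchFn    fn;     /* resolved pointer, or NULL until first lookup */
} SketchEntry;

#define SKETCH_MAX 16

typedef struct {
  SketchEntry e[SKETCH_MAX];
  int         n;
  int         slow_lookups; /* how often the "dynamic library" path ran */
} SketchList;

static int sketch_impl(void) { return 42; }

/* Stand-in for a dlsym()-style lookup. In the real setting this is the
   step that may hit the file system and could be collective. */
static SketchFn sketch_resolve(const char *symbol)
{
  return strcmp(symbol, "sketch_impl") == 0 ? sketch_impl : NULL;
}

/* Register by pointer (fn != NULL) or by symbol name only (fn == NULL).
   Purely local: no communicator is involved. Returns 0 on success. */
int SketchListAdd(SketchList *l, const char *name, const char *symbol, SketchFn fn)
{
  assert(l != NULL && name != NULL);
  if (l->n >= SKETCH_MAX) return -1;
  l->e[l->n++] = (SketchEntry){name, symbol, fn};
  return 0;
}

/* Fast path when the pointer is already known; otherwise resolve the
   symbol once and cache the result. Returns NULL if nothing matches. */
SketchFn SketchListFind(SketchList *l, const char *name)
{
  assert(l != NULL && name != NULL);
  for (int i = 0; i < l->n; i++) {
    SketchEntry *e = &l->e[i];
    if (strcmp(e->name, name) != 0) continue;
    if (e->fn == NULL && e->symbol != NULL) {
      l->slow_lookups++;
      e->fn = sketch_resolve(e->symbol);
    }
    return e->fn;
  }
  return NULL;
}
```

   If two ranks held lists that differed only in how the entry was registered, one would return from the find at once and the other would enter the resolve step. That is harmless as long as the resolve step is purely local, and a potential hang if it is collective.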

  These sets of routines evolved organically over time. We need to refactor the whole hierarchy of these routines and figure out what collectivity is needed.  There are too many potential comms since they were kind of shoved in over time. 

  It may be simplest if we treat accessing the dynamic libraries as completely non-collective. This means removing things like PetscDLLibraryRetrieve() which, while a way cool concept, has never proven to be practical during its 15 years of existence.

   So are we able to treat accessing dynamic libraries as completely non-collective? Will this lose a valuable feature?

   Barry


On Feb 4, 2013, at 9:22 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Sat, Feb 2, 2013 at 3:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    Yeah I noticed this problem but didn't want to deal with it when I changed the code.
> 
> So if we believe the documentation of PetscFunctionListAdd, XXInitializePackage() is effectively collective on COMM_WORLD (though not documented as such). This means that if !defined(PETSC_USE_DYNAMIC_LIBRARIES), the following could deadlock:
> 
> if (!rank) {
>   VecCreate(PETSC_COMM_SELF,....);
> }
> 
> which would be awfully bad behavior. In reality, PetscFunctionListAdd() does not reference comm at all. Why did you add the comm argument? "Consistency"?
> 
> Whatever the "next" documentation system is, it should be taught to trace the "collective" attribute and complain if a "Not Collective" function calls a Collective function with an argument other than COMM_SELF.
> 
> 
>     Yes we should remove the "Formally Collective", I was drinking that week :-)
> 
>    Barry
> 
> On Feb 2, 2013, at 2:54 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> 
> > In [1], PetscFunctionListAdd became implicitly collective on COMM_WORLD, but all the XXRegisterDynamic() say "Not collective". These all have to be updated if this is the case, but I'm not sure it's even a good thing. What if we have a big multi-domain simulation in which we initialize each of the components on their own subcomm? Those sub-components would not be allowed to register methods (or load plugins) that they might use because registration was implicitly more global.
> >
> > The comm is used by PetscLs and others. This is important because file systems are terrible at independent access. (Same for loading shared libraries; it's potentially much easier to do it by broadcasting the library, though portability is tricky.)
> >
> > Anyway, it would be really bad to PetscDLLibraryAppend() on a subcomm and have the registration function in the shared lib call PCRegisterDynamic() that promotes itself to COMM_WORLD.
> >
> > Maybe we need to pass an explicit comm to all the registration functions.
> >
> > [1] https://bitbucket.org/petsc/petsc-dev/commits/07f9e01e040feeb4162253a60ca63556436f4135
> >
> > What does "Formally collective" mean anyway? Either it's always safe to call independently (Not Collective), it's "Logically collective" so that there is no performance impact, but it still needs to be collective to have consistent state, or it's Collective. This falls under Collective because it can deadlock if you call it independently.
> 
> 



