[petsc-dev] Registration implicitly collective on COMM_WORLD

Matthew Knepley knepley at gmail.com
Mon Feb 4 22:53:25 CST 2013


On Mon, Feb 4, 2013 at 11:40 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Mon, Feb 4, 2013 at 10:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>>    This is currently a mess.
>>
>>    Say one process calls PetscFunctionListAdd() with a function pointer,
>> but another calls it with the string name of the function. Now both
>> processes call PetscFunctionListFind() with a common comm. The process with
>> the function pointer will return immediately with the answer. The one
>> without the function pointer will start mucking around with dynamic
>> libraries which "sometimes" could be collective on the comm so it would
>> block?
>>
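Barry's scenario can be modeled with a toy function list. This is a minimal sketch, not PETSc's implementation: `FnEntry`, `fnlist_find()`, `resolve_by_name()` and `my_create()` are hypothetical, and the dynamic-library lookup is replaced by a stub that only counts how often it runs. It shows that two processes making the same lookup call can take different code paths, so any collective operation hidden in the dynamic path is entered by only some of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*CreateFn)(void);

/* Hypothetical entry: registered either by function pointer or by the
   symbol name of the function (to be resolved later). */
typedef struct {
  const char *type;   /* registered type name */
  CreateFn    fn;     /* non-NULL if registered by function pointer */
  const char *symbol; /* non-NULL if registered by symbol name */
} FnEntry;

static int dynamic_lookups = 0; /* how often the (possibly collective) path ran */

static int my_create(void) { return 42; }

/* Stub standing in for dlopen()/dlsym(). In real code this is the step that
   touches the file system and might be collective on a communicator. */
static CreateFn resolve_by_name(const char *symbol) {
  dynamic_lookups++;
  return strcmp(symbol, "my_create") == 0 ? my_create : NULL;
}

/* Look up a type: returns immediately if a pointer is already known,
   otherwise falls back to resolving the symbol name. */
static CreateFn fnlist_find(FnEntry *list, size_t n, const char *type) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(list[i].type, type) != 0) continue;
    if (list[i].fn) return list[i].fn;            /* fast, purely local path */
    assert(list[i].symbol != NULL);
    list[i].fn = resolve_by_name(list[i].symbol); /* slow, dynamic path */
    return list[i].fn;
  }
  return NULL;
}
```

If process A registered `{"mytype", my_create, NULL}` and process B registered `{"mytype", NULL, "my_create"}`, a lookup of `"mytype"` on A never reaches `resolve_by_name()`, while on B it does so once. Were that resolver collective on the lookup's communicator, B would wait there for A indefinitely.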
>>   These sets of routines evolved organically over time. We need to
>> refactor the whole hierarchy of these routines and figure out what
>> collectivity is needed.  There are too many potential comms since they were
>> kind of shoved in over time.
>>
>>   It may be simplest if we treat accessing the dynamic libraries as
>> completely non-collective, this means removing things like
>> PetscDLLibraryRetrieve() which, while a way cool concept has never proven
>> to be practical during its 15 years of existence.
>>
>>    So are we able to treat accessing dynamic libraries as completely
>> non-collective? Will this lose a valuable feature?
>>
>
> Sort of. The problem is that independent access to the file system is
> already so slow on current hardware that shared libraries bring those
> expensive machines to their knees. When we worked on this problem for
> Python (which is _heavily_ dependent on dynamic loading), we patched
> glibc-rtld so we could get hooks into a library I called "collfs"
> (collective file system) that would do a collective open implemented using
> MPI_Bcast. Most of this circus could go away if libc provided "dlopenfd()",
> in which case we could use shm_open() and avoid touching the file system at
> all.
>
> Judging from the glibc implementation, I don't think anyone intended
> dlopenfd() to be easy to add, so we probably have to deal with
> paths. Still, if we have a working shm_open and a communicator, we can
> avoid the libc-rtld hocus pocus with a fast collective load implemented as:
>
>   rank 0 mmaps the file
>   everyone else does shm_open and mmap
>   MPI_Bcast
>   dlopen("/dev/shm/thelib.so",)
>
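The collective load outlined above replaces one file-system access per process with a single access plus a broadcast. The sketch below only models that trade-off in plain C, under stated assumptions: ranks are simulated in a loop, `fake_bcast()` is a hypothetical stand-in for `MPI_Bcast` from rank 0, `fread()` replaces the `mmap()` in the outline, and the `shm_open()`/`dlopen()` steps are omitted. In a real MPI version the file length would also have to be broadcast before the data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NRANKS 4 /* number of simulated processes */

static int fs_reads = 0; /* how many times the "file system" was touched */

/* Read the whole file into a malloc'd buffer; counts as one file-system access. */
static unsigned char *read_file(FILE *f, long *len) {
  fs_reads++;
  fseek(f, 0, SEEK_END);
  *len = ftell(f);
  rewind(f);
  unsigned char *buf = malloc((size_t)*len);
  if (buf && fread(buf, 1, (size_t)*len, f) != (size_t)*len) {
    free(buf);
    buf = NULL;
  }
  return buf;
}

/* Hypothetical stand-in for MPI_Bcast from rank 0: copy root's bytes to all ranks. */
static void fake_bcast(unsigned char *bufs[], long len) {
  for (int r = 1; r < NRANKS; r++) memcpy(bufs[r], bufs[0], (size_t)len);
}

/* Independent load: every rank reads the file itself. */
static void load_independent(FILE *f, unsigned char *bufs[], long *len) {
  for (int r = 0; r < NRANKS; r++) bufs[r] = read_file(f, len);
}

/* Collective load: rank 0 reads once, every other rank receives a copy. */
static void load_collective(FILE *f, unsigned char *bufs[], long *len) {
  bufs[0] = read_file(f, len);
  assert(bufs[0] != NULL);
  for (int r = 1; r < NRANKS; r++) {
    bufs[r] = malloc((size_t)*len);
    assert(bufs[r] != NULL);
  }
  fake_bcast(bufs, *len);
}
```

With `NRANKS` simulated processes, the independent version touches the file `NRANKS` times and the collective version once; on a parallel file system that difference is what brings large machines to their knees.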
> In summary, I think collective loads are useful even without the
> "retrieval" stuff.
>

I agree with Jed here. We should keep the collective semantics for loading.


> Now it's cleaner for modularity to load the entire plugin library up-front
> and let PetscDLLibraryRegister_thelib call MatRegister for everything that
> it provides. It's easy to manage collectivity this way, but unfortunately,
> it eats up startup time and memory. (PETSc's current dynamic registration
> is like Emacs "autoload".)
>
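Up-front registration can be sketched as a plugin hook that registers every type it provides as soon as the library is loaded. The names are hypothetical: `registry_add()` stands in for a call such as MatRegister, `thelib_register_all()` plays the role of PetscDLLibraryRegister_thelib, and the registry is a fixed-size array. Once the hook has run, every lookup is local, which is why collectivity is easy to manage in this style.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 8

typedef int (*CreateFn)(void);
typedef struct { const char *name; CreateFn create; } TypeEntry;

static TypeEntry registry[MAX_TYPES];
static size_t ntypes = 0;

/* Hypothetical stand-in for a call such as MatRegister(): records name -> pointer. */
static void registry_add(const char *name, CreateFn create) {
  assert(ntypes < MAX_TYPES);
  registry[ntypes].name = name;
  registry[ntypes].create = create;
  ntypes++;
}

/* Purely local lookup: no file system access, no communication. */
static CreateFn registry_find(const char *name) {
  for (size_t i = 0; i < ntypes; i++)
    if (strcmp(registry[i].name, name) == 0) return registry[i].create;
  return NULL;
}

/* Types provided by a hypothetical plugin "thelib". */
static int create_aij_variant(void) { return 1; }
static int create_shell_variant(void) { return 2; }

/* Plays the role of PetscDLLibraryRegister_thelib: called once, right after
   the library has been loaded (collectively), it registers everything eagerly. */
static void thelib_register_all(void) {
  registry_add("thelib_aij", create_aij_variant);
  registry_add("thelib_shell", create_shell_variant);
}
```

The cost Jed mentions is visible too: both types are registered, and in real code the whole library would be mapped, even if the program only ever uses one of them.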

I think this was a hack to begin with (and I did it), so calling the
library loads up front does not bother me. I don't think anyone in the
world is interested in a lightweight, partial PETSc.

   Matt


> At a cost of at least one reduction per library per communicator, we could
> keep track of the scope on which each library has been loaded so that all
> loads are safe. Of course performance would go way down if many callers
> brought in the library on a small object, but that may be unavoidable.
>
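The per-communicator bookkeeping could look like the sketch below. It is a simulation under assumptions: `CommId` is a plain integer standing in for an MPI communicator, the reduction and the collective load are only counted, and `LibRecord`/`lib_load_on()` are hypothetical names. Because every rank of a communicator updates the record together, the cached check stays consistent across that communicator, and only the first load on each communicator pays for a reduction.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SCOPES 16

typedef int CommId; /* hypothetical stand-in for an MPI communicator handle */

typedef struct {
  const char *path;          /* library path (informational only here) */
  CommId scopes[MAX_SCOPES]; /* communicators on which the library is loaded */
  int nscopes;
} LibRecord;

static int reductions = 0;       /* simulated MPI_Allreduce calls */
static int collective_loads = 0; /* simulated collective loads */

static bool loaded_on(const LibRecord *lib, CommId comm) {
  for (int i = 0; i < lib->nscopes; i++)
    if (lib->scopes[i] == comm) return true;
  return false;
}

/* Ensure lib is loaded on comm. Every rank of comm must call this. */
static void lib_load_on(LibRecord *lib, CommId comm) {
  if (loaded_on(lib, comm)) return; /* same answer on all ranks: no communication */
  reductions++;       /* agree across comm on whether a load is needed */
  collective_loads++; /* e.g. a broadcast-based load */
  assert(lib->nscopes < MAX_SCOPES);
  lib->scopes[lib->nscopes++] = comm;
}
```

Repeated loads on the same communicator are free, but each new communicator, including a small one created for a single object, costs at least one reduction.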
>
>>    Barry
>>
>>
>> On Feb 4, 2013, at 9:22 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>
>> > On Sat, Feb 2, 2013 at 3:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> >    Yeah I noticed this problem but didn't want to deal with it when I
>> changed the code.
>> >
>> > So if we believe the documentation of PetscFunctionListAdd,
>> XXInitializePackage() is effectively collective on COMM_WORLD (though not
>> documented as such). This means that if
>> !defined(PETSC_USE_DYNAMIC_LIBRARIES), the following could deadlock:
>> >
>> > if (!rank) {
>> >   VecCreate(PETSC_COMM_SELF,....);
>> > }
>> >
>> > which would be awfully bad behavior. In reality, PetscFunctionListAdd()
>> does not reference comm at all. Why did you add the comm argument?
>> "Consistency"?
>> >
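A small model makes the deadlock concrete. It assumes, as in the scenario above, that `PetscFunctionListAdd()` is collective on COMM_WORLD and is reached lazily through package initialization on the first object creation. All functions here are hypothetical stand-ins; a collective is counted as deadlocked when some, but not all, ranks of the communicator have entered it.

```c
#include <assert.h>
#include <stdbool.h>

#define WORLD_SIZE 4

static int world_entered = 0;   /* ranks that entered the pending WORLD collective */
static bool registration_is_collective = true;
static bool package_initialized[WORLD_SIZE];

/* Stand-in for PetscFunctionListAdd(): if collective on COMM_WORLD,
   each caller enters a world-wide collective. */
static void function_list_add(void) {
  if (registration_is_collective) world_entered++;
}

/* Stand-in for XXInitializePackage(), run lazily on first object creation. */
static void initialize_package(int rank) {
  assert(rank >= 0 && rank < WORLD_SIZE);
  if (package_initialized[rank]) return;
  package_initialized[rank] = true;
  function_list_add();
}

/* Stand-in for VecCreate(PETSC_COMM_SELF, ...): otherwise purely local. */
static void vec_create_self(int rank) { initialize_package(rank); }

/* Run "if (!rank) VecCreate(PETSC_COMM_SELF, ...)" on every rank and report
   whether some ranks are stuck in a collective the others never join. */
static bool would_deadlock(bool collective) {
  registration_is_collective = collective;
  world_entered = 0;
  for (int r = 0; r < WORLD_SIZE; r++) package_initialized[r] = false;
  for (int rank = 0; rank < WORLD_SIZE; rank++) {
    if (!rank) vec_create_self(rank);
  }
  return world_entered > 0 && world_entered < WORLD_SIZE;
}
```

With collective registration only rank 0 enters the world-wide operation, so it waits forever; with non-collective registration the same user code is harmless.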
>> > Whatever the "next" documentation system is, it should be taught to
>> trace the "collective" attribute and complain if a "Not Collective"
>> function calls a Collective function with an argument other than COMM_SELF.
>> >
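A checker of that kind could work on a table of annotated functions and call sites. This is a hypothetical sketch rather than an existing tool: `FuncInfo`, `CallSite` and `count_violations()` are invented for illustration, and communicator arguments are compared as source text. It also flags Logically Collective callees, since they too require every rank to participate; that choice is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NOT_COLLECTIVE, LOGICALLY_COLLECTIVE, COLLECTIVE } Collectivity;

typedef struct { const char *name; Collectivity kind; } FuncInfo;

/* One call site: caller -> callee, with the communicator argument as text. */
typedef struct { const char *caller, *callee, *comm_arg; } CallSite;

static const FuncInfo *lookup(const FuncInfo *funcs, size_t n, const char *name) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(funcs[i].name, name) == 0) return &funcs[i];
  return NULL;
}

/* Count call sites where a Not Collective function calls a (Logically)
   Collective function with a communicator other than PETSC_COMM_SELF. */
static int count_violations(const FuncInfo *funcs, size_t nf,
                            const CallSite *calls, size_t nc) {
  int bad = 0;
  for (size_t i = 0; i < nc; i++) {
    const FuncInfo *caller = lookup(funcs, nf, calls[i].caller);
    const FuncInfo *callee = lookup(funcs, nf, calls[i].callee);
    if (!caller || !callee) continue; /* unannotated functions are skipped */
    assert(calls[i].comm_arg != NULL);
    if (caller->kind == NOT_COLLECTIVE && callee->kind != NOT_COLLECTIVE &&
        strcmp(calls[i].comm_arg, "PETSC_COMM_SELF") != 0)
      bad++;
  }
  return bad;
}
```

Annotating PCRegisterDynamic as Not Collective and PetscFunctionListAdd as Collective, a call from the former passing PETSC_COMM_WORLD would be reported, while the same call passing PETSC_COMM_SELF would not.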
>> >
>> >     Yes, we should remove the "Formally Collective"; I was drinking that
>> week :-)
>> >
>> >    Barry
>> >
>> > On Feb 2, 2013, at 2:54 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>> >
>> > > In [1], PetscFunctionListAdd became implicitly collective on
>> COMM_WORLD, but all the XXRegisterDynamic() say "Not collective". These
>> all have to be updated if this is the case, but I'm not sure it's even a
>> good thing. What if we have a big multi-domain simulation in which we
>> initialize each of the components on their own subcomm. Those
>> sub-components would not be allowed to register methods (or load plugins)
>> that they might use because registration was implicitly more global.
>> > >
>> > > The comm is used by PetscLs and others. This is important because
>> file systems are terrible at independent access. (Same for loading shared
>> libraries; it's potentially much easier to do it by broadcasting the
>> library, though portability is tricky.)
>> > >
>> > > Anyway, it would be really bad to call PetscDLLibraryAppend() on a subcomm
>> and have the registration function in the shared lib call
>> PCRegisterDynamic() that promotes itself to COMM_WORLD.
>> > >
>> > > Maybe we need to pass an explicit comm to all the registration
>> functions.
>> > >
>> > > [1]
>> https://bitbucket.org/petsc/petsc-dev/commits/07f9e01e040feeb4162253a60ca63556436f4135
>> > >
>> > > What does "Formally collective" mean anyway? Either it's always safe
>> to call independently (Not Collective), it's "Logically collective" so that
>> there is no performance impact but it still needs to be called collectively
>> to have consistent state, or it's Collective. This falls under Collective
>> because it can deadlock if you call it independently.
>> >
>> >
>>
>>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener