[petsc-dev] Using multiple mallocs with PETSc

Richard Mills richardtmills at gmail.com
Fri Mar 17 18:23:44 CDT 2017


On Tue, Mar 14, 2017 at 4:59 PM, Jed Brown <jed at jedbrown.org> wrote:

> Richard Mills <richardtmills at gmail.com> writes:
>
> > On Tue, Mar 14, 2017 at 2:18 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> >> Richard Mills <richardtmills at gmail.com> writes:
> >>
> >> > On Tue, Mar 14, 2017 at 1:23 PM, Jed Brown <jed at jedbrown.org> wrote:
> >> >
> >> >> Barry Smith <bsmith at mcs.anl.gov> writes:
> >> >>
> >> >> >> On Mar 13, 2017, at 1:27 PM, Jed Brown <jed at jedbrown.org> wrote:
> >> >> >>
> >> >> >> Satish Balay <balay at mcs.anl.gov> writes:
> >> >> >>> stash the metadata for each allocation (and pointers for
> >> >> >>> corresponding free) in a hash table for all mallocs that we need
> >> >> >>> to track? [this avoids the wasted 'space' in each alloc.]
> >> >> >>
> >> >> >> Sure, but this is just duplicating an implementation of malloc.
> >> >> >
> >> >> >    No it isn't. It is a very thin wrapper around multiple current
> >> >> > mallocs.
> >> >>
> >> >> Meh, the proposal has more storage overhead than malloc().
> >> >>
> >> >
> >> > I was bored or something, so I actually looked into how people who
> >> > want to track all the allocations inside a special malloc() do so, and
> >> > it seems that plenty of people use a red-black tree for this (balanced
> >> > binary tree, O(log(n)) for search, insert/delete, and tree
> >> > rearrangement) rather than a hash table.  This is getting pretty far
> >> > down in the weeds... but this would have less storage overhead than a
> >> > hash table.  Just FYI. =)
> >>
> >> Tcmalloc has an overhead of 1% for common usage patterns when allocating
> >> 8-byte objects.  A tree is much higher overhead.
> >>
> >
> > But I'm not talking about doing the same thing as a malloc
> > implementation: we aren't trying to do things like keep track of the
> > "free" list, just what free() to use for a given allocation.
>
> Yes, but my point is that the much simpler thing you're doing is
> actually much higher overhead than a full malloc implementation.  It's
> as though you're running an expensive job on a supercomputer and I tell
> you my phone does the same thing in real time and you reject my
> criticism to say that my phone is actually doing something more
> difficult.  ;-)
>
> > And to keep this overhead down, we might consider having something
> > like a normal PetscMalloc() and a PetscMallocNumeric() that is just
> > used to allocate arrays for things like vectors and matrices -- the
> > things that we think might be bandwidth critical and may need to
> > support different allocators or be considered for migration between
> > memory types.  There aren't going to be tons of objects you'd
> > allocate with PetscMallocNumeric, so storing all these addresses
> > and a corresponding free() should have very little overhead.
>
> Yes, my objection is specifically with regard to incurring this overhead
> for small allocations.  I care about overhead because of small objects,
> such as might appear in a linked list or tree.  For large objects, you
> could just as well skip malloc and call mmap() directly -- that's what
> malloc() is doing.  For intermediate sizes (multiple pages, but less
> than MMAP_THRESHOLD) in the presence of threads, one could argue that
> calling mmap() directly is preferred because malloc() gives you memory
> that is already faulted and thus first-touch won't do the right thing.
>
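
Satish's idea of a side table that records, for each allocation, which free() must release it can be sketched in C as an open-addressing hash table keyed on the pointer value. This is an illustration under stated assumptions, not PETSc code: tracked_malloc, tracked_free, and the fixed 1024-slot capacity are hypothetical, and the sketch does no resizing or locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical side table (not PETSc API): maps each tracked pointer to
   the free() routine that must release it, using open addressing with
   linear probing. Fixed capacity and no locking keep the sketch short. */
typedef void (*free_fn)(void *);

typedef struct {
  void   *ptr; /* NULL = never used, TOMBSTONE = slot of a freed entry */
  free_fn fn;
} entry;

#define TABLE_CAP 1024u /* must be a power of two */
static entry table[TABLE_CAP];
static char  tombstone_marker;
#define TOMBSTONE ((void *)&tombstone_marker)

static size_t slot_for(const void *p)
{
  /* Drop low bits that are typically zero because malloc aligns its
     results, then mix the rest with a simple integer hash. */
  uintptr_t h = (uintptr_t)p >> 4;
  h ^= h >> 16;
  h *= 0x45d9f3bu;
  h ^= h >> 16;
  return (size_t)(h & (TABLE_CAP - 1));
}

static int track_insert(void *p, free_fn fn)
{
  size_t i = slot_for(p);
  assert(p != NULL && p != TOMBSTONE);
  for (size_t n = 0; n < TABLE_CAP; n++, i = (i + 1) & (TABLE_CAP - 1)) {
    if (table[i].ptr == NULL || table[i].ptr == TOMBSTONE) {
      table[i].ptr = p;
      table[i].fn  = fn;
      return 0;
    }
  }
  return -1; /* table full */
}

static entry *track_find(const void *p)
{
  size_t i = slot_for(p);
  for (size_t n = 0; n < TABLE_CAP; n++, i = (i + 1) & (TABLE_CAP - 1)) {
    if (table[i].ptr == NULL) return NULL; /* end of probe chain */
    if (table[i].ptr == p) return &table[i];
  }
  return NULL;
}

/* Allocate with a caller-chosen allocator and remember its free(). */
void *tracked_malloc(size_t n, void *(*alloc)(size_t), free_fn fn)
{
  void *p = alloc(n);
  if (p && track_insert(p, fn) != 0) {
    fn(p);
    return NULL;
  }
  return p;
}

/* Release p with whichever free() was recorded; -1 if p is untracked. */
int tracked_free(void *p)
{
  entry *e;
  if (!p) return 0;
  e = track_find(p);
  if (!e) return -1;
  e->fn(p);
  e->ptr = TOMBSTONE; /* keep later probe chains intact */
  e->fn  = NULL;
  return 0;
}
```

Each live entry costs two pointers plus the empty slots that keep probe chains short, which is the per-allocation overhead Jed objects to when the allocations themselves are small.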
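
Jed's point about skipping malloc() and calling mmap() directly for large arrays can be illustrated with a short POSIX sketch. large_alloc and large_free are hypothetical helper names, and MAP_ANONYMOUS, while available on Linux, the BSDs, and macOS, is not in older POSIX editions. A fresh anonymous mapping is zero-filled and, on Linux, not physically backed until first written, so with the default local-allocation NUMA policy the thread that initializes the array decides where its pages live.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc with strict -std */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round n up to a whole number of pages. */
static size_t round_to_pages(size_t n)
{
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  return (n + (size_t)page - 1) / (size_t)page * (size_t)page;
}

/* Hypothetical helper: map fresh, zero-filled anonymous memory. Pages are
   faulted in on first touch rather than by this call. */
void *large_alloc(size_t n)
{
  void *p = mmap(NULL, round_to_pages(n), PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Unlike free(), munmap() needs the length, so the caller passes it back. */
int large_free(void *p, size_t n)
{
  return p ? munmap(p, round_to_pages(n)) : 0;
}
```

The extra length argument to large_free is one reason a drop-in replacement for a free(void *) style interface would still need to record sizes somewhere.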

Yes, things like linked lists and trees were my concern as well.  I think
the best way to deal with this is to have two different versions of
PetscMalloc, as discussed above.  The "normal" PetscMalloc should be
something that follows the current guidance in the PetscMallocSet() manual
page: "This routine MUST be called before PetscInitialize() and may be
called only once."  I propose that the other routine -- maybe
it's called PetscMallocNumeric() or PetscMallocLarge() or whatever --
should support the user swapping out the underlying allocator by keeping a
hash table or balanced search tree to track the appropriate free() to use.
I think it would be nice to do things analogously to the PETSc logging
stages and have PetscMallocNumericRegister() and
PetscMallocNumericPush()/Pop().  (Hopefully some day we just have something
"smart" that automagically moves objects around in memory as appropriate,
but that doesn't exist now.  I think the proposed API would support what
Mr. Hong is doing with his adjoints example in a less kludgy way.  And I
expect things to get worse, not better, in the short term as additional
kinds of memory like non-volatile RAM get introduced.)
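
The proposed register/push/pop interface can be sketched in C under heavy assumptions: numeric_allocator_register, numeric_malloc_push, numeric_malloc_pop, numeric_malloc, and numeric_free are hypothetical stand-ins modeled on the PetscMallocNumericRegister()/Push()/Pop() idea, not PETSc routines, and the sketch is single-threaded. It records each allocation's owner in a search tree via the POSIX tsearch()/tfind()/tdelete() family; glibc implements these as a red-black tree, matching the balanced-tree option mentioned above, though POSIX itself does not require balancing.

```c
#define _XOPEN_SOURCE 700 /* tsearch() family is an XSI extension */
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator interface; none of these names exist in PETSc. */
typedef struct {
  void *(*alloc)(size_t);
  void  (*release)(void *);
} allocator;

typedef struct {
  void            *ptr;
  const allocator *owner; /* whose release() must free ptr */
} record;

#define MAX_ALLOCATORS 8
static allocator registry[MAX_ALLOCATORS];
static int       n_registered;
static int       stack[MAX_ALLOCATORS]; /* indices into registry */
static int       depth;
static void     *tree; /* tsearch() root */

/* Order records by address; uintptr_t avoids comparing unrelated pointers. */
static int cmp_record(const void *a, const void *b)
{
  uintptr_t pa = (uintptr_t)((const record *)a)->ptr;
  uintptr_t pb = (uintptr_t)((const record *)b)->ptr;
  return (pa > pb) - (pa < pb);
}

int numeric_allocator_register(void *(*alloc)(size_t), void (*release)(void *))
{
  if (n_registered == MAX_ALLOCATORS) return -1;
  registry[n_registered].alloc   = alloc;
  registry[n_registered].release = release;
  return n_registered++;
}

int numeric_malloc_push(int id)
{
  if (id < 0 || id >= n_registered || depth == MAX_ALLOCATORS) return -1;
  stack[depth++] = id;
  return 0;
}

int numeric_malloc_pop(void)
{
  if (depth == 0) return -1;
  depth--;
  return 0;
}

/* Allocate with the allocator on top of the stack (malloc if empty) and
   record the owner, so the matching release is used even after a pop. */
void *numeric_malloc(size_t n)
{
  static const allocator fallback = {malloc, free};
  const allocator *a = depth ? &registry[stack[depth - 1]] : &fallback;
  record *r = malloc(sizeof *r);
  if (!r) return NULL;
  r->ptr   = a->alloc(n);
  r->owner = a;
  if (!r->ptr || !tsearch(r, &tree, cmp_record)) {
    if (r->ptr) a->release(r->ptr);
    free(r);
    return NULL;
  }
  return r->ptr;
}

/* Free p with the allocator that produced it; -1 if p is not tracked. */
int numeric_free(void *p)
{
  record key, *r;
  void *node;
  if (!p) return 0;
  key.ptr   = p;
  key.owner = NULL;
  node = tfind(&key, &tree, cmp_record);
  if (!node) return -1;
  r = *(record **)node; /* a tree node begins with the stored key pointer */
  assert(r->ptr == p);
  tdelete(&key, &tree, cmp_record);
  r->owner->release(p);
  free(r);
  return 0;
}
```

The per-record malloc() here is exactly the kind of overhead that matters for small objects, which is why the proposal limits this path to a modest number of large, bandwidth-critical arrays.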

--Richard