[petsc-dev] Using multiple mallocs with PETSc

Jed Brown jed at jedbrown.org
Tue Mar 14 18:59:45 CDT 2017


Richard Mills <richardtmills at gmail.com> writes:

> On Tue, Mar 14, 2017 at 2:18 PM, Jed Brown <jed at jedbrown.org> wrote:
>
>> Richard Mills <richardtmills at gmail.com> writes:
>>
>> > On Tue, Mar 14, 2017 at 1:23 PM, Jed Brown <jed at jedbrown.org> wrote:
>> >
>> >> Barry Smith <bsmith at mcs.anl.gov> writes:
>> >>
>> >> >> On Mar 13, 2017, at 1:27 PM, Jed Brown <jed at jedbrown.org> wrote:
>> >> >>
>> >> >> Satish Balay <balay at mcs.anl.gov> writes:
>> >> >>> stash the metadata for each allocation (and pointers for
>> corresponding
>> >> >>> free) in a hash table for all mallocs that we need to track? [this
>> >> >>> avoids the wasted 'space' in each alloc.]
>> >> >>
>> >> >> Sure, but this is just duplicating an implementation of malloc.
>> >> >
>> >> >    No it isn't. It is a very thin wrapper around multiple current
>> >> mallocs.
>> >>
>> >> Meh, the proposal has more storage overhead than malloc().
>> >>
>> >
>> > I was bored or something, so I actually looked into how people who want
>> to
>> > track all the allocations inside a special malloc() do so, and it seems
>> > that plenty of people use a red-black tree for this (balanced binary
>> tree,
>> > O(log(n) for search, insert/delete, and tree rearrangement) rather than a
>> > hash table.  This is getting pretty far down in the weeds... but this
>> would
>> > have less storage overhead than a hash table.  Just FYI. =)
>>
>> Tcmalloc has an overhead of 1% for common usage patterns when allocating
>> 8-byte objects.  A tree is much higher overhead.
>>
>
> But I'm not talking about doing the same thing as a malloc implementation:
> we aren't trying to do things like keep track of the "free" list, just what
> free() to use for a given allocation.  

Yes, but my point is that the much simpler thing you're doing is
actually much higher overhead than a full malloc implementation.  It's
as though you're running an expensive job on a supercomputer and I tell
you my phone does the same thing in real time and you reject my
criticism to say that my phone is actually doing something more
difficult.  ;-)

> And to keep this overhead down, we might consider having something
> like a normal PetscMalloc() and a PetscMallocNumeric() that is just
> used to allocate arrays for things like vectors and matrices -- the
> things that we think might be bandwidth critical and may need to
> support different allocators or be considered for migration between
> memory types.  There aren't going to be tons of objects you'd
> allocated with PetscMallocNumeric, so the storing all these addresses
> and a corresponding free() should have very little overhead.

Yes, my objection is specifically with regard to incurring this overhead
for small allocations.  I care about overhead because of small objects,
such as might appear in a linked list or tree.  For large objects, you
could just as well skip malloc and call mmap() directly -- that's what
malloc() is doing.  For intermediate sizes (multiple pages, but less
than MMAP_THRESHOLD) in the presence of threads, one could argue that
calling mmap() directly is preferred because malloc() gives you memory
that is already faulted and thus first-touch won't do the right thing.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20170314/8da6b1fe/attachment.sig>


More information about the petsc-dev mailing list