[petsc-dev] Adding support memkind allocators in PETSc

Wed Apr 29 00:47:54 CDT 2015

Richard Mills <rtm at utk.edu> writes:

>> Really?  That's what I'm asking for.
>
> Yes, I am ~ 99% sure that this is the case, but I will double-check to make
> sure.

Thanks.

>> For small allocations, it doesn't matter where the memory is located
>> because it's either in cache or it's not.  From what I hear, KNL's
>> MCDRAM won't improve latency, so all such allocations may as well go in
>> DRAM anyway.  So all I care about are substantial allocations, like
>> matrix and vector data.  It's not expensive to allocate those to align
>> with page boundaries (provided they are big enough; coarse grids don't
>> matter).
>
> Yes, MCDRAM won't help with latency, only bandwidth, so for small
> allocations it won't matter.  Following reasoning like what you have above,
> a colleague on my team recently developed an "AutoHBW" tool for users who
> don't want to modify their code at all.  A user can specify a size
> threshold above which allocations should come from MCDRAM, and then the
> tool interposes on the malloc() (or other allocator) calls to put the small
> stuff in DRAM and the big stuff in MCDRAM.

What's the point?  If you can fit all the "large" allocations in MCDRAM,
can't you just fit everything in MCDRAM?  Is that so bad?

> I think many users are going to want more control than what something like
> AutoHBW provides, but, as you say, a lot of the time one will only care
> about the substantial allocations for things like matrices and vectors,
> and these also tend to be long lived--plenty of codes will do something
> like allocate a matrix for Jacobians once and keep it around for the
> lifetime of the run.  Maybe we should consider not using a heap manager for
> these allocations, then.  For allocations above some specified threshold,
> perhaps we (PETSc) should simply do the appropriate mmap() and mbind()
> calls to allocate the pages we need in the desired type of memory, and then
> we could use things like move_pages() if/when appropriate (yes, I know
> we don't yet have a good way to make such decisions).  This would mean
> PETSc getting more into the lower level details of memory management, but
> maybe this is appropriate (and unavoidable) as more kinds of
> user-addressable memory get introduced.  I think this is actually less horrible
> than it sounds, because, really, we would just want to do this for the
> largest allocations.  (And this is somewhat analogous to how many malloc()
> implementations work, anyway: Use sbrk() for the small stuff, and mmap()
> for the big stuff.)
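
For concreteness, the mmap()/mbind() approach proposed above might look
roughly like the sketch below.  This is only an illustration under
assumptions, not anything PETSc does: the helper names round_up_to_pages()
and alloc_on_node() are hypothetical, the NUMA node number that corresponds
to MCDRAM is assumed to be known by the caller, and mbind() is invoked
through syscall() only to avoid a link-time dependency on libnuma.  The
policy request is treated as best effort, since it can fail (for example
for lack of privileges) while the mapping itself stays usable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>

/* Round nbytes up to a whole number of pages. */
static size_t round_up_to_pages(size_t nbytes, size_t page_size)
{
    assert(page_size != 0);
    return (nbytes + page_size - 1) / page_size * page_size;
}

/*
 * Hypothetical helper: map anonymous memory and ask the kernel to prefer
 * NUMA node 'node' for its pages.  Returns NULL if the mapping fails.
 * *policy_rc receives the mbind() result (0 on success, -1 with errno set);
 * the memory is usable either way.
 */
static void *alloc_on_node(size_t nbytes, int node, size_t *mapped_len,
                           long *policy_rc)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = round_up_to_pages(nbytes, page);
    unsigned long mask;
    void *p;

    /* A single-word node mask covers nodes 0..(bits-2) with this maxnode. */
    if (node < 0 || (size_t)node >= 8 * sizeof(mask) - 1) {
        errno = EINVAL;
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    mask = 1UL << node;
    /* Set the policy before first touch so pages fault in on 'node'. */
    *policy_rc = syscall(SYS_mbind, p, (unsigned long)len,
                         (unsigned long)MPOL_PREFERRED, &mask,
                         (unsigned long)(8 * sizeof(mask)), 0UL);
    *mapped_len = len;
    return p;
}
```

The caller releases the region with munmap(p, mapped_len).  MPOL_PREFERRED
is used here rather than MPOL_BIND so that the allocation falls back to
ordinary DRAM when the preferred node is full; that choice is an
assumption of the sketch, not something settled in this thread.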

I say just use malloc (or posix_memalign) for everything.  PETSc can't
do a better job of the fancy stuff and these normal functions are
perfectly sufficient.

>> That is a regression relative to move_pages.  Just make move_pages work.
>> That's the granularity I've been asking for all along.
>
> Cannot practically be done using a heap manager system like memkind.  But
> we can do this if we do our own mmap() calls, as discussed above.

In practice, we would still use malloc(), but set mallopt
M_MMAP_THRESHOLD if needed and call move_pages.  The reality is that
with 4 KiB pages, it doesn't even matter if your "large" allocation is
not page aligned.  The first and last page don't matter--they're small
enough to be inexpensive to re-fetch from DRAM and don't use up that
much extra space if you map them into MCDRAM.
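
A rough sketch of that approach, under assumptions: the helper names
pages_spanned(), alloc_large() and migrate_to_node() are hypothetical,
mallopt() with M_MMAP_THRESHOLD is glibc-specific, the target node number
is assumed to be known, and move_pages() is reached through syscall() just
to avoid linking against libnuma.  Every page the buffer touches is
migrated, including the partial first and last pages.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <malloc.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>

/* Number of pages touched by [addr, addr+len), partial edge pages included. */
static size_t pages_spanned(uintptr_t addr, size_t len, size_t page)
{
    uintptr_t first, last;

    assert(page != 0 && (page & (page - 1)) == 0); /* power of two */
    if (len == 0)
        return 0;
    first = addr & ~(uintptr_t)(page - 1);
    last = (addr + len - 1) & ~(uintptr_t)(page - 1);
    return (size_t)((last - first) / page) + 1;
}

/* Hypothetical helper: page-aligned allocation through the normal heap. */
static void *alloc_large(size_t nbytes, size_t mmap_threshold)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = NULL;

    /* glibc-specific and best effort: requests at or above the threshold
       are served by mmap() rather than from the brk heap. */
    mallopt(M_MMAP_THRESHOLD, (int)mmap_threshold);
    if (posix_memalign(&p, page, nbytes) != 0)
        return NULL;
    return p;
}

/*
 * Hypothetical helper: ask the kernel to move every page backing buf to
 * 'node'.  Pages must already have been touched.  Returns the raw
 * move_pages() result, or -1 if the call failed or was not permitted.
 */
static long migrate_to_node(void *buf, size_t len, int node)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = pages_spanned((uintptr_t)buf, len, page);
    uintptr_t first = (uintptr_t)buf & ~(uintptr_t)(page - 1);
    void **pages;
    int *nodes, *status;
    long rc = -1;
    size_t i;

    if (n == 0)
        return 0;
    pages = malloc(n * sizeof *pages);
    nodes = malloc(n * sizeof *nodes);
    status = malloc(n * sizeof *status);
    if (pages && nodes && status) {
        for (i = 0; i < n; i++) {
            pages[i] = (void *)(first + i * page);
            nodes[i] = node;
            status[i] = 0;
        }
        /* pid 0 means the calling process. */
        rc = syscall(SYS_move_pages, 0L, (unsigned long)n, pages, nodes,
                     status, (long)MPOL_MF_MOVE);
    }
    free(pages);
    free(nodes);
    free(status);
    return rc;
}
```

For a 1 MiB buffer with 4 KiB pages that starts 100 bytes into a page,
pages_spanned() gives 257 pages instead of 256, so mapping the edge pages
costs at most one extra page here.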