[petsc-dev] Using multiple mallocs with PETSc

Richard Mills richardtmills at gmail.com
Thu Mar 9 20:02:00 CST 2017


On Thu, Mar 9, 2017 at 5:38 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Mar 9, 2017, at 7:18 PM, Richard Mills <richardtmills at gmail.com>
> wrote:
> >
> > Hi Barry,
> >
> > I like the sound of this, but I think we'd need to be careful about not
> messing up data alignment if we do this.  If we want a malloc that is going
> to let us put the start of an array on, say, a 64 byte alignment boundary,
> then we need to not mess that up by putting this integer value there.
>
>    As I said, the extra space is 64 bits. Now if you want 128 bit alignment,
> we could use 128 bits.
>

KNL wants 64 *byte* alignment, not 64 bit.


>
> > We could pad with an extra 64 bytes internally, though that may be
> getting too wasteful.  I don't know how to get a malloc that gives us a
> starting address that is 64 bits *before* an alignment boundary (so that
> the memory the user sees from the malloc call indeed starts at the
> boundary), but maybe that's doable...
>
>    What alignment boundaries are useful for Intel processors? 64 yup, 128,
> 256, 512? Do higher values provide better performance for SIMD etc.?
>

KNL has 64 byte cache lines, hence the preference for 64 byte alignment.
This isn't as big an issue as it was on the previous generation "Knights
Corner" (KNC) Xeon Phi, which really suffered from alignment issues and
required two instructions for an unaligned load.  On KNL only one
instruction is needed.  You are using the cache better, of course, if you
can honor those cache line boundaries, though.
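For illustration, the "pad with an extra 64 bytes" option from the quoted text can be sketched in C: over-allocate by one full alignment unit, hand back the first 64-byte-aligned address past the padding, and keep the 64-bit family tag in the 8 bytes immediately before it. This is a minimal sketch assuming POSIX posix_memalign(); the names tagged_malloc/tagged_family/tagged_free and TAG_ALIGN are hypothetical and are not PETSc API.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration; not PETSc API.
   KNL cache lines are 64 bytes, so align user data to 64. */
#define TAG_ALIGN 64

/* Allocate n bytes whose returned address is TAG_ALIGN-aligned and
   store a 64-bit family tag in the 8 bytes just before it.
   Cost: TAG_ALIGN extra bytes per allocation. */
static void *tagged_malloc(size_t n, int64_t family)
{
  void *base = NULL;
  if (posix_memalign(&base, TAG_ALIGN, n + TAG_ALIGN) != 0) return NULL;
  char *user = (char *)base + TAG_ALIGN;        /* still TAG_ALIGN-aligned */
  assert((uintptr_t)user % TAG_ALIGN == 0);
  ((int64_t *)user)[-1] = family;               /* tag precedes user data */
  return user;
}

/* Read back the tag written by tagged_malloc(). */
static int64_t tagged_family(const void *user)
{
  return ((const int64_t *)user)[-1];
}

/* Undo the offset to recover the pointer posix_memalign() returned. */
static void tagged_free(void *user)
{
  if (!user) return;
  free((char *)user - TAG_ALIGN);
}
```

The price is the full 64 bytes of padding per allocation rather than 8, which is the wastefulness concern raised above; it only matters for many small allocations.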


>
> >
> > If the goal is to simply deal with allocations to high bandwidth memory
> on KNL, the memkind-provided free() will do the right thing with
> allocations in DRAM or MCDRAM.
>
>    Hmm, Hong, how come we don't use this? I didn't realize it worked this
> way. This would shut Jed up immediately.
>
>    Sadly, I fear the answer is we don't use memkind because it sucks :-)
> Calm down Jeff, I didn't insult your mother.
>

You just need to use the slightly more complicated memkind_malloc() (or
memkind_posix_memalign()) instead of hbw_malloc().  memkind_free() does ask
for the kind of memory, but according to the man page:

"In cases where the kind is unknown in the context of the call to
memkind_free() 0 can be given as the kind specified to memkind_free() but
this will require a look up that can be bypassed by specifying a non-zero
value."

This seems like a non-issue -- how often is a PETSc code going to be doing
performance critical free() calls?
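Roughly what that looks like, as a sketch: allocate from MCDRAM when possible, fall back to DRAM, and release either kind through one free path by passing kind 0 (NULL). It assumes memkind's memkind_malloc(), memkind_free(), MEMKIND_HBW and MEMKIND_DEFAULT, and a hypothetical HAVE_MEMKIND build macro; without it the code falls back to plain malloc()/free(). The helper names are made up for illustration.

```c
#include <stdlib.h>
#ifdef HAVE_MEMKIND            /* hypothetical build-system macro */
#include <memkind.h>
#endif

/* Allocate n bytes, preferring high-bandwidth memory (MCDRAM). */
static void *hbw_or_dram_malloc(size_t n)
{
#ifdef HAVE_MEMKIND
  void *p = memkind_malloc(MEMKIND_HBW, n);       /* NULL if no HBW memory */
  if (!p) p = memkind_malloc(MEMKIND_DEFAULT, n); /* ordinary DRAM */
  return p;
#else
  return malloc(n);                               /* no memkind: plain malloc */
#endif
}

/* Free without knowing which kind the pointer came from. */
static void any_free(void *p)
{
#ifdef HAVE_MEMKIND
  memkind_free(NULL, p);  /* kind 0: memkind looks up the owning kind */
#else
  free(p);
#endif
}
```

Passing a non-zero kind would skip the lookup, but as noted above that cost only shows up if free() is on a hot path.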

--Richard


>
>
>   Barry
>
> > But, as you say, there are issues in other cases, like with
> -malloc_debug.
> >
> > --Richard
> >
> > On Thu, Mar 9, 2017 at 4:19 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Using different mallocs for different objects/arrays in PETSc is very
> iffy because each free() has to match the malloc used for that memory. This
> is even true with just -malloc_debug in that certain initialization
> functions in PETSc need to use the raw malloc() because we cannot be sure
> whether (*PetscTrMalloc)() has been set yet, and the raw free() called at
> PetscFinalize() time needs to match it.
> >
> >   Why not have PetscMalloc() ALWAYS allocate an extra 64 bit space at
> the beginning and put in an integer indicating the malloc family that has
> been used to get the space. PetscFree() would use this integer to determine
> the correct free() to use. A mechanism to register new malloc families
> could be easily done, for example
> >
> >     PetscMallocRegister(malloc,realloc,free,&basicmalloc);
> >     PetscMallocRegister(PetscMallocDebug,PetscReallocDebug,PetscFreeDebug,&debugmalloc);
> >     PetscMallocRegister(PetscMallocHBW,PetscReallocHBW,PetscFreeHBW,&hbwmalloc);
> >
> >     To change the malloc used, you would do
> >     PetscMallocPush(debugmalloc);  PetscMalloc(....); .... PetscMallocPop();
> Note that you can register additional malloc families at any time (it
> doesn't have to be as soon as the program starts up).
> >
> >    What is wrong with the model and why shouldn't we use it?
> >
> >   Barry
> >
> > Notes:
> >
> > It is easy to implement, so implementation effort is not an argument against it.
> >
> > The extra memory usage is trivial.
> >
> > The mapping from integer to malloc() or free() would be a bounds check
> and then accessing the function pointer from a little array so pretty cheap.
> >
> > If certain mallocs are missing (like PetscMallocHBW), the hbwmalloc
> variable could be set to the basicmalloc value (or some other), so one would
> not need #ifdef or if () code deciding which malloc to use in many places.
> >
> > It seems so simple that something must be fundamentally flawed with it. Even
> with just PetscTrMallocDefault() and PetscMallocAlign(), I feel like
> implementing it.
> >
>
>
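A minimal C sketch of the tagged-header scheme proposed in the quoted message, under stated assumptions: malloc_register, malloc_push, malloc_pop, tagged_alloc and tagged_free are hypothetical stand-ins for the proposed PetscMallocRegister()/PetscMallocPush()/PetscMallocPop()/PetscMalloc()/PetscFree() (not existing PETSc API), realloc is omitted, and the header is 16 bytes rather than 8 so the returned pointer keeps the 16-byte alignment malloc() typically provides on x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of the proposal; not PETSc API. */
typedef void *(*malloc_fn)(size_t);
typedef void (*free_fn)(void *);

#define MAX_FAMILIES 16
#define MAX_DEPTH    16
#define HDR          16   /* header bytes; keeps 16-byte alignment */

static malloc_fn family_malloc[MAX_FAMILIES];
static free_fn   family_free[MAX_FAMILIES];
static int       n_families = 0;

static int family_stack[MAX_DEPTH];
static int depth = 0;     /* empty stack means "use family 0" */

/* Register a malloc/free pair; returns its family id, or -1 if full. */
static int malloc_register(malloc_fn m, free_fn f)
{
  if (n_families == MAX_FAMILIES) return -1;
  family_malloc[n_families] = m;
  family_free[n_families]   = f;
  return n_families++;
}

static void malloc_push(int family)
{
  assert(depth < MAX_DEPTH);
  family_stack[depth++] = family;
}

static void malloc_pop(void)
{
  assert(depth > 0);
  depth--;
}

/* Allocate from the current family and record it in the header. */
static void *tagged_alloc(size_t n)
{
  int family = depth ? family_stack[depth - 1] : 0;
  assert(family >= 0 && family < n_families);
  char *base = family_malloc[family](n + HDR);
  if (!base) return NULL;
  *(int64_t *)base = family;          /* which family owns this block */
  return base + HDR;
}

/* Read the header and dispatch to the matching free(). */
static void tagged_free(void *p)
{
  if (!p) return;
  char *base = (char *)p - HDR;
  int64_t family = *(int64_t *)base;
  assert(family >= 0 && family < n_families);  /* the cheap bounds check */
  family_free[family](base);
}
```

To cover a missing allocator as the notes suggest, register only the basic pair and set hbwmalloc = basicmalloc; callers can then push hbwmalloc unconditionally, with no #ifdef at the call sites.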

