[petsc-dev] Using multiple mallocs with PETSc

Richard Mills richardtmills at gmail.com
Fri Mar 10 00:04:30 CST 2017


On Thu, Mar 9, 2017 at 9:05 PM, Jeff Hammond <jeff.science at gmail.com> wrote:

>
>
> On Thu, Mar 9, 2017 at 8:08 PM, Richard Mills <richardtmills at gmail.com>
> wrote:
>
>> On Thu, Mar 9, 2017 at 7:45 PM, Jeff Hammond <jeff.science at gmail.com>
>> wrote:
>>
>>>
>>>> I started to play with memkind last summer.  At that time, there were
>>>> many statements online like this one:
>>>> "the *hbwmalloc* interface is stable but *memkind* interface is only
>>>> partially stable."
>>>>
>>>>
>>> If you want the most stable interface, just use libnuma.  It took me
>>> less than a day to reimplement hbwmalloc.h on top of libnuma and dlmalloc (
>>> https://github.com/jeffhammond/myhbwmalloc).  Note that myhbwmalloc was
>>> an educational exercise, not software that I actually think anyone should
>>> use.  It is intentionally brittle (fast or fail - nothing in between).
>>>
>>> One consequence of using libnuma to manage MCDRAM is that one can call
>>> numa_move_pages, which Jed has asserted is the single most important
>>> function call in the history of memory management ;-)
>>>
>>
>> I think you can also move pages allocated by memkind around by calling
>> numa_move_pages, actually, but this breaks the heap partitioning that
>> memkind does.
>>
>> I actually question whether we even need a heap manager for things like
>> big arrays inside of Vec objects.  It should be fine to just call mmap()
>> directly for those.  These will tend to be big things that don't get
>> allocated/deallocated too frequently, so it probably won't matter that an
>> expensive system call is required.
>>
>>
> I think this is a terrible idea.  What happens when a user runs a tiny
> debug job that takes 1000x longer than it should because every object
> ctor/dtor requires a system call?
>

I'm not necessarily saying it's a great idea, either.  But if we
seriously want to do things like migrating our own pages around (which
already requires an expensive system call), then making our own mmap()
and mbind() calls may make sense.  You'd only want to do this for large,
long-lived objects (like a Jacobian matrix) that you expect may need to
reside in the high-bandwidth memory.  I did a lot of low-level systems
work for my dissertation on out-of-core calculations, and I wrote a
middleware library that handled memory placement and did all of its
memory allocation directly with mmap() calls.  For long-lived linear
algebra objects, the overhead was negligible.
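
To make that concrete, here is a rough, untested sketch of what such an
allocation could look like.  The helper name and the choice of NUMA node
1 for MCDRAM are just placeholders for illustration; the real node
number would have to be discovered at run time (e.g. via libnuma):

/* Sketch only: allocate a large array with mmap() and bind it to the
 * MCDRAM NUMA node.  Node 1 is an assumption; on a KNL in flat mode
 * MCDRAM usually appears as a separate NUMA node, but the correct
 * number should be queried at run time. */
#include <sys/mman.h>
#include <numaif.h>            /* mbind(), MPOL_BIND; link with -lnuma */
#include <stddef.h>

static void *alloc_on_hbm(size_t bytes)
{
  const int     hbm_node = 1;                /* assumed MCDRAM node */
  unsigned long nodemask = 1UL << hbm_node;  /* one bit per NUMA node */
  void         *p;

  p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return NULL;

  /* Bind the not-yet-faulted pages to the MCDRAM node; they are placed
   * there on first touch.  numa_move_pages() could migrate them later. */
  if (mbind(p, bytes, MPOL_BIND, &nodemask, sizeof(nodemask) * 8,
            MPOL_MF_MOVE)) {
    munmap(p, bytes);
    return NULL;
  }
  return p;
}

You would of course only route allocations above some size threshold
through a path like that and leave everything small on the normal heap.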

Personally, I think memkind is something we should support in PETSc.
That doesn't mean it's the ideal solution, but it has utility now.  We
can certainly add support for something better when it arrives.
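
For illustration, this is roughly what a size-aware allocation path
built on memkind could look like.  The wrapper names here are made up
and are not an actual PETSc interface:

/* Sketch only, not PETSc code.  memkind_malloc() with
 * MEMKIND_HBW_PREFERRED uses MCDRAM when it is available and falls
 * back to DDR otherwise, which is usually what a library wants. */
#include <memkind.h>           /* link with -lmemkind */
#include <stddef.h>

void *example_malloc(size_t bytes, int want_hbw)
{
  memkind_t kind = want_hbw ? MEMKIND_HBW_PREFERRED : MEMKIND_DEFAULT;
  return memkind_malloc(kind, bytes);
}

void example_free(void *p)
{
  /* Passing NULL for the kind asks memkind to work out which heap the
   * pointer came from (at some lookup cost). */
  memkind_free(NULL, p);
}

A PetscMalloc()-level hook could then decide want_hbw from an object
size threshold or a per-object option.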

--Richard


>
> Jeff
>
>
>> --Richard
>>
>>
>>> Jeff
>>>
>>>
>>>> Perhaps I should try memkind calls since they may become much better.
>>>>
>>>> Hong (Mr.)
>>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>>>
>>
>>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>