[petsc-dev] Using multiple mallocs with PETSc

Mon Mar 13 01:35:09 CDT 2017

On Sat, Mar 11, 2017 at 12:36 PM, Jed Brown <jed at jedbrown.org> wrote:

> Barry Smith <bsmith at mcs.anl.gov> writes:
>
> >> I think it's accurate in the sense that the performance of real
> >> applications using a page migration system will be sufficiently close to
> >> the best manual page mapping strategy that nobody should bother with the
> >> manual system.
> >
> >    Will such a page migration system ever exist, is Intel working hard
> >    on it for KNL? What if no one provides such a page migration
> >    system? Should we just wait around until they do (which they won't)
> >    and do nothing else instead? Or will we have to do a half-assed
> >    hacky thing to work around the lack of the mythical decent page
> >    migration system?
>
> Libnuma has move_pages.  Prior to release, Intel refused to confirm that
> MCDRAM would be shown to the OS as a normal numa node, such that
> move_pages would work, and sometimes suggested that it would not.  Some
> of the email history is me being incredulous at this state of affairs,
> before learning that the obvious implementation that I preferred was in
> fact what they did.
>
> Anyway, this means PETSc can track usage and call move_pages itself to
> migrate hot pages into MCDRAM.
>
> I don't know if Intel or Linux kernel people are going to tweak the
> existing automatic page migration to do this transparently, but we
> probably shouldn't hold our breath.
>
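
To make the "track usage, then migrate hot pages" idea concrete, here is a
minimal sketch of the selection step only.  It does not call move_pages; the
function name pick_hot_pages, the 4096-byte page size, and the toy address
trace are all assumptions made for illustration and are not PETSc or libnuma
APIs.

```python
from collections import Counter

PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def pick_hot_pages(access_addresses, fast_capacity_pages):
    """Return the most frequently touched page numbers, up to capacity.

    Hypothetical helper: a real implementation would hand the chosen
    pages to a migration call such as move_pages.
    """
    counts = Counter(addr // PAGE_SIZE for addr in access_addresses)
    # most_common() orders pages by access count, highest first.
    return [page for page, _ in counts.most_common(fast_capacity_pages)]


# Toy trace: page 0 is touched five times, page 1 once, page 2 three times.
trace = [0, 8, 16, 24, 32, 4096, 8192, 8200, 8208]
hot = pick_hot_pages(trace, fast_capacity_pages=2)
```

With room for two pages in fast memory, the sketch picks pages 0 and 2 and
leaves the rarely touched page 1 where it is.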

I am doubtful about how soon good automatic page migration approaches are
going to be implemented in the OS.  I note that, for the longest time
(though I have not investigated this recently), the Linux kernel would
often do a pretty bad job of choosing what memory to move to disk when
running codes with a working set size that required use of the swap space.
It generally would use some variation on a least-recently-used (LRU)
eviction policy, which is good for some workloads, but actually the
opposite of what you want to do for a big scientific code that keeps doing
something like sweeping through a lattice.  The problem is, of course, that
the OS couldn't divine the details of what you were doing, so it would just
do LRU eviction, since that was reasonable for a bunch of codes -- but it
could be very antagonistic to others.  The OS needed a mechanism to set a
reasonable replacement policy.  This didn't exist, so I wrote a middleware
library to deal with this when I was doing my dissertation research
(ancient history now, I guess).
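
A small simulation illustrates why LRU is the wrong policy for a repeated
sweep.  This is only a sketch under assumed parameters (5 pages swept 4
times, room for 4 resident pages); count_hits is a hypothetical helper, and
the most-recently-used (MRU) policy is included purely as a contrast, not as
what the middleware library mentioned above did.

```python
from collections import OrderedDict


def count_hits(trace, capacity, policy):
    """Count hits for a page trace under 'lru' or 'mru' eviction."""
    resident = OrderedDict()  # ordered from least to most recently used
    hits = 0
    for page in trace:
        if page in resident:
            hits += 1
            resident.move_to_end(page)  # mark as most recently used
            continue
        if len(resident) >= capacity:
            # LRU evicts from the front (oldest); MRU evicts from the back.
            resident.popitem(last=(policy == "mru"))
        resident[page] = True
    return hits


sweep = list(range(5)) * 4  # sweep pages 0..4 four times
lru_hits = count_hits(sweep, capacity=4, policy="lru")
mru_hits = count_hits(sweep, capacity=4, policy="mru")
```

Because the working set is one page larger than memory, LRU always evicts
exactly the page that the sweep needs next and never gets a hit, while MRU
keeps most of the lattice resident.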

>
> >> In cache mode, accessing infrequently-used memory (like TS trajectory)
> >> evicts memory that you will use again soon.
>

Yup.  Again, a bad replacement policy (direct-mapped, in this case).  We
need a way to use smarter ones.  Hardware is not providing it; OSes may
provide it someday, but they don't now.
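
The eviction effect in a direct-mapped cache can be shown with a toy model.
This is a sketch, not a model of the actual MCDRAM cache: the 8-line cache,
the block numbering, and the helper direct_mapped_misses are assumptions
chosen to keep the example small.

```python
def direct_mapped_misses(trace, num_lines):
    """Count misses in a direct-mapped cache with one block per line."""
    lines = [None] * num_lines  # block currently held in each line
    misses = 0
    for block in trace:
        slot = block % num_lines  # each block can live in exactly one line
        if lines[slot] != block:
            misses += 1
            lines[slot] = block  # the new block replaces whatever was there
    return misses


NUM_LINES = 8
hot = list(range(8))       # working set that exactly fits the cache
cold = list(range(8, 16))  # rarely used data, each block touched once

hot_only = hot * 3
interleaved = hot + cold[:4] + hot + cold[4:] + hot

hot_only_misses = direct_mapped_misses(hot_only, NUM_LINES)
interleaved_misses = direct_mapped_misses(interleaved, NUM_LINES)
```

The hot working set alone incurs only its 8 compulsory misses.  Interleaving
the cold blocks adds their own 8 compulsory misses plus 8 further misses on
hot blocks that were evicted, since each cold block maps to a line that a
hot block was occupying.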

> >
> >    What if you could advise the malloc system that this chunk of
> >    memory should not be cached? Though this appears to be impossible
> >    by design?
>
> Malloc has nothing to do with cache, and I don't think the hardware has
> an interface that would allow the kernel to set policy at this
> granularity.
>