<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, Mar 11, 2017 at 12:36 PM, Jed Brown <span dir="ltr"><<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> writes:<br>
<br>
>> I think it's accurate in the sense that the performance of real<br>
>> applications using a page migration system will be sufficiently close to<br>
>> the best manual page mapping strategy that nobody should bother with the<br>
>> manual system.<br>
><br>
> Will such a page migration system ever exist, is Intel working hard<br>
> on it for KNL? What if no one provides such a page migration<br>
> system? Should we just wait around until they do (which they won't)<br>
> and do nothing else instead? Or will we have to do a half-assed<br>
> hacky thing to work around the lack of the mythical decent page<br>
> migration system?<br>
<br>
</span>Libnuma has move_pages. Prior to release, Intel refused to confirm that<br>
MCDRAM would be shown to the OS as a normal NUMA node, such that<br>
move_pages would work, and sometimes suggested that it would not. Some<br>
of the email history is me being incredulous about this state of affairs before learning<br>
that the obvious implementation that I preferred was in fact what they<br>
did.<br>
<br>
Anyway, this means PETSc can track usage and call move_pages itself to<br>
migrate hot pages into MCDRAM.<br><br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
I don't know if Intel or Linux kernel people are going to tweak the<br>
existing automatic page migration to do this transparently, but we<br>
probably shouldn't hold our breath.<br></blockquote><div> </div><div><div>I am doubtful about how soon good automatic page migration
approaches are going to be implemented in the OS. I note that, for the
longest time (though I have not investigated this recently), the Linux
kernel would often do a pretty bad job of choosing what memory to move
to disk when running codes with a working set size that required use of
the swap space. It generally used some variation on a
least-recently-used (LRU) eviction policy, which is good for some
workloads but is actually the opposite of what you want for a big
scientific code that keeps doing something like sweeping through a
lattice. The problem, of course, is that the OS couldn't divine the
details of what you were doing, so it would just do LRU eviction, since
that was reasonable for a bunch of codes -- but it could be very
antagonistic to others. The OS needed a mechanism to set a reasonable
replacement policy. This didn't exist, so I wrote a middleware library
to deal with this when I was doing my dissertation research (ancient
history now, I guess).<br> <br></div>
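<div>Coming back to Jed's point about move_pages: since MCDRAM does show up as an ordinary NUMA node, a runtime library really can move pages it believes are hot without waiting on the kernel. Just to make that concrete, here is a rough sketch (this is not what PETSc does; it assumes MCDRAM is NUMA node 1, which you would want to confirm with numactl -H, and it skips most error handling):<br></div><pre>
/* Sketch: push the pages backing a "hot" buffer onto the MCDRAM NUMA node.
 * Assumptions (illustrative only): MCDRAM is node 1, and we already know
 * the buffer is hot.  Link with -lnuma. */
#include <numaif.h>     /* move_pages(), MPOL_MF_MOVE */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long migrate_to_node(void *buf, size_t bytes, int target_node)
{
  size_t    page   = (size_t)sysconf(_SC_PAGESIZE);
  uintptr_t first  = (uintptr_t)buf & ~(uintptr_t)(page - 1);  /* round down to page */
  size_t    npages = ((uintptr_t)buf + bytes - first + page - 1) / page;
  void    **pages  = malloc(npages * sizeof(*pages));
  int      *nodes  = malloc(npages * sizeof(*nodes));
  int      *status = malloc(npages * sizeof(*status));
  long      rc     = -1;

  if (pages && nodes && status) {
    for (size_t i = 0; i < npages; i++) {
      pages[i] = (void *)(first + i * page);  /* one page-aligned pointer per page */
      nodes[i] = target_node;                 /* desired destination node          */
    }
    rc = move_pages(0 /* this process */, npages, pages, nodes, status, MPOL_MF_MOVE);
    if (rc < 0) perror("move_pages");
  }
  free(pages); free(nodes); free(status);
  return rc;
}

int main(void)
{
  size_t  n = 16u * 1024 * 1024;             /* a 128 MB "hot" array           */
  double *x = malloc(n * sizeof(double));
  for (size_t i = 0; i < n; i++) x[i] = 1.0; /* touch it so the pages exist    */
  migrate_to_node(x, n * sizeof(double), 1); /* 1 = assumed MCDRAM node number */
  free(x);
  return 0;
}
</pre><div>Of course, calling move_pages is the easy part; the hard part is the bookkeeping to decide which pages are actually hot.<br></div>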
<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class="gmail-"><br>
>> In cache mode, accessing infrequently-used memory (like TS trajectory)<br>
>> evicts memory that you will use again soon.<br></span></blockquote><div><br></div><div>Yup. Again, a bad replacement policy (direct-mapped, in this case). We need a way to specify smarter ones. The hardware doesn't provide one; OSes may provide one someday, but they don't now.<br> <br></div>
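<div>To make the direct-mapped part concrete: in cache mode every address has exactly one slot it can occupy in MCDRAM, so two buffers that happen to lie a multiple of the cache size apart keep evicting each other no matter how hot one of them is. A toy illustration (the 16 GiB capacity and 64-byte line size are assumptions about the KNL memory-side cache, and the addresses are made up):<br></div><pre>
/* Toy model of a direct-mapped memory-side cache: each address maps to
 * exactly one slot, so addresses a multiple of CACHE_BYTES apart collide. */
#include <stdio.h>
#include <stdint.h>

#define CACHE_BYTES (16ull * 1024 * 1024 * 1024)  /* assumed MCDRAM cache size */
#define LINE_BYTES  64ull                         /* assumed cache line size   */

static uint64_t slot(uint64_t paddr)
{
  return (paddr / LINE_BYTES) % (CACHE_BYTES / LINE_BYTES);
}

int main(void)
{
  uint64_t hot  = 0x100000000ull;          /* a hot Krylov vector, say      */
  uint64_t cold = hot + 3 * CACHE_BYTES;   /* cold trajectory data far away */
  printf("hot slot = %llu, cold slot = %llu\n",
         (unsigned long long)slot(hot), (unsigned long long)slot(cold));
  /* Both land in the same slot, so every touch of the cold data evicts the
   * hot line, and there is no knob to tell the hardware to do otherwise.  */
  return 0;
}
</pre><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">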
><br>
> What if you could advise the malloc system that this chunk of<br>
> memory should not be cached? Though this appears to be impossible<br>
> by design?<br>
<br>
</span>Malloc has nothing to do with cache, and I don't think the hardware has<br>
an interface that would allow the kernel to set policy at this<br>
granularity.<br>
</blockquote></div><br></div></div>