[petsc-dev] Soliciting suggestions for linear solver work under SciDAC 4 Institutes

Richard Mills richardtmills at gmail.com
Fri Jul 8 11:48:21 CDT 2016

On Fri, Jul 8, 2016 at 9:40 AM, Jeff Hammond <jeff.science at gmail.com> wrote:

>> > 1) How do we run at bandwidth peak on new architectures like Cori or
>> > Aurora?
>>   Huh, there is a how here, not a why?
>> >
>> > Patrick and Rich have good suggestions here. Karl and Rich showed some
>> > promising numbers for KNL at the PETSc meeting.
>> >
>> >
>> > Future systems from multiple vendors basically move from 2-tier memory
>> > hierarchy of shared LLC and DRAM to a 3-tier hierarchy of fast memory (e.g.
>> > HBM), regular memory (e.g. DRAM), and slow (likely nonvolatile) memory on
>> > a node.
>>   Jeff,
>>    Would Intel sell me a system that had essentially no regular memory
>> DRAM (which is too slow anyway) and no slow memory (which is absurdly too
>> slow)?  What cost savings would I get in $ and power usage compared to say
>> what is going into Theta? 10% and 20%, 5% and 30%, 5% and 5%? If it is a
>> significant savings, then get the cut-down machine; if it is insignificant,
>> then realize the cost of not using it (the DRAM you paid so little for) is
>> insignificant and not worth worrying about, just like cruise control when
>> you don't use the highway. Actually I could use the DRAM to store the
>> history needed for the adjoints; so maybe it is ok to keep, but surely not
>> useful for data that is continuously involved in the computation.
> *Disclaimer: All of the following data is pulled off of the Internet,
> which in some cases is horribly unreliable.  My comments are strictly for
> academic discussion and not meant to be authoritative or have any influence
> on purchasing or design decisions.  Do not equate quoted TDP to measured
> power during any workload, or assume that different measurements can be
> compared directly.*
> Your thinking is in line with
> http://www.nextplatform.com/2015/08/03/future-systems-intel-ponders-breaking-up-the-cpu/.
> Intel sells KNL packages as parts (
> http://ark.intel.com/products/family/92650/Intel-Xeon-Phi-Product-Family-x200#@Server)
> that don't have any DRAM in them, just MCDRAM.  It's the decision of the
> integrator what goes into the system, which of course is correlated to what
> the intended customer wants.  While you might not need a node with DRAM,
> many users do, and the systems that DOE buys are designed to meet the needs
> of their broad user base.
> I don't know if KNL is bootable with no DRAM at all - this is likely
> more to do with what motherboard, BIOS, etc. expect than the processor
> package itself.  However, the KNL alltoall mode addresses the case where
> DRAM channels are underpopulated (with fully populated channels, one should
> use quadrant, hemisphere, SNC-2 or SNC-4), so if DRAM is necessary, you
> should be able to boot it with only one channel populated.  Of course, if
> you do this, you'll get 1/6 of the DDR4 bandwidth.

Just FYI: I have run on KNL systems with no DRAM, only MCDRAM.  This was on
an internal lab machine and not a commercially available system, but I see
no reason why one couldn't buy systems this way.
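For a rough sense of what Jeff's "1/6 of the DDR4 bandwidth" means in absolute terms, here is a back-of-envelope sketch of theoretical peak DDR4 bandwidth with all six KNL channels populated versus just one. It assumes DDR4-2400 (2400 MT/s) and 8-byte channels; those speeds are our assumption, not something stated in the thread, and the function name is hypothetical.

```python
# Rough theoretical peak DDR4 bandwidth for a KNL socket, full vs. one channel.
# Assumptions (not from the thread): DDR4-2400, i.e. 2400 MT/s, and 8-byte
# (64-bit) channels. Six channels is what makes one channel equal 1/6.
from fractions import Fraction

MEGATRANSFERS_PER_S = 2400   # assumed DDR4-2400
BYTES_PER_TRANSFER = 8       # 64-bit data path per channel, ECC bits not counted
KNL_DDR4_CHANNELS = 6


def peak_mb_per_s(channels: int) -> int:
    """Hypothetical helper: theoretical peak DDR4 bandwidth in MB/s."""
    return channels * MEGATRANSFERS_PER_S * BYTES_PER_TRANSFER


full = peak_mb_per_s(KNL_DDR4_CHANNELS)   # 115200 MB/s, i.e. ~115 GB/s
single = peak_mb_per_s(1)                 # 19200 MB/s, i.e. ~19 GB/s
fraction = Fraction(single, full)         # exactly 1/6

print(f"all {KNL_DDR4_CHANNELS} channels: {full / 1000:.1f} GB/s")
print(f"one channel:    {single / 1000:.1f} GB/s ({fraction} of peak)")
```

These are theoretical peaks; sustained (e.g. STREAM-style) bandwidth will be lower in both cases, and none of this touches MCDRAM, which is where a bandwidth-bound solver would want its working set anyway.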


> As to the question of DRAM power, there is a lot of detailed information
> available (e.g.
> https://www.micron.com/~/media/documents/products/power-calculator/ddr4_power_calc.xlsm,
> https://www.micron.com/~/media/Documents/Products/Technical%20Note/DRAM/TN4603.pdf,
> https://lenovopress.com/lp0083.pdf) but since I am lazy, I'll use the
> numbers reported on
> http://www.tomshardware.com/reviews/intel-core-i7-5960x-haswell-e-cpu,3918-13.html
> for client memory (i.e. not server memory, hence probably not providing
> ECC, but ECC doesn't change power consumption much), which works out to
> 0.37 W/GB for DDR4-2133, hence 71 W for 192 GB [
> http://www.nextplatform.com/2015/11/30/inside-future-knights-landing-xeon-phi-systems/].
> That 71 W is ~1/3 of the processor package power (215 W).  The network
> adapter draws some power, and the cables and switches (especially optics)
> are a nontrivial power draw.  So DRAM is at most 25% of the node power
> (71 W out of 71 + 215 = 286 W, before counting the network adapter), and
> perhaps ~17% of system power based upon what I can derive from Shaheen II.
> Shaheen II (Cray XC40):
>   CPU+DRAM: 6174 nodes * (2 sockets * 135 W/socket + 128 GB * 0.37 W/GB) = 1.96 MW
>   Total system: 2.83 MW
>   => 69% of system power from CPU+DRAM (1.96 / 2.83)
> Again, *these are not the exact numbers* but what I can derive from
> https://www.top500.org/system/178515,
> https://www.hpc.kaust.edu.sa/content/shaheen-ii and
> http://ark.intel.com/products/81060/Intel-Xeon-Processor-E5-2698-v3-40M-Cache-2_30-GHz
> .
> Back to the higher level analysis, what is unfortunate about DRAM is that
> it needs power to hold data even if the data isn't used, because it is not
> persistent.  I don't know how well it powers down when the physical memory
> isn't mapped but it seems that power is not gated today [
> http://digitalpiglet.org/research/sion2014socc.pdf].  The advantage of
> nonvolatile memory is that it doesn't require power when not being
> accessed, whether or not the data is preserved.
> I suspect that nonvolatile memory (NVM) is the right place to put your
> adjoint matrices, provided the NVM bandwidth is sufficient.
> *Disclaimer: All of these are academic comments.  Do not use them to try
> to influence others or make any decisions.  Do your own research and be
> skeptical of everything I derived from the Internet.*
> Jeff
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
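The DRAM power estimate in Jeff's message can be redone in a few lines. The sketch below uses only the figures quoted in the thread (0.37 W/GB, 192 GB next to a 215 W KNL package, and the Shaheen II node count, socket TDP, memory size and 2.83 MW total), so it inherits all of his disclaimers. The message does not spell out how the ~17% system-level figure was obtained; scaling the KNL node-level bound (~25%) by Shaheen II's CPU+DRAM share (~69%) is our assumed reading, not his stated method. The helper name is hypothetical.

```python
# Back-of-envelope DRAM power share, redoing the arithmetic from the thread.
# All inputs are the thread's Internet-derived figures (quoted TDP, not
# measured power). The ~17% reconstruction below is an assumed reading.

W_PER_GB = 0.37  # client DDR4-2133 figure cited in the thread


def dram_watts(gigabytes: float) -> float:
    """Hypothetical helper: DRAM power at a flat per-GB rate."""
    return gigabytes * W_PER_GB


# KNL node: 192 GB of DDR4 next to a 215 W package.
knl_pkg_w = 215.0
knl_dram_w = dram_watts(192)                              # ~71 W
dram_vs_pkg = knl_dram_w / knl_pkg_w                      # ~1/3
dram_vs_node_max = knl_dram_w / (knl_dram_w + knl_pkg_w)  # ~25%; ignores NIC

# Shaheen II: 6174 nodes, each with two 135 W sockets and 128 GB.
nodes = 6174
node_cpu_dram_w = 2 * 135.0 + dram_watts(128)
cpu_dram_mw = nodes * node_cpu_dram_w / 1e6               # ~1.96 MW
system_mw = 2.83
cpu_dram_share = cpu_dram_mw / system_mw                  # ~69%

# Assumed reading of "~17% of system power": the KNL node-level bound
# scaled by the CPU+DRAM share of Shaheen II's system power.
dram_vs_system_est = dram_vs_node_max * cpu_dram_share    # ~17%

# For comparison: Shaheen II's own 128 GB/node at the same W/GB rate.
shaheen_dram_share = nodes * dram_watts(128) / (system_mw * 1e6)  # ~10%

for label, value in [
    ("KNL DRAM / package", dram_vs_pkg),
    ("KNL DRAM / node (upper bound)", dram_vs_node_max),
    ("Shaheen II CPU+DRAM / system", cpu_dram_share),
    ("scaled DRAM / system estimate", dram_vs_system_est),
    ("Shaheen II DRAM / system", shaheen_dram_share),
]:
    print(f"{label:32s} {value:6.1%}")
```

Applying the per-GB rate directly to Shaheen II's 128 GB nodes gives roughly 10% of system power, so under these inputs ~17% reads as the higher end of the plausible range; either way, DRAM is a minority of the power budget.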