[petsc-dev] Soliciting suggestions for linear solver work under SciDAC 4 Institutes

Jeff Hammond jeff.science at gmail.com
Fri Jul 8 12:17:22 CDT 2016

On Fri, Jul 8, 2016 at 9:48 AM, Richard Mills <richardtmills at gmail.com>
wrote:

> On Fri, Jul 8, 2016 at 9:40 AM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
>>> > 1) How do we run at bandwidth peak on new architectures like Cori or
>>> Aurora?
>>>   Huh, there is a how here, not a why?
>>> >
>>> > Patrick and Rich have good suggestions here. Karl and Rich showed some
>>> promising numbers for KNL at the PETSc meeting.
>>> >
>>> >
>>> > Future systems from multiple vendors basically move from 2-tier memory
>>> hierarchy of shared LLC and DRAM to a 3-tier hierarchy of fast memory (e.g.
>>> HBM), regular memory (e.g. DRAM), and slow (likely nonvolatile) memory  on
>>> a node.
>>>   Jeff,
>>>    Would Intel sell me a system that had essentially no regular memory
>>> DRAM (which is too slow anyway) and no slow memory (which is absurdly too
>>> slow)?  What cost savings would I get in $ and power usage compared to say
>>> what is going into Theta? 10% and 20%, 5% and 30%, 5% and 5%? If it is a
>>> significant savings then get the cut down machine, if it is insignificant
>>> then realize the cost of not using it (the DRAM you paid so little for) is
>>> insignificant and not worth worrying about, just like cruise control when
>>> you don't use the highway. Actually I could use the DRAM to store the
>>> history needed for the adjoints; so maybe it is ok to keep, but surely not
>>> useful for data that is continuously involved in the computation.
>> *Disclaimer: All of the following data is pulled off of the Internet,
>> which in some cases is horribly unreliable.  My comments are strictly for
>> academic discussion and not meant to be authoritative or have any influence
>> on purchasing or design decisions.  Do not equate quoted TDP to measured
>> power during any workload, or assume that different measurements can be
>> compared directly.*
>> Your thinking is in line with
>> http://www.nextplatform.com/2015/08/03/future-systems-intel-ponders-breaking-up-the-cpu/.
>> ..
>> Intel sells KNL packages as parts (
>> http://ark.intel.com/products/family/92650/Intel-Xeon-Phi-Product-Family-x200#@Server)
>> that don't have any DRAM in them, just MCDRAM.  It's the decision of the
>> integrator what goes into the system, which of course is correlated to what
>> the intended customer wants.  While you might not need a node with DRAM,
>> many users do, and the systems that DOE buys are designed to meet the needs
>> of their broad user base.
>> I don't know if KNL is bootable with no DRAM at all - this is likely
>> more to do with what motherboard, BIOS, etc. expect than the processor
>> package itself.  However, the KNL alltoall mode addresses the case where
>> DRAM channels are underpopulated (with fully populated channels, one should
>> use quadrant, hemisphere, SNC-2 or SNC-4), so if DRAM is necessary, you
>> should be able to boot it with only one channel populated.  Of course, if
>> you do this, you'll get 1/6 of the DDR4 bandwidth.
> Just FYI: I have run on KNL systems with no DRAM, only MCDRAM.  This was
> on an internal lab machine and not a commercially available system, but I
> see no reason why one couldn't buy systems this way.
It puts quite a bit of pressure on the system software footprint if one
does not have DDR4.  Blue Gene CNK had a very small memory footprint, but
commodity Linux is certainly much larger.  The memory footprint of MPI at
scale depends on the fabric HW/SW.  It's probably cheaper to buy one 32 GB
stick per node than pay for someone to write the system software that gives
you at least 15.5 GB of usable MCDRAM.
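
As a rough sketch of the arithmetic behind the two figures above (the 1/6
DDR4 bandwidth with one populated channel, and the 15.5 GB usable MCDRAM
target), here is a back-of-the-envelope calculation. The DDR4-2400 speed
grade, the 8-byte channel width, and the 16 GB MCDRAM capacity are
assumptions for illustration, not numbers stated in this thread, and the
results are theoretical peaks rather than measured bandwidth.

```python
# Back-of-the-envelope numbers only; not measurements.
# Assumed (not from the thread): DDR4-2400, 8 bytes per transfer per channel,
# 16 GB of MCDRAM per package.
# From the thread: 6 DDR4 channels (implied by "1/6"), 15.5 GB usable target.

def ddr4_peak_gbs(channels_populated, transfers_per_sec=2400e6,
                  bytes_per_transfer=8):
    """Theoretical peak DDR4 bandwidth in GB/s for the populated channels."""
    return channels_populated * transfers_per_sec * bytes_per_transfer / 1e9

TOTAL_CHANNELS = 6
full_bw = ddr4_peak_gbs(TOTAL_CHANNELS)   # all channels populated
one_bw = ddr4_peak_gbs(1)                 # a single DIMM per node

MCDRAM_GB = 16.0          # assumed package capacity
USABLE_TARGET_GB = 15.5   # figure quoted above
system_budget_gb = MCDRAM_GB - USABLE_TARGET_GB  # left for OS, MPI, etc.

print(f"6 channels: {full_bw:.1f} GB/s, 1 channel: {one_bw:.1f} GB/s "
      f"(ratio {one_bw / full_bw:.3f})")
print(f"System software budget without DDR4: {system_budget_gb:.1f} GB")
```

Under those assumptions, everything that is not application data (kernel,
daemons, MPI buffers) would have to fit in about half a gigabyte, which is
the pressure on the system software footprint described above.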


Jeff Hammond
jeff.science at gmail.com