[petsc-dev] Soliciting suggestions for linear solver work under SciDAC 4 Institutes

Jeff Hammond jeff.science at gmail.com
Fri Jul 8 21:14:30 CDT 2016


On Friday, July 8, 2016, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Jul 8, 2016, at 12:17 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> >
> >
> >
> > On Fri, Jul 8, 2016 at 9:48 AM, Richard Mills <richardtmills at gmail.com> wrote:
> >
> >
> > On Fri, Jul 8, 2016 at 9:40 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
> >
> > > 1) How do we run at bandwidth peak on new architectures like Cori or
> Aurora?
> >
> >   Huh, there is a how here, not a why?
> > >
> > > Patrick and Rich have good suggestions here. Karl and Rich showed some
> promising numbers for KNL at the PETSc meeting.
> > >
> > >
> > > Future systems from multiple vendors basically move from a 2-tier memory
> hierarchy of shared LLC and DRAM to a 3-tier hierarchy of fast memory (e.g.
> HBM), regular memory (e.g. DRAM), and slow (likely nonvolatile) memory on
> a node.
> >
> >   Jeff,
> >
> >    Would Intel sell me a system that had essentially no regular DRAM
> memory (which is too slow anyway) and no slow memory (which is absurdly too
> slow)?  What cost savings would I get in $ and power usage compared to,
> say, what is going into Theta? 10% and 20%, 5% and 30%, 5% and 5%? If it is
> a significant savings, then get the cut-down machine; if it is
> insignificant, then realize that the cost of not using it (the DRAM you
> paid so little for) is insignificant and not worth worrying about, just
> like cruise control when you don't use the highway. Actually, I could use
> the DRAM to store the history needed for the adjoints, so maybe it is okay
> to keep, but it is surely not useful for data that is continuously involved
> in the computation.
> >
> > Disclaimer: All of the following data is pulled off of the Internet,
> which in some cases is horribly unreliable.  My comments are strictly for
> academic discussion and not meant to be authoritative or have any influence
> on purchasing or design decisions.  Do not equate quoted TDP to measured
> power during any workload, or assume that different measurements can be
> compared directly.
> >
> > Your thinking is in line with
> http://www.nextplatform.com/2015/08/03/future-systems-intel-ponders-breaking-up-the-cpu/ ...
> >
> > Intel sells KNL packages as parts (
> http://ark.intel.com/products/family/92650/Intel-Xeon-Phi-Product-Family-x200#@Server)
> that don't have any DRAM in them, just MCDRAM.  It's the decision of the
> integrator what goes into the system, which of course is correlated to what
> the intended customer wants.  While you might not need a node with DRAM,
> many users do, and the systems that DOE buys are designed to meet the needs
> of their broad user base.
> >
> > I don't know if KNL is bootable with no DRAM at all - this likely has
> more to do with what the motherboard, BIOS, etc. expect than with the
> processor package itself.  However, the KNL all-to-all cluster mode
> addresses the case where DRAM channels are underpopulated (with fully
> populated channels, one should use quadrant, hemisphere, SNC-2, or SNC-4),
> so if DRAM is necessary, you should be able to boot with only one channel
> populated.  Of course, if you do this, you'll get 1/6 of the DDR4
> bandwidth.
> >
> > Just FYI: I have run on KNL systems with no DRAM, only MCDRAM.  This was
> on an internal lab machine and not a commercially available system, but I
> see no reason why one couldn't buy systems this way.
> >
> >
> > It puts quite a bit of pressure on the system software footprint if one
> does not have DDR4.  Blue Gene CNK had a very small memory footprint, but
> commodity Linux is certainly much larger.  The memory footprint of MPI at
> scale depends on the fabric HW/SW.  It's probably cheaper to buy one 32 GB
> stick per node than to pay for someone to write the system software that
> gives you at least 15.5 GB of usable MCDRAM.
> >
>    Sure.
>
>     But this doesn't help with the question of whether the DRAM uses
> significant power. If it does use significant power, then it might make
> sense to be able to turn off most of the DRAM via software when running
> PETSc programs. If it doesn't use significant power, then it doesn't matter
> if we don't use it.
>
>
Like I said, DRAM is approximately 0.37 W/GB, which is in the neighborhood of
15-20% of total system power in supercomputers.
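
As a back-of-the-envelope illustration (the 192 GB DDR4 capacity and ~400 W
node power below are hypothetical round numbers, not quoted specs):

    192 GB x 0.37 W/GB  ~= 71 W of DRAM power per node
    71 W / 400 W        ~= 18% of node power

which lands in the 15-20% range above.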

On the other hand, the CPU is approximately 50-60% of system power and,
unlike DRAM, is amenable to a range of power optimizations, at least in
theory. Whether operators give users C- and P-state control is a policy
question.
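
For concreteness, here is a minimal sketch (illustrative, not from this
thread) of what user-visible P-state control looks like on a Linux node: the
cpufreq policy is exposed through sysfs, and whether those files are present
and writable for ordinary users is exactly the operator policy question above.

    /* Report the cpufreq (P-state) policy that Linux exposes via sysfs for
     * cpu0.  The paths are standard cpufreq files; whether they exist or are
     * writable on a given machine is a site policy decision. */
    #include <stdio.h>

    static void show(const char *path)
    {
      char buf[256];
      FILE *f = fopen(path, "r");
      if (!f) { printf("%-62s  (not exposed)\n", path); return; }
      if (fgets(buf, sizeof buf, f)) printf("%-62s  %s", path, buf);
      fclose(f);
    }

    int main(void)
    {
      show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
      show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
      show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors");
      return 0;
    }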

Jeff


>   Barry
>
> > Jeff
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
>
>

-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/