[petsc-dev] Soliciting suggestions for linear solver work under SciDAC 4 Institutes

Thu Jul 7 20:03:34 CDT 2016

> On Jul 7, 2016, at 7:05 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> 
> 
> 
> On Thu, Jul 7, 2016 at 1:04 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Fri, Jul 1, 2016 at 4:32 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    The DOE SciDAC institutes have supported PETSc linear solver research/code development for the past fifteen years.
> 
>     This email is to solicit ideas for linear solver research/code development work for the next round of SciDAC institutes (which will be a 4-year period) in PETSc. Please send me any ideas, no matter how crazy, on things you feel are missing, broken, or incomplete in PETSc with regard to linear solvers that we should propose to work on. In particular, issues coming from particular classes of applications would be good. Generic "multiphysics" coupling types of things are too general (and old :-)), while work for extremely large scale is also out since that is covered under another call (ECP). But particular types of optimizations etc. for existing or new codes could be in, just not for the very large scale.
> 
>     Rough ideas and pointers to publications are all useful. There is an extremely short fuse, so the sooner the better.
> 
> I think the suggestions so far are fine; however, they all seem to start at the "how", whereas I would prefer we start at the "why". Maybe something like
> 
> 1) How do we run at bandwidth peak on new architectures like Cori or Aurora?

  Huh, there is a how here, not a why?
> 
> Patrick and Rich have good suggestions here. Karl and Rich showed some promising numbers for KNL at the PETSc meeting.
> 
> 
> Future systems from multiple vendors basically move from a 2-tier memory hierarchy of shared LLC and DRAM to a 3-tier hierarchy of fast memory (e.g. HBM), regular memory (e.g. DRAM), and slow (likely nonvolatile) memory on a node.

  Jeff,

   Would Intel sell me a system that had essentially no regular DRAM (which is too slow anyway) and no slow memory (which is absurdly too slow)? What cost savings would I get in $ and power usage compared to, say, what is going into Theta? 10% and 20%, 5% and 30%, 5% and 5%? If it is a significant savings, then get the cut-down machine; if it is insignificant, then realize the cost of not using it (the DRAM you paid so little for) is insignificant and not worth worrying about, just like cruise control when you don't use the highway. Actually I could use the DRAM to store the history needed for the adjoints, so maybe it is OK to keep, but surely it is not useful for data that is continuously involved in the computation.

   Barry

> Xeon Phi and some GPUs have caches, but it is unclear to me if it actually benefits software like PETSc to consider them.  Figuring out how to run PETSc effectively on KNL should be generally useful...
> 
> Jeff
> 
> -- 
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/