[petsc-dev] data locality issues in PETSc

Barry Smith bsmith at mcs.anl.gov
Fri Jun 24 11:30:04 CDT 2011


  Moving this discussion to petsc-dev since it is relevant for much more than this one preconditioner.

  In PETSc the Mat object is responsible for the layout of the matrix (graph) data and the Vec object is responsible for the layout of the vector data. This will be as true within a compute node as it has always been across compute nodes. If one uses a DM to create the Mat and Vec objects then the DM may decide the data layout within the node (just as it currently does between processes) for NUMA reasons (i.e., that first-touch business).
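
  For example, here is a minimal sketch (the grid size is a placeholder and the exact call names vary a bit between PETSc versions); the point is that the DM, not the application, decides where the Mat and Vec data live:

    #include <petscdmda.h>

    int main(int argc, char **argv)
    {
      DM  da;
      Mat A;
      Vec x;

      PetscInitialize(&argc, &argv, NULL, NULL);
      /* 2d structured grid; the DM owns the parallel (and, eventually, in-node) layout */
      DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                   DMDA_STENCIL_STAR, 128, 128, PETSC_DECIDE, PETSC_DECIDE,
                   1, 1, NULL, NULL, &da);
      DMSetFromOptions(da);
      DMSetUp(da);
      DMCreateMatrix(da, &A);        /* Mat laid out (and preallocated) by the DM */
      DMCreateGlobalVector(da, &x);  /* Vec with the matching layout */

      /* ... assemble and solve ... */

      VecDestroy(&x);
      MatDestroy(&A);
      DMDestroy(&da);
      PetscFinalize();
      return 0;
    }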

  I think it is ok to assume that the input (on-process parts of the) Mats and Vecs have a good data layout across the node's memory (as well as across the nodes, as you assumed n years ago). If that is not the case then we will need generic tools to repartition them anyway, used in a preprocessing step before applying the KSP, before GAMG ever sees the data.
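
  Roughly, such a generic repartitioning step could sit on top of the existing MatPartitioning machinery; this is only a sketch, and the actual data movement driven by the resulting IS is left out:

    MatPartitioning part;
    IS              newproc;

    MatPartitioningCreate(PETSC_COMM_WORLD, &part);
    MatPartitioningSetAdjacency(part, A);    /* A: the assembled parallel Mat */
    MatPartitioningSetFromOptions(part);     /* e.g. -mat_partitioning_type parmetis */
    MatPartitioningApply(part, &newproc);    /* proposed new owner rank for each local row */
    /* ... build the redistributed Mat and Vec from this IS and hand those to
       the KSP, so GAMG never sees the badly laid out data ... */
    ISDestroy(&newproc);
    MatPartitioningDestroy(&part);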

  I don't think you should get too hung up on the multiple thread business in your first cut, just do something and it will evolve as we figure out what we are doing.

   Barry


On Jun 24, 2011, at 11:17 AM, Mark F. Adams wrote:

> Folks, I'm looking at designing this GAMG code and I'm running into issues that are pretty general in PETSc so I want to bring them up.
> 
> The setup for these GAMG methods is simplified if subdomains are small.  So I'm thinking that just writing a flat program per MPI process will not be good long term.  These issues are very general and they touch on the parallel data model of PETSc, so I don't want to go off and just do something w/o discussing this.
> 
> Generally, as we move to multi/many-core nodes, a mesh that is well partitioned onto nodes is not good enough.  Data locality within the node needs to be addressed, and it probably needs to be addressed explicitly.  The GAMG algorithm requires finding fine grid points in the coarse grid mesh (and if the coarse grid does not cover the fine grid then something has to be done, like finding the nearest coarse grid triangle/tet/element).  Doing this well requires a lot of unstructured mesh work and data structures, and it can get goopy.  I'd like to simplify this, at least for an initial implementation, but I want to build in a good data model for, say, machines in 5-10 years.  The flat MPI model that I use in Prometheus is probably not a good model.
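> 
> To make that step concrete: the brute-force version, with made-up names and no pretense of being real code, is just a nearest-centroid search over a candidate set of coarse elements, something like
> 
>     static PetscInt NearestCoarseElement(PetscInt nelem, const PetscReal cent[], const PetscReal x[])
>     {
>       /* cent[] holds nelem centroids (3 coordinates each); x[] is one fine grid point */
>       PetscInt  e, best = -1;
>       PetscReal d2, best2 = PETSC_MAX_REAL;
>       for (e = 0; e < nelem; e++) {
>         d2 = (cent[3*e+0] - x[0])*(cent[3*e+0] - x[0])
>            + (cent[3*e+1] - x[1])*(cent[3*e+1] - x[1])
>            + (cent[3*e+2] - x[2])*(cent[3*e+2] - x[2]);
>         if (d2 < best2) {best2 = d2; best = e;}
>       }
>       return best; /* a real point-in-element test on this element and its neighbors would follow */
>     }
> 
> and that is only acceptable if nelem is small, which is exactly what keeping the subdomains small buys us.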
> 
> Let me throw out a straw man to at least get the discussion going.  My first thought for the GAMG setup is to use METIS to partition the local vertices into subdomains of some approximate size (say 100, or 10^D) and have an array or arrays of coordinates, for instance, on each node.  Put as much of the GAMG setup as possible into an outer loop over the subdomains on a node.  This outer loop could eventually be multi-threaded.  This basically lets GAMG get away with N^2-like algorithms because N is an internal parameter (e.g., 100), and it addresses the general data locality issues.  Anyway, this is just a straw man; I have not thought it through completely.
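> 
> A very rough sketch of that straw man, using PETSc's MatPartitioning wrapper rather than calling METIS directly (all names and sizes are placeholders, not real code):
> 
>     MatPartitioning mp;
>     IS              membership;
>     PetscInt        s, nloc, nsub;
> 
>     MatGetLocalSize(Aloc, &nloc, NULL);          /* Aloc: the on-process part of the matrix */
>     nsub = PetscMax(1, nloc/100);                /* aim for roughly 100 vertices per subdomain */
>     MatPartitioningCreate(PETSC_COMM_SELF, &mp);
>     MatPartitioningSetAdjacency(mp, Aloc);
>     MatPartitioningSetNParts(mp, nsub);
>     MatPartitioningSetFromOptions(mp);           /* pick a real partitioner, e.g. -mat_partitioning_type parmetis */
>     MatPartitioningApply(mp, &membership);       /* subdomain id for each local vertex */
>     for (s = 0; s < nsub; s++) {
>       /* gather the vertices (and coordinates) with membership == s and run the
>          setup (aggregation, point location, etc.) on that small set; this is
>          the loop that could eventually be multi-threaded */
>     }
>     ISDestroy(&membership);
>     MatPartitioningDestroy(&mp);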
> 
> An exercise that might help to ground this discussion is to think about a case study of a "perfect" code in 5-10 years.  GAMG has promise, I think, for convective/hyperbolic problems where the geometric coarse functions can be a big win.  So perhaps we could think about a PFLOTRAN problem on a domain that would be suitable for GAMG.  Or an ice sheet problem.  These data locality issues should be addressed at the application level if feasible, but I don't know what a reasonable model would be.  For instance, my model in Prometheus is that the user provides me with a well-partitioned fine grid matrix.  I do not repartition the fine grid, but I also do not make any assumptions about data locality within an MPI process.  We may want to demand more from users if they expect to go extreme-scale.  Anyway, just some thoughts,
> 
> Mark
> 
> 
> 



