[petsc-dev] PetscSection

Barry Smith bsmith at mcs.anl.gov
Tue Sep 10 23:25:38 CDT 2013


On Sep 10, 2013, at 8:55 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> Matthew Knepley <knepley at gmail.com> writes:
>>> PetscSF cannot currently scatter individual bytes (4 bytes minimum), and
>>> even if it could, it's a horribly inefficient representation (4 or 8
>>> bytes of metadata for each byte of payload).  The quick fix at that
>>> moment was to send in units of size PetscInt (the struct was always
>>> going to be divisible by that size).

   Since when are PETSc people interested in quick fixes? We are only concerned with doing things right; quick fixes lead to bad code and bad ideas.

>>> 
>> 
>> The way I understand it, he never had a single MPI_BYTE, but always a bunch.
>> Shouldn't that packing handle that case?
> 
> In DMPlexDistributeData, your fieldSF is blown up so that every unit has
> its own metadata (rank, offset).  That's a terrible mechanism for moving
> structs with hundreds of bytes.
> 
> I would rather not make PetscSF deal with fully-heterogeneous data
> (where each node can have a different size) because it makes indexing a
> nightmare (you need a PetscSection or similar just to get in the door; I
> don't want PetscSF to depend on PetscSection).

   Having PetscSF depend on PetscSection would be insane (after all PetscSection has information about constrained variables and other silly nonsense) but having PetscSF depend on XX makes perfect sense.

A review:

(current situation; with unimportant stuff removed)

+  sf - star forest
.  nroots - number of root vertices on the current process (these are possible targets for other processes to attach leaves)
.  nleaves - number of leaf vertices on the current process, each of these references a root on any process
.  ilocal - locations of leaves in leafdata buffers, pass NULL for contiguous storage
.  iremote - remote locations of root vertices for each leaf on the current process

PetscSFSetGraph(PetscSF sf, PetscInt nroots, PetscInt nleaves, PetscInt *ilocal, PetscSFNode *iremote)

typedef struct {
  PetscInt rank;                /* Rank of owner */  (why is this not PetscMPIInt?)
  PetscInt index;               /* Index of node on rank */
} PetscSFNode;
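
To make the arguments concrete, here is a tiny usage sketch; the sizes, ranks, and indices are invented for illustration, error checking is omitted, and the PetscCopyMode arguments are part of the "unimportant stuff" removed from the signature above:

  PetscSF     sf;
  PetscSFNode remote[2];
  PetscInt    nroots = 10;                  /* arbitrary number of local roots */

  PetscSFCreate(PETSC_COMM_WORLD,&sf);
  remote[0].rank = 0; remote[0].index = 3;  /* leaf 0 copies from entry 3 on rank 0 */
  remote[1].rank = 1; remote[1].index = 7;  /* leaf 1 copies from entry 7 on rank 1 */
  /* ilocal == NULL: the two leaves are stored contiguously in the leaf buffer */
  PetscSFSetGraph(sf,nroots,2,NULL,PETSC_COPY_VALUES,remote,PETSC_COPY_VALUES);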

Abstractly, nleaves and ilocal are the indices (into an array) you are copying to; iremote are the indices (into an array) you are copying from.

Now, if PETSc had an abstract concept of indexing (funny that no one ever put that in PETSc decades ago), it would look like

PetscSFSetGraph(PetscSF sf, PetscInt nroots, toIndices, fromIndices)

but wait: PETSc does have an abstraction for indexing (into regular arrays) called IS. Come to think of it, PETSc has another abstraction for indexing (into arrays with different sized items) called XX. Actually, thinking a tiny bit more, one realizes that there is a third: PetscSFNode is a way of indexing (into regular arrays) on a bunch of different processes. If we come up with a couple more related, but inconsistent, syntaxes for indexing, we'll have code as good as Trilinos.

   So let's fix this up! One abstraction for indexing that handles regular arrays, arrays with different sized items, and both kinds of arrays on a bunch of different processes, with simple enough syntax that it can be used wherever indexing is needed, including passing to PetscSF! (We can ignore the need for MPI datatypes in communication for now.)
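
Just to make that concrete, one possible shape for such an interface; all of these names are hypothetical, invented here for discussion, not an existing PETSc API:

/* hypothetical interface, names invented for discussion */
PetscErrorCode XXCreate(MPI_Comm comm,XX *xx);
PetscErrorCode XXSetIndices(XX xx,PetscInt nindices,const PetscMPIInt ranks[],const PetscInt offsets[],const PetscInt sizes[]);
PetscErrorCode XXGetIndex(XX xx,PetscInt i,PetscMPIInt *rank,PetscInt *offset,PetscInt *size);

/* PetscSFSetGraph() could then take two of these directly */
PetscErrorCode PetscSFSetGraph(PetscSF sf,PetscInt nroots,XX toIndices,XX fromIndices);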

   Some concrete examples to demonstrate that this need not be so difficult. A completely general set of indices for this situation could be handled by

typedef struct {
  PetscInt    nindices;
  PetscMPIInt *ranks;
  PetscInt    *offsets;
  PetscInt    *sizes;
} XXData;

Hardly more than PetscSFNode. Or it could be handled as an array of structs.

special cases (a small sketch follows the list) would include:
    all on one process, so ranks is not needed
    all items the same size, so sizes not needed
    contiguous (or strided) memory, so offsets not needed
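
For instance, the purely local, uniform-size, IS-like case would, under one reading of the fields above, collapse to (n and idx standing for whatever index array you already have):

  XXData is_like;               /* behaves like a plain IS */
  is_like.nindices = n;
  is_like.ranks    = NULL;      /* everything on this process */
  is_like.offsets  = idx;       /* ordinary array of indices  */
  is_like.sizes    = NULL;      /* every item has unit size   */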

If we decided not to go with an abstract object but instead with just a concrete data structure that could handle all cases, it could look something like

typedef struct {
  PetscInt    nindices;
  PetscMPIInt *ranks;
  PetscInt    *offsets;
  PetscInt    *sizes;
  PetscInt    chunksize, strideoffset;
} XXData;
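
One way to read that concrete structure (again only a sketch, with assumed meanings for the fields: a NULL array means "use the default"):

/* sketch of an accessor under the assumed field meanings above */
static void XXDataGetIndex(const XXData *xx,PetscInt i,PetscMPIInt *rank,PetscInt *offset,PetscInt *size)
{
  *rank   = xx->ranks   ? xx->ranks[i]   : -1;                                  /* all local: rank not needed */
  *offset = xx->offsets ? xx->offsets[i] : xx->strideoffset + i*xx->chunksize;  /* contiguous/strided memory  */
  *size   = xx->sizes   ? xx->sizes[i]   : xx->chunksize;                       /* all items the same size    */
}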

For completeness I note that with IS (for example in VecScatter creation with parallel vectors) we don't use (rank, offset) notation; we use a global offset == rstart[rank] + offset, but that is something I'm sure we can deal with.
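
A sketch of that translation, with rstart[] being the usual ownership ranges (rstart[size] equal to the global length); a real implementation would use a binary search:

/* translate a global offset into (rank, local offset) */
static void GlobalToRankOffset(PetscInt goffset,const PetscInt rstart[],PetscMPIInt size,PetscMPIInt *rank,PetscInt *offset)
{
  PetscMPIInt r = 0;
  while (r+1 < size && rstart[r+1] <= goffset) r++;  /* linear scan for clarity */
  *rank   = r;
  *offset = goffset - rstart[r];
}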


    QED

    Comments?






