itaps-parallel Notes from today's meeting
Jason Kraftcheck
kraftche at cae.wisc.edu
Tue Apr 13 11:06:36 CDT 2010
I have a more radical proposal for addressing the issues from the last
iMeshP telecon. Reconsider the idea of an "active" partition. That is,
there is one partition that corresponds to how elements are currently
distributed. The application can define other partitions with the intent of
either transitioning to that partition as the active one directly or writing
the mesh to disk with that defined as the partition that will be active when
the mesh is read back in. Let's take this idea one step further and define
the active partition to always have one part per process*. If there were
more parts than processes in the input data, then there may exist an
inactive partition corresponding to what was defined in the input.
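If the active partition always has exactly one part per process, the part-to-process mapping becomes the identity and needs no lookup table or communication. A minimal sketch in C (the function names are mine, purely illustrative, and not part of iMeshP):

```c
/* Hypothetical sketch: with one active part per process, active part
 * numbers and process ranks coincide, so mapping in either direction
 * is trivial and requires no stored table or message exchange. */
int active_part_of_rank(int rank)
{
    return rank;
}

int rank_owning_active_part(int part)
{
    return part;
}
```

Any inactive partition loaded from the input would still need an explicit part-to-process map, but only code that manipulates inactive partitions would pay that cost.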
As far as I could follow the discussion, there were two proposed mechanisms
for re-partitioning the mesh in parallel. The first was to move entities
between parts in the active partition and the second was to define a new
partitioning completely before transitioning to it. I mentioned the
latter above. As for the former, if there were fewer parts than processes
when the mesh is loaded, then some processes would end up with empty Parts
in the active partition. Existing functions could be used to move entities
between Parts.
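To make the "fewer parts than processes" case concrete, here is a hypothetical bit of bookkeeping (not iMeshP calls): loading a 2-part mesh onto 4 processes leaves two empty parts in the active partition, and the existing move operation then fills them.

```c
/* Hypothetical sketch: part_sizes[i] is the entity count of the active
 * part on process i.  Moving entities between active parts is just a
 * transfer of counts here; a real implementation would migrate the
 * entities themselves. */
void move_entities(int *part_sizes, int from, int to, int count)
{
    part_sizes[from] -= count;
    part_sizes[to]   += count;
}
```

For example, a 2-part mesh on 4 processes starts as {100, 100, 0, 0}; two moves rebalance it to {50, 50, 50, 50} using only the existing entity-movement functions.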
So far, we've assumed one iMesh instance per process. Mark B. would like
one iMesh instance per Part. Having one active Part per process results in
both of those being true for the active partition.
As for the meaning of iMeshP_destroyPart: the 'active' partition is
implicitly defined by the number of processes. It would be an error to ask
to create or destroy a part in the active partition because that would imply
adding or removing processes, which is beyond the scope of iMeshP. For
modification of inactive partitions, deleting a part doesn't imply
anything about the ownership of the entities it contains, so it shouldn't
be a problem to delete non-empty parts.
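The proposed rule can be stated in a few lines of C. This is a sketch of the semantics only, with made-up names and error codes, not the actual iMeshP error handling:

```c
enum part_err { PART_OK = 0, PART_ERR_ACTIVE_PARTITION = 1 };

/* Hypothetical sketch: destroying (or creating) a part in the active
 * partition is an error, because it would imply adding or removing
 * processes, which is beyond the scope of iMeshP.  In an inactive
 * partition the operation succeeds even for a non-empty part, since
 * part membership there says nothing about entity ownership. */
enum part_err destroy_part(int partition_is_active, int num_entities_in_part)
{
    (void)num_entities_in_part;   /* non-empty parts are fine to delete */
    if (partition_is_active)
        return PART_ERR_ACTIVE_PARTITION;
    return PART_OK;
}
```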
But beyond all of the above, the biggest advantage of this change is the
simplification of the data model for simple use cases. I worry that
allowing for multiple parts per process, and other complexities that are
unnecessary for simple use cases (e.g. a simple heat-transfer simulation),
might impede adoption of the API. It is important that the API support
re-partitioning and other advanced use cases, but it should also be simple
and easy to understand and use for the many simpler use cases. And having
multiple iMesh instances per process is definitely moving in the wrong
direction on the simplicity scale.
It is perhaps a little late in the game for API changes, but this change
would also facilitate some simplification of the API. Consider some existing
code whose authors want to implement the iMeshP API so as to integrate with
some other simulation code. Our current API seems overly complicated to
implement for someone who has no interest in parallel repartitioning. We
could divide the current API into two separate subsets: one for
communicating data and one for querying and modifying inactive partitions.
All functions in the former could work with rank rather than a part ID or
part handle. The only overlap between the two sub-APIs would be the
function to transition to a different active partition. Functions for
adding and removing entities from Parts could go with the partition
modification subset.
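To illustrate the proposed split, here is a sketch of how the two sub-APIs would address things differently. The types and function names are invented for illustration and are not actual iMeshP declarations:

```c
/* Hypothetical sketch of the two proposed sub-APIs.  The communication
 * subset addresses peers by process rank only; the partition-
 * modification subset addresses parts of an inactive partition by
 * handle only.  The sole overlap would be a function to transition to
 * a different active partition. */
typedef struct {
    int partition_id;   /* which (inactive) partition */
    int part_id;        /* which part within it */
} PartHandle;

/* Communication subset: identifies a peer purely by rank, with no
 * part IDs or part handles involved. */
int comm_target_rank(int rank)
{
    return rank;
}

/* Partition-modification subset: identifies a part of an inactive
 * partition purely by handle, with no ranks involved. */
PartHandle make_part_handle(int partition_id, int part_id)
{
    PartHandle h = { partition_id, part_id };
    return h;
}
```

The point of the split is that an implementor who only cares about running the mesh in parallel would need the rank-based subset alone, and could skip the partition-modification subset entirely.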
For redistribution within the active partition, a simple function to change
ownership would be sufficient and need not even do any communication.
Presumably any such function would need to be called on at least the current
and new owning processes (and also on any processes that hold a ghost of the
entity). If the function accepts an entity handle then the
entity must already be ghosted on the new owning process. The existing
ghosting code can be used to communicate entities and the change of
ownership can then be done with no communication at all.
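A sketch of why no communication is needed: if the entity is already ghosted on the new owner, then every interested process (old owner, new owner, ghost holders) can apply the same purely local update. The struct and function below are hypothetical, not iMeshP API:

```c
/* Hypothetical sketch: each process keeps a local record of an
 * entity's owner.  Changing ownership is then a local table update on
 * every process that knows about the entity; no messages are needed
 * because the entity's data is already resident (as a ghost) on the
 * new owner. */
typedef struct {
    int owner_rank;   /* rank that owns the entity */
    int is_ghost;     /* nonzero if this process holds only a ghost copy */
} EntityRecord;

void change_owner_local(EntityRecord *e, int new_owner, int my_rank)
{
    e->owner_rank = new_owner;
    e->is_ghost   = (my_rank != new_owner);
}
```

If the entity is not yet ghosted on the new owner, the existing ghosting code would first communicate the entity there, after which the ownership change itself is still communication-free.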
- jason
* Perhaps 'process' isn't the correct granularity for the future really big
machines that Mark S. mentioned. I use that term here on the assumption
that there will be a single multi-threaded process on a multi-core node in
that case. If that assumption is incorrect, substitute 'node' or whatever
is appropriate in place of 'process'.