itaps-parallel Mesh migration stuff

Thu Feb 28 13:29:22 CST 2008

Hello, all.

Okay, those of you who have been thinking it can say "I could have told 
you so!"  The more I think about micro-migration, the more complicated 
I think it is, especially if one has to maintain ghosts.

Here's where I personally am right now on this. (And I've been thinking 
about this and scribbling examples a lot in the last day, so this is as 
new to Onkar as to everyone else.  So don't blame him for any of the 
incoherence in what follows. :-))

My basic premises are these:

1.  Communication about mesh entities in parallel should be handled by 
the parallel mesh interface, not by the application.  That is, the 
application shouldn't be talking between processes about mesh-related 
things without going through the interface.

2.  We want to avoid blocking on communication as much as possible, and 
to facilitate hiding communication latency as much as possible.  This 
means I'm working in a paradigm where messages fly back and forth, and 
each process has to poll for newly received messages from time to time. 
This implies some mechanism for the implementation to inform the 
application when some request has been completed, or when some entity 
isn't available for local modification (i.e., locked) because 
communication is ongoing.

3.  All mesh entities on part bdrys will know the remote IDs (part + 
handle) of all other copies of that entity.

4.  All ghosted mesh entities will know the remote ID of their "master" 
copy (the one that is actually owned by some other part), and that 
"master" copy will know the remote IDs of all ghost copies.  I don't 
(at present, anyway) see any need for a ghost copy to distinguish 
between having a master copy that's on a part bdry versus in the 
interior of a part.

5.  Ghosting rules are known to the implementation, and can be described 
as first- or second-order adjacencies of entities on the part bdry.

6.  Currently I'm not trying to get my head around the issues of 
modifying entities on the part bdry in parallel on different parts.  One 
could argue that this should be illegal, because someone's modifying a 
non-owned entity.  But in the end it's a useful enough special case that 
we may need to support it anyway.
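To make premises 2 through 4 a bit more concrete, here's a rough Python sketch of the per-entity bookkeeping they imply.  The class and field names are hypothetical, handles are modeled as plain ints, and a remote ID is assumed to be a (part, handle) pair; this is an illustration, not a proposed ITAPS interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical: a remote ID is a (part, handle) pair, with handles as ints.
RemoteId = Tuple[int, int]


@dataclass
class EntityRecord:
    """Parallel bookkeeping for one mesh entity on one part (sketch only)."""

    handle: int
    # Premise 3: a part-bdry entity knows every other copy of itself.
    bdry_copies: List[RemoteId] = field(default_factory=list)
    # Premise 4: a ghost knows only its master copy, and doesn't care
    # whether that master sits on a part bdry or in a part interior.
    ghost_master: Optional[RemoteId] = None
    # Premise 4: the master copy knows the remote ID of every ghost.
    ghost_copies: List[RemoteId] = field(default_factory=list)
    # Premise 2: set while communication about this entity is in flight,
    # so the application knows it isn't available for local modification.
    locked: bool = False

    def is_ghost(self) -> bool:
        return self.ghost_master is not None

    def on_part_bdry(self) -> bool:
        return bool(self.bdry_copies)


# A vertex (handle 3) on part 0, shared with part 1 and ghosted on part 2,
# and the matching ghost as seen from part 2.
shared_vertex = EntityRecord(handle=3, bdry_copies=[(1, 17)], ghost_copies=[(2, 5)])
ghost_on_part2 = EntityRecord(handle=5, ghost_master=(0, 3))
```

Note that the ghost record stores only its master's remote ID, which is all premise 4 asks for.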

As nearly as I can tell currently, micro-migration can be done with 
three rounds of messages, plus some intervening changes in the mesh data 
for the various parts.

Round 1 (part O, for originating):  Request migration of an entity. 
This request may be aimed just at the entity ("I want to own that 
vertex/edge/face on the part bdry") or at the entity and its 
higher-order adjacencies ("I want to own that vertex and the star 
incident on it (so I can remove it)" or "I want to own that face and the 
region on its opposite side (so I can swap it)").  The originating part 
keeps track of how many messages it has sent, and where, so that it can tally 
them as they come in.
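A minimal sketch of the Round 1 bookkeeping on part O, assuming the implementation supplies a nonblocking `send(part, message)` callback (hypothetical, as are the class names).  The point is just that O records which parts it's waiting on, so it can tell when the last reply has arrived without ever blocking.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class MigrationRequest:
    """Round 1 message from part O to a part owning entities O wants."""
    origin: int
    handles: Tuple[int, ...]  # handles on the receiving part
    with_adjacencies: bool    # e.g. "that vertex and the star incident on it"


@dataclass
class PendingMigration:
    """Part O's tally of Round 1 messages still waiting for a reply."""
    awaiting: Set[int]
    replies: Dict[int, Any] = field(default_factory=dict)

    def record_reply(self, part: int, reply: Any) -> bool:
        """Store one Round 2 reply; True once every target has answered."""
        if part not in self.awaiting:
            raise ValueError(f"unexpected reply from part {part}")
        self.awaiting.remove(part)
        self.replies[part] = reply
        return not self.awaiting


def start_migration(origin: int, targets: Dict[int, List[int]],
                    with_adjacencies: bool,
                    send: Callable[[int, MigrationRequest], None]) -> PendingMigration:
    """Fire off one request per owning part without waiting for answers."""
    for part, handles in targets.items():
        send(part, MigrationRequest(origin, tuple(handles), with_adjacencies))
    return PendingMigration(awaiting=set(targets))
```

Replies would be fed to `record_reply` from the polling loop as they show up; Round 3 can start once it returns True.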

Round 2 (part S, for sending):  The part that owns the entity (or the 
parts that own the entity and its adjacencies) replies with information 
about the migrated entity or entities, entities that are now on the part 
bdry but weren't before, and entities that are now ghosts but weren't 
before.  Note that, in cases where multiple messages were sent, part S 
can't modify its mesh yet, because another part may refuse the 
migration; see below.
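Here's a toy version of what part S might compute for its Round 2 reply.  It assumes a mesh stored as a map from region handle to vertex handles, and one possible ghosting rule of the kind premise 5 allows: every region S keeps that shares a vertex with a migrated region becomes a ghost on part O.  All names are hypothetical.  The key property is that it only reads the mesh; nothing changes on S until Round 3.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

Mesh = Dict[int, FrozenSet[int]]  # region handle -> its vertex handles


@dataclass(frozen=True)
class MigrationReply:
    """Round 2 message from part S back to part O (sketch)."""
    migrated: FrozenSet[int]    # regions whose ownership moves to O
    new_bdry: FrozenSet[int]    # vertices newly shared between S and O
    new_ghosts: FrozenSet[int]  # regions O must now hold as ghosts


def build_reply(mesh: Mesh, requested: Set[int],
                already_bdry: Set[int], already_ghosted: Set[int]) -> MigrationReply:
    """Describe the migration without touching `mesh`: another part may
    still refuse its share, in which case nothing should have changed."""
    migrated = frozenset(r for r in requested if r in mesh)
    kept = {r: verts for r, verts in mesh.items() if r not in migrated}

    moving_verts = set().union(*(mesh[r] for r in migrated))
    kept_verts = set().union(*kept.values())

    # Vertices used on both sides of the new cut become part-bdry entities.
    new_bdry = frozenset((moving_verts & kept_verts) - already_bdry)
    # Assumed ghosting rule: kept regions touching a moving vertex get
    # ghosted on O, unless O already holds a ghost of them.
    new_ghosts = frozenset(
        r for r, verts in kept.items()
        if verts & moving_verts and r not in already_ghosted
    )
    return MigrationReply(migrated, new_bdry, new_ghosts)
```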

Round 3 (part O):  The part that originated the migration request, once 
it has replies from all parts it sent messages to, modifies its internal 
data and sends back an update to part(s) S, telling them which handles 
on part O correspond to which handles on part S.  Now part(s) S can 
update their own internal state, and everyone knows which local entities 
are connected to which remote entities (on which parts).
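A sketch of the Round 3 handle exchange, again assuming hypothetical (part, handle) remote IDs: part O creates local copies of whatever S described and sends back a map from S's handles to O's new handles, and S records those as remote copies.  The `Part` class and its handle counter are made up for illustration.

```python
import itertools
from typing import Dict, Iterable, List, Tuple

RemoteId = Tuple[int, int]  # hypothetical (part, handle) pair


class Part:
    """Just enough of a part to show the Round 3 handle exchange."""

    def __init__(self, rank: int, first_handle: int = 0):
        self.rank = rank
        self._handles = itertools.count(first_handle)
        # local handle -> remote IDs of the other copies of that entity
        self.copies: Dict[int, List[RemoteId]] = {}

    def commit_incoming(self, sender: int, remote_handles: Iterable[int]) -> Dict[int, int]:
        """Part O, once all replies are in: create local entities and
        return {handle on S: handle on O} to send back to part S."""
        mapping = {}
        for h in remote_handles:
            local = next(self._handles)
            self.copies[local] = [(sender, h)]
            mapping[h] = local
        return mapping

    def apply_mapping(self, origin: int, mapping: Dict[int, int]) -> None:
        """Part S: record which handle on O matches each local entity."""
        for h, remote in mapping.items():
            self.copies.setdefault(h, []).append((origin, remote))
```

Entities that S hands over entirely would be deleted on S at this point rather than recorded; that step is left out to keep the sketch short.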

Of course, it's possible that a part may get two migration requests for 
the same entity.  At this point, it's crucial that the first 
migration-in-progress locks the entity, so that the second request can 
be denied (with an imaginative error code like MIGRATION_IN_PROGRESS). 
This is one of the points where locking is important to prevent various 
race conditions...
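A minimal sketch of the lock check, assuming each part keeps a set of locked handles and handles incoming requests one at a time from its polling loop (so the check-and-lock below needs no mutex).  Only the name MIGRATION_IN_PROGRESS comes from the text; everything else is hypothetical.  Locking all-or-nothing means a request for a vertex star can't grab half the star while another migration holds the rest.

```python
from enum import Enum
from typing import Iterable, Set


class MigrationStatus(Enum):
    OK = 0
    MIGRATION_IN_PROGRESS = 1


class LockTable:
    """Per-part record of entities tied up in in-flight migrations."""

    def __init__(self) -> None:
        self._locked: Set[int] = set()

    def try_lock(self, handles: Iterable[int]) -> MigrationStatus:
        """Lock every handle, or none of them if any is already locked."""
        wanted = set(handles)
        if wanted & self._locked:
            return MigrationStatus.MIGRATION_IN_PROGRESS
        self._locked |= wanted
        return MigrationStatus.OK

    def release(self, handles: Iterable[int]) -> None:
        """Unlock once Round 3 finishes (or the migration is refused)."""
        self._locked -= set(handles)

    def is_locked(self, handle: int) -> bool:
        return handle in self._locked
```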

Long as this message currently is, it's still a little sparse on detail, 
compared with what's in my brain and notebook.  I'm going to go ahead 
and send it anyway, and work on a lovely sketch and a couple of 
examples, which I'll try to get sent out before the telecon for discussion.

Carl

-- 
------------------------------------------------------------------------
Dr. Carl Ollivier-Gooch, P.Eng.                   Voice: +1-604-822-1854
Associate Professor                                 Fax: +1-604-822-2403
Department of Mechanical Engineering             email: cfog at mech.ubc.ca
University of British Columbia              http://www.mech.ubc.ca/~cfog
Vancouver, BC  V6T 1Z4                  http://tetra.mech.ubc.ca/ANSLab/
------------------------------------------------------------------------