itaps-parallel Mesh migration stuff
Carl Ollivier-Gooch
cfog at mech.ubc.ca
Thu Feb 28 13:29:22 CST 2008
Hello, all.
Okay, those of you who have been thinking it can say "I could have told
you so!" The more I think about micro-migration, the more complicated
I'm thinking it is, especially if one has to maintain ghosts.
Here's where I personally am right now on this. (And I've been thinking
about this and scribbling examples a lot in the last day, so this is as
new to Onkar as to everyone else. So don't blame him for any of the
incoherence in what follows. :-))
My basic premises are these:
1. Communication about mesh entities in parallel should be handled by
the parallel mesh interface, not by the application. That is, the
application shouldn't be talking between processes about mesh-related
things without going through the interface.
2. We want to avoid blocking communication as much as possible, and to
facilitate communication latency hiding as much as possible. This means
I'm working in a paradigm where messages fly back and forth, and each
process has to poll for newly-received messages from time to time. This
implies some mechanism for the implementation to inform the application
when some request has been completed, or when some entity isn't
available for local modification (i.e., locked) because communication is
ongoing.
3. All mesh entities on part bdrys will know the remote IDs (part +
handle) of all other copies of that entity.
4. All ghosted mesh entities will know the remote ID of their "master"
copy (the one that is actually owned by some other part), and that
"master" copy will know the remote IDs of all ghost copies. I don't
(at present, anyway) see any need for a ghost copy to distinguish
between having a master copy that's on a part bdry versus in the
interior of a part.
5. Ghosting rules are known to the implementation, and can be described
as first- or second-order adjacencies of entities on the part bdry.
6. Currently I'm not trying to get my head around the issues of
modifying entities on the part bdry in parallel on different parts. One
could argue that this should be illegal, because someone's modifying a
non-owned entity. But in the end it's a useful enough special case that
we may need to support it anyway.
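
Premises 3 and 4 can be pictured as a small per-entity record. The following is only a minimal sketch under stated assumptions: the names RemoteID, EntityRecord, boundary_copies, master and ghost_copies are hypothetical, invented for illustration, and are not part of any ITAPS interface.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class RemoteID:
    """Hypothetical remote ID: part + handle, as in premise 3."""
    part: int    # part that holds the copy
    handle: int  # entity handle local to that part


@dataclass
class EntityRecord:
    """Hypothetical bookkeeping for one local entity."""
    handle: int
    # Premise 3: a part-bdry entity knows all other copies of itself.
    boundary_copies: Set[RemoteID] = field(default_factory=set)
    # Premise 4: a ghost knows the remote ID of its master copy...
    master: Optional[RemoteID] = None
    # ...and a master knows the remote IDs of all of its ghost copies.
    ghost_copies: Set[RemoteID] = field(default_factory=set)

    def is_ghost(self) -> bool:
        return self.master is not None

    def on_part_boundary(self) -> bool:
        return bool(self.boundary_copies)


# A vertex owned by part 0, shared with part 1 and ghosted on part 2.
vertex_on_part0 = EntityRecord(handle=17, boundary_copies={RemoteID(1, 42)})
vertex_on_part0.ghost_copies.add(RemoteID(2, 7))

# The ghost on part 2 only needs to know where its master lives.
ghost_on_part2 = EntityRecord(handle=7, master=RemoteID(0, 17))
```

Note that the ghost record does not store whether its master sits on a part bdry or in the interior of a part, in line with premise 4.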
As nearly as I can tell currently, micro-migration can be done with
three rounds of messages, plus some intervening changes in the mesh data
for the various parts.
Round 1 (part O, for originating): Request migration of an entity.
This request may be aimed just at the entity ("I want to own that
vertex/edge/face on the part bdry") or at the entity and its
higher-order adjacencies ("I want to own that vertex and the star
incident on it (so I can remove it)" or "I want to own that face and the
region on its opposite side (so I can swap it)"). The originating part
keeps track of how many messages it's sent, and where, so that it can tally
them as they come in.
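
The tallying on part O might look something like the sketch below. This is an illustration only: PendingMigration and its fields are hypothetical names, and a real implementation would sit behind the parallel mesh interface and be driven by polling for received messages.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PendingMigration:
    """Hypothetical tally kept by part O for one migration request."""
    entity: int
    awaiting: Set[int] = field(default_factory=set)         # parts not yet heard from
    replies: Dict[int, bool] = field(default_factory=dict)  # part -> accepted?

    def record_reply(self, part: int, accepted: bool) -> None:
        # Ignore replies from parts we never asked (or already heard from).
        if part not in self.awaiting:
            return
        self.awaiting.discard(part)
        self.replies[part] = accepted

    def complete(self) -> bool:
        # True once every part we sent a request to has answered.
        return not self.awaiting

    def all_accepted(self) -> bool:
        return self.complete() and all(self.replies.values())


# Part O asked parts 1 and 2 for a vertex and its star.
pending = PendingMigration(entity=17, awaiting={1, 2})
pending.record_reply(1, True)
```

Part O would only move on to Round 3 once complete() is true, and would abandon the migration if any part refused.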
Round 2 (part S, for sending): The part that owns the entity or the
part/parts that own the entity and its adjacencies send back information
about the migrated entity/ies, entities that are now on the part bdry
but weren't before, and entities that are now ghosts but weren't before.
Note that, in cases where multiple messages were sent, no modification
of the mesh on part S can be done yet, because another part may refuse
the migration; see below.
Round 3 (part O): The part that originated the migration request, once
it has replies from all parts it sent messages to, modifies its internal
data and sends back an update to part(s) S, telling them which handles
on part O correspond to which handles on part S. Now part(s) S can
update their own internal state, and everyone knows which local entities
are connected to which remote entities (on which parts).
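
To make the three rounds concrete, here is a toy, single-process sketch for one entity and one sending part. Everything in it is an assumption made for illustration: the Part class, the message dictionaries, and the message type names are hypothetical, messages are "delivered" by direct function calls rather than over a network, and the new bdry and ghost information from Round 2 is omitted.

```python
class Part:
    """Toy stand-in for one part's mesh data (hypothetical, for illustration)."""

    def __init__(self, part_id, owned):
        self.part_id = part_id
        self.owned = set(owned)   # handles this part owns
        self.remote = {}          # local handle -> (part, handle) of remote copy
        self.pending_out = {}     # handle -> requester, held until Round 3 arrives
        self.next_handle = 100 * (part_id + 1)

    # Round 1 (part O): request migration of an entity.
    def make_request(self, owner_id, remote_handle):
        return {"type": "REQUEST", "src": self.part_id,
                "dst": owner_id, "handle": remote_handle}

    # Round 2 (part S): describe the entity, but leave the local mesh alone.
    def handle_request(self, msg):
        handle = msg["handle"]
        self.pending_out[handle] = msg["src"]
        return {"type": "ENTITY_DATA", "src": self.part_id,
                "dst": msg["src"], "handle": handle}

    # Round 3 (part O): create the local copy, report the handle mapping.
    def handle_entity_data(self, msg):
        new_handle = self.next_handle
        self.next_handle += 1
        self.owned.add(new_handle)
        self.remote[new_handle] = (msg["src"], msg["handle"])
        return {"type": "HANDLE_MAP", "src": self.part_id,
                "dst": msg["src"], "map": {msg["handle"]: new_handle}}

    # After Round 3 (part S): only now update internal state.
    def handle_handle_map(self, msg):
        for old_handle, new_handle in msg["map"].items():
            del self.pending_out[old_handle]
            self.owned.discard(old_handle)
            self.remote[old_handle] = (msg["src"], new_handle)


part_o = Part(0, owned=[])
part_s = Part(1, owned=[5])

request = part_o.make_request(owner_id=1, remote_handle=5)  # Round 1
data = part_s.handle_request(request)                       # Round 2
mapping = part_o.handle_entity_data(data)                   # Round 3
part_s.handle_handle_map(mapping)                           # part S catches up
```

In this sketch part S keeps owning the entity until the handle map comes back, which mirrors the point that S cannot safely change its mesh before part O has heard from everyone.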
Of course, it's possible that a part may get two migration requests for
the same entity. At this point, it's crucial that the first
migration-in-progress locks the entity, so that the second request can
be denied (with an imaginative error code like MIGRATION_IN_PROGRESS).
This is one of the points where locking is important to prevent various
race conditions...
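
A minimal sketch of that locking rule, assuming a simple per-part table of locked handles. Only the MIGRATION_IN_PROGRESS error name comes from the text above; LockTable, Status, try_lock and release are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    MIGRATION_IN_PROGRESS = 1


class LockTable:
    """Hypothetical per-part record of entities locked by a migration."""

    def __init__(self):
        self._locked_by = {}  # handle -> part whose request holds the lock

    def try_lock(self, handle, requester):
        # The first request wins; later ones are denied until release.
        if handle in self._locked_by:
            return Status.MIGRATION_IN_PROGRESS
        self._locked_by[handle] = requester
        return Status.OK

    def release(self, handle):
        # Called when the migration completes or is refused.
        self._locked_by.pop(handle, None)


locks = LockTable()
first = locks.try_lock(5, requester=0)   # part 0 asks first
second = locks.try_lock(5, requester=2)  # part 2 asks while migration is pending
```

Here the first request gets Status.OK and the second gets Status.MIGRATION_IN_PROGRESS; the denied part is free to retry once the lock is released.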
Long as this message currently is, it's still a little sparse on detail,
compared with what's in my brain and notebook. I'm going to go ahead
and send it anyway, and work on a lovely sketch and a couple of
examples, which I'll try to get sent out before the telecon for discussion.
Carl
--
------------------------------------------------------------------------
Dr. Carl Ollivier-Gooch, P.Eng. Voice: +1-604-822-1854
Associate Professor Fax: +1-604-822-2403
Department of Mechanical Engineering email: cfog at mech.ubc.ca
University of British Columbia http://www.mech.ubc.ca/~cfog
Vancouver, BC V6T 1Z4 http://tetra.mech.ubc.ca/ANSLab/
------------------------------------------------------------------------