itaps-parallel Micro-migration update
Carl Ollivier-Gooch
cfog at mech.ubc.ca
Mon Mar 3 00:00:31 CST 2008
Hello, all.
First, my apologies for another eleventh-hour email. On Thursday and
Friday I had a midterm to set as well as various other unpleasant but
urgent bits of paperwork to do...
In the end, I've concluded that obsessing about detailed examples,
beyond the proof-of-concept stage, is probably not productive at this
point. So instead I've tried to jump back up one level, conceptually.
As a reprise, before I talk about what things I currently think we
should be supporting and syntax sketches for that, let me quote from a
previous email I sent on the topic:
<quote>
My basic premises are these:
1. Communication about mesh entities in parallel should be handled by
the parallel mesh interface, not by the application. That is, the
application shouldn't be talking between processes about mesh-related
things (like adjacency or migration or whatever) without going through
the interface.
2. We want to avoid blocking communication as much as possible, and to
make it as easy as possible to hide communication latency. This means
I'm working in a paradigm where messages fly back and forth, and each
process has to poll for newly-received messages from time to time. This
implies some mechanism for the implementation to inform the application
when some request has been completed, or when some entity isn't
available for local modification (i.e., locked) because communication is
ongoing.
3. All mesh entities on part bdrys will know the remote IDs (part +
handle) of all other copies of that entity.
4. All ghosted mesh entities will know the remote ID of their "master"
copy (the one that is actually owned by some other part), and that
"master" copy will know the remote IDs of all ghost copies. I don't
(at present, anyway) see any need for a ghost copy to distinguish
between having a master copy that's on a part bdry versus in the
interior of a part.
5. Ghosting rules are known to the implementation, and can be described
as first- or second-order adjacencies of entities on the part bdry.
More complex communication patterns will likely prove difficult or
impossible to maintain during mesh modification, but even the simple
ones will be pretty painful. We can return to this later.
</quote>
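To make premises 3 and 4 concrete, here is a minimal Python sketch of the
per-entity bookkeeping they imply. Nothing here is proposed interface: the
names RemoteID and EntityCopies, and the choice of plain integers for parts
and handles, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import NamedTuple, Optional, Set


class RemoteID(NamedTuple):
    """Hypothetical remote ID: a part plus the entity's handle on that part."""
    part: int
    handle: int


@dataclass
class EntityCopies:
    """Illustrative bookkeeping for one local entity (not a real ITAPS type)."""
    local_handle: int
    # Premise 3: a part-bdry entity knows every other copy of itself.
    bdry_copies: Set[RemoteID] = field(default_factory=set)
    # Premise 4: a ghost knows the remote ID of its master copy...
    master: Optional[RemoteID] = None
    # ...and the master knows the remote IDs of all of its ghost copies.
    ghosts: Set[RemoteID] = field(default_factory=set)

    def is_ghost(self) -> bool:
        return self.master is not None


# A vertex owned by part 0, shared with parts 1 and 2, and ghosted on part 3.
shared = EntityCopies(local_handle=17)
shared.bdry_copies.update({RemoteID(1, 204), RemoteID(2, 91)})
shared.ghosts.add(RemoteID(3, 8))

# The corresponding ghost on part 3 records only its master copy.
ghost = EntityCopies(local_handle=8, master=RemoteID(0, 17))
```

Note that the ghost record carries no flag saying whether its master sits on
a part bdry or in the interior, in line with premise 4.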
Basically, what I see is a situation where an application makes a
request for migration, etc., through the interface, and the
implementation handles that by sending a stream of messages back and
forth. The content, type, and number of messages are up to the
implementation, but blocking communication should be (read: we'd really,
really like it to be) initiated only on some specific request by the
application. So here are some requests that I'm confident we need and
that I'm reasonably confident I understand how to specify syntax for:
A. Request in-migration of an entity. This entity must be on the part
bdry and is identified by local handle, and the implementation handles
the rest. If include_upward_adj is true, then its upward adjacencies on
the remote part (-all- higher-dimensional entities) also get migrated. This operation
will require multiple rounds of communication, and at some times certain
entities may be locked (unavailable for local modification) while info
about their remote copies is still in question.
prefix_migrateEntity(mesh_instance, partition_handle,
local_entity_handle, bool include_upward_adj)
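As a rough illustration of the locking behaviour described for request A,
here is a toy, single-process Python sketch. It sends no messages at all; it
only tracks which local entities are locked while a request is outstanding.
The class MigrationTracker, its methods, and the "affected" argument (which
stands in for whatever entities the implementation decides are in question)
are all hypothetical.

```python
import itertools


class MigrationTracker:
    """Toy model of one part's pending migration requests (illustrative only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}    # request id -> set of local handles it locks
        self.locked = set()  # handles unavailable for local modification

    def migrate_entity(self, local_handle, affected=()):
        """Record a non-blocking request and lock the entities it touches."""
        involved = {local_handle, *affected}
        request = next(self._ids)
        self.pending[request] = involved
        self.locked |= involved
        return request

    def can_modify(self, handle):
        return handle not in self.locked

    def complete(self, request):
        """Call once the last message for this request has been handled."""
        done = self.pending.pop(request)
        # Keep entities locked if another pending request still holds them.
        still_held = set().union(*self.pending.values())
        self.locked -= done - still_held
```

The point of the sketch is only that a request returns immediately, and that
an entity stays locked until every request touching it has completed.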
B. Update vertex coordinates. One could argue that we could overload
the setVtxCoords function to do this, and maybe we should. But that
obfuscates when communication could occur. The communication here is
push-and-forget.
prefix_updateVtxCoords(mesh_instance, partition_handle, local_vertex_handle)
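For what "push-and-forget" might look like inside an implementation, here is
a small Python sketch: one message is queued per remote copy of the vertex,
and no request handle or acknowledgement is tracked. The outbox queue, the
message layout, and the function name are assumptions for illustration only.

```python
from collections import deque


def update_vtx_coords(outbox, remote_copies, coords):
    """Queue one coordinate message per remote copy; no reply is expected."""
    for part, handle in remote_copies:
        outbox.append({"to_part": part, "handle": handle, "coords": tuple(coords)})
    return len(remote_copies)


# A vertex with copies on parts 1 and 2 (remote IDs are part + handle).
outbox = deque()
sent = update_vtx_coords(outbox, [(1, 204), (2, 91)], (0.5, 1.0, 0.0))
```

Because nothing comes back, there is nothing for the application to poll for
afterwards; the receiving part simply applies the new coordinates whenever it
next processes its incoming messages.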
C. Poll for messages. The internals of this function are going to have
to cover a lot of ground. Not sure what the arg list should be,
exactly, but there should probably be return data to indicate that a
request has failed or succeeded. Whether this is via a request handle
or simply returning entity handles is up in the air, as far as I'm
concerned.
prefix_pollForRequests(mesh_instance, partition_handle)
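Here is one way the request-handle flavour of the return data could look,
as a hedged Python sketch. The inbox queue, the message fields ("kind",
"request", "ok"), and the Status enum are invented for the example; the
entity-handle alternative mentioned above would return different data.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def poll_for_requests(inbox, max_messages=None):
    """Handle messages that have already arrived; never wait for more.

    Returns {request_handle: Status} for requests resolved during this call.
    """
    finished = {}
    handled = 0
    while inbox and (max_messages is None or handled < max_messages):
        msg = inbox.popleft()
        handled += 1
        if msg["kind"] == "request_done":
            ok = msg["ok"]
            finished[msg["request"]] = Status.SUCCEEDED if ok else Status.FAILED
        # Other kinds (coordinate updates, migration traffic) would be
        # applied to the local mesh here.
    return finished
```

An application would call this from time to time between local mesh
operations and act on whatever statuses come back.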
D. Done with micro-migration. This is a blocking call, to get
everything up-to-date and back in sync. Essentially, it waits for all
message traffic to clear, as well as (possibly) rebuilding a bunch of
ghost info that was allowed to go obsolete.
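The blocking behaviour of D can be sketched as a loop that keeps polling
until no request is outstanding. This Python fragment is only a
single-process stand-in: the function name and the "poll" callable are
hypothetical, and a real implementation would presumably also synchronize
across parts and rebuild ghost info before returning.

```python
def finish_micro_migration(pending, poll):
    """Poll until every pending request is resolved; return the poll count.

    `pending` is a set of outstanding request handles and `poll` is any
    callable returning the handles resolved since the previous call.
    """
    polls = 0
    while pending:
        for request in poll():
            pending.discard(request)
        polls += 1
    return polls


# Simulated arrivals: request 1 first, then nothing, then requests 2 and 3.
arrivals = iter([[1], [], [2, 3]])
outstanding = {1, 2, 3}
polls_needed = finish_micro_migration(outstanding, lambda: next(arrivals))
```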
There are also several things that we probably need but are going to be
a bit trickier to specify.
E. Replace entities. This could be on the part bdry or ghosts (that
just changed via swapping, for instance). Passing in a list of local
entities that are defunct and replaced with other local entities, so
that info can be passed abroad, is more or less the idea. But I can see
challenges in defining such a function clearly, either for updating what
a remote part thinks about neighbors, or for changes on the part bdry.
The latter case also introduces the awkwardness of things like creating
remote vertices (or matching up verts that were created independently
with each other). I recognize that RPI has put forward functions for
this, which basically amount to acknowledging matching between
independently created ents, and maybe this is the way to go here. I'm
not sure I see any answers I really like on this one, at least not yet.
Okay, having said all that, I'm sure this will create several firestorms
in discussion :-), both for things I've included and things I've
omitted. If nothing else, I'm pretty sure there are going to be
requests we'll need here that I haven't thought of yet.
Talk to you all in the morning,
Carl
--
------------------------------------------------------------------------
Dr. Carl Ollivier-Gooch, P.Eng. Voice: +1-604-822-1854
Associate Professor Fax: +1-604-822-2403
Department of Mechanical Engineering email: cfog at mech.ubc.ca
University of British Columbia http://www.mech.ubc.ca/~cfog
Vancouver, BC V6T 1Z4 http://tetra.mech.ubc.ca/ANSLab/
------------------------------------------------------------------------