itaps-parallel Micro-migration update

Thu Mar 6 08:53:19 CST 2008

Tim,

If you study the algorithms that have been developed for these operations, 
you will see that doing it this way would add a lot of overhead to 
processes that are already communication bound. The key process is 
migration; ghosting is secondary, and its needs should not drive 
how migration is done.

Mark

Tim Tautges wrote:
> Ok, so I'm still mulling over how to express exchanging tag data between 
> parts, and in doing so I've gone back and reviewed Carl's email below. 
> In thinking about migration, the following occurs to me:
> 
> 1. We already need the ability to send ghost entities between parts and, 
> I'd argue, the ability to remove ghost entities on a part.
> 2. We should have the ability for an application to change the ownership 
> of a given entity (at least, when it's shared between the before and 
> after owning parts)
> 
> So, why not implement migration out of its constituent parts, e.g. first 
> create ghosts on the destination processor, then change the ownership?
> 
> - tim
> 
> 
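For concreteness, the composition Tim proposes (create a ghost on the destination, then change ownership) might look like the toy model below. This is only a sketch: Entity, create_ghost, remove_ghost, change_owner and migrate_via_ghost are hypothetical names, not existing interface functions, and parts are plain array indices. Dropping the old owner's copy at the end is a choice made for the sketch. The model tracks bookkeeping only and says nothing about message counts or overhead.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTS 8   /* arbitrary, for the sketch only */

/* Hypothetical record: which part owns an entity, which parts hold a copy. */
typedef struct {
    int owner;
    int has_copy[MAX_PARTS];   /* has_copy[p] != 0 if part p holds a copy */
} Entity;

/* Step 1: put a ghost copy on dest. Fails if dest already holds a copy. */
int create_ghost(Entity *e, int dest)
{
    assert(e != NULL);
    if (dest < 0 || dest >= MAX_PARTS || e->has_copy[dest])
        return -1;
    e->has_copy[dest] = 1;
    return 0;
}

/* Remove a non-owned copy; the owner's copy cannot be removed this way. */
int remove_ghost(Entity *e, int part)
{
    assert(e != NULL);
    if (part < 0 || part >= MAX_PARTS || part == e->owner || !e->has_copy[part])
        return -1;
    e->has_copy[part] = 0;
    return 0;
}

/* Step 2: change ownership, allowed only if dest already holds a copy. */
int change_owner(Entity *e, int dest)
{
    assert(e != NULL);
    if (dest < 0 || dest >= MAX_PARTS || !e->has_copy[dest])
        return -1;
    e->owner = dest;
    return 0;
}

/* Migration composed of the steps above; returns 0 on success. */
int migrate_via_ghost(Entity *e, int dest)
{
    int src;
    assert(e != NULL);
    src = e->owner;
    if (dest < 0 || dest >= MAX_PARTS || dest == src)
        return -1;
    if (!e->has_copy[dest] && create_ghost(e, dest) != 0)
        return -1;
    if (change_owner(e, dest) != 0)
        return -1;
    return remove_ghost(e, src);   /* old copy is now a ghost; drop it */
}
```

An entity owned by part 0 with no other copies ends up owned by part 2, with no copy left on part 0, after migrate_via_ghost(&e, 2).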
> Carl Ollivier-Gooch wrote:
>> Hello, all.
>>
>> First, my apologies for another eleventh-hour email.  On Thursday and 
>> Friday I had a midterm to set as well as various other unpleasant but 
>> urgent bits of paperwork to do...
>>
>> In the end, I've concluded that obsessing about detailed examples, 
>> beyond the proof-of-concept stage, is probably not productive at this 
>> point.  So instead I've tried to jump back up one level, conceptually.
>>
>> As a reprise, before I talk about what things I currently think we 
>> should be supporting and syntax sketches for that, let me quote from a 
>> previous email I sent on the topic:
>>
>> <quote>
>> My basic premises are these:
>>
>> 1.  Communication about mesh entities in parallel should be handled by 
>> the parallel mesh interface, not by the application.  That is, the 
>> application shouldn't be talking between processes about mesh-related 
>> things (like adjacency or migration or whatever) without going through 
>> the interface.
>>
>> 2.  We want to avoid blocking communication as much as possible, and to 
>> facilitate communication latency hiding as much as possible.  This 
>> means I'm working in a paradigm where messages fly back and forth, and 
>> each process has to poll for newly-received messages from time to 
>> time.  This implies some mechanism for the implementation to inform 
>> the application when some request has been completed, or when some 
>> entity isn't available for local modification (i.e., locked) because 
>> communication is ongoing.
>>
>> 3.  All mesh entities on part bdrys will know the remote ID's (part + 
>> handle) of all other copies of that entity.
>>
>> 4.  All ghosted mesh entities will know the remote ID of their 
>> "master" copy (the one that is actually owned by some other part), and 
>> that "master" copy will know the remote ID's of all ghost copies.  I 
>> don't (at present, anyway) see any need for a ghost copy to 
>> distinguish between having a master copy that's on a part bdry versus 
>> in the interior of a part.
>>
>> 5.  Ghosting rules are known to the implementation, and can be 
>> described as first- or second-order adjacencies of entities on the 
>> part bdry. More complex communication patterns will likely prove 
>> difficult or impossible to maintain during mesh modification, but even 
>> the simple ones will be pretty painful.  We can return to this later.
>>
>> </quote>
>>
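Premises 3 and 4 above suggest bookkeeping along the following lines. This is a minimal sketch in C, not anything from an existing implementation: RemoteId, EntityCopies, make_local, make_ghost, add_remote_copy and master_of are hypothetical names, and the fixed-size array is a simplification.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COPIES 8   /* arbitrary cap, for the sketch only */

/* A remote ID is a (part, handle) pair, as in premise 3. */
typedef struct {
    int   part;     /* part holding the remote copy */
    void *handle;   /* entity handle, meaningful on that part only */
} RemoteId;

typedef struct {
    int      is_ghost;             /* nonzero if this copy is a ghost */
    RemoteId master;               /* premise 4: set only for ghosts */
    RemoteId copies[MAX_COPIES];   /* other copies known to a non-ghost */
    size_t   n_copies;
} EntityCopies;

/* A non-ghost copy that does not yet know of any remote copies. */
EntityCopies make_local(void)
{
    EntityCopies e = {0};
    return e;
}

/* A ghost copy that knows only the remote ID of its master. */
EntityCopies make_ghost(int master_part, void *master_handle)
{
    EntityCopies e = {0};
    e.is_ghost = 1;
    e.master.part = master_part;
    e.master.handle = master_handle;
    return e;
}

/* Record one more remote copy on a non-ghost entity.
 * Returns 0 on success, -1 if e is a ghost or the list is full. */
int add_remote_copy(EntityCopies *e, int part, void *handle)
{
    assert(e != NULL);
    if (e->is_ghost || e->n_copies == MAX_COPIES)
        return -1;
    e->copies[e->n_copies].part = part;
    e->copies[e->n_copies].handle = handle;
    e->n_copies++;
    return 0;
}

/* Remote ID of the master copy, or NULL if e is not a ghost. */
const RemoteId *master_of(const EntityCopies *e)
{
    assert(e != NULL);
    return e->is_ghost ? &e->master : NULL;
}
```

Note that the ghost record carries no flag for whether its master sits on a part bdry or in the interior, in line with premise 4.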
>> Basically, what I see is a situation where an application makes a 
>> request for migration, etc, through the interface, and the 
>> implementation handles that by sending a stream of messages back and 
>> forth.  The content, type, and number of messages is up to the 
>> implementation, but blocking communication should be (read: we'd 
>> really, really like it to be) initiated only on some specific request 
>> by the application.  So here are some requests that I'm confident we 
>> need and that I'm reasonably confident I understand how to specify 
>> syntax for:
>>
>> A.  Request in-migration of an entity.  This entity must be on the 
>> part bdry and is identified by local handle, and the implementation 
>> handles the rest.  If include_upward_adj is true, then stuff on the 
>> remote part also gets migrated (-all- higher-dimensional entities).  
>> This operation will require multiple rounds of communication, and at 
>> some times certain entities may be locked (unavailable for local 
>> modification) while info about their remote copies is still in question.
>>
>> prefix_migrateEntity(mesh_instance, partition_handle, 
>> local_entity_handle, bool include_upward_adj)
>>
>> B.  Update vertex coordinates.  One could argue that we could overload 
>> the setVtxCoords function to do this, and maybe we should.  But that 
>> obfuscates when communication could occur.  The communication here is 
>> push-and-forget.
>>
>> prefix_updateVtxCoords(mesh_instance, partition_handle, 
>> local_vertex_handle)
>>
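One way to picture the push-and-forget behaviour of B: the call queues one message per remote copy and returns at once, without waiting for any acknowledgement. The sketch below uses an in-memory outbox in place of real message passing; CoordMsg, Outbox, Vertex and update_vtx_coords are hypothetical names, and the array sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define OUTBOX_CAP  16   /* arbitrary, for the sketch only */
#define MAX_REMOTES 4

/* One queued coordinate update for one remote copy of a vertex. */
typedef struct {
    int    dest_part;
    void  *remote_handle;
    double xyz[3];
} CoordMsg;

/* Stand-in for the implementation's outgoing message buffer. */
typedef struct {
    CoordMsg msgs[OUTBOX_CAP];
    size_t   count;
} Outbox;

/* A local vertex plus the remote IDs (part + handle) of its other copies. */
typedef struct {
    double xyz[3];
    int    remote_part[MAX_REMOTES];
    void  *remote_handle[MAX_REMOTES];
    size_t n_remotes;
} Vertex;

/* Queue the vertex's current coordinates for every remote copy and return
 * immediately. Returns the number of messages queued, or -1 (queuing
 * nothing) if the outbox lacks room for all of them. */
int update_vtx_coords(Outbox *out, const Vertex *v)
{
    size_t i;
    int k;
    assert(out != NULL && v != NULL);
    if (v->n_remotes > MAX_REMOTES || out->count + v->n_remotes > OUTBOX_CAP)
        return -1;
    for (i = 0; i < v->n_remotes; ++i) {
        CoordMsg *m = &out->msgs[out->count++];
        m->dest_part = v->remote_part[i];
        m->remote_handle = v->remote_handle[i];
        for (k = 0; k < 3; ++k)
            m->xyz[k] = v->xyz[k];
    }
    return (int)v->n_remotes;
}
```

Because nothing is awaited, the caller cannot tell from this call alone when the remote copies have actually been updated.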
>> C.  Poll for messages.  The internals of this function are going to 
>> have to cover a lot of ground.  Not sure what the arg list should be, 
>> exactly, but there should probably be return data to indicate that a 
>> request has failed or succeeded.  Whether this is via a request handle 
>> or simply returning entity handles is up in the air, as far as I'm 
>> concerned.
>>
>> prefix_pollForRequests(mesh_instance, partition_handle)
>>
>> D.  Done with micro-migration.  This is a blocking call, to get 
>> everything up-to-date and back in synch.  Essentially, waits for all 
>> message traffic to clear, as well as (possibly) rebuilding a bunch of 
>> ghost info that was allowed to go obsolete.
>>
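Taken together, A, C and D describe a request / poll / finish lifecycle. The toy model below sketches it with no real communication: a counter of remaining rounds stands in for outstanding messages, and an entity stays locked until its request completes. Every name here (Migrator, request_migration, poll_requests, finish_migration and so on) is hypothetical, and the argument lists are not a proposal for the interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQ 8    /* arbitrary, for the sketch only */
#define MAX_ENT 16

typedef enum { REQ_FREE, REQ_PENDING, REQ_DONE } ReqState;

typedef struct {
    ReqState state;
    int      entity;        /* local entity index */
    int      rounds_left;   /* stand-in for outstanding message rounds */
} Request;

typedef struct {
    Request reqs[MAX_REQ];
    int     locked[MAX_ENT];   /* nonzero while the entity is in flight */
} Migrator;

void migrator_init(Migrator *m)
{
    int i;
    assert(m != NULL);
    for (i = 0; i < MAX_REQ; ++i) m->reqs[i].state = REQ_FREE;
    for (i = 0; i < MAX_ENT; ++i) m->locked[i] = 0;
}

/* A: start a migration and lock the entity. Returns a request index,
 * or -1 if the entity is invalid or locked, or no slot is free. */
int request_migration(Migrator *m, int entity, int rounds)
{
    int i;
    assert(m != NULL);
    if (entity < 0 || entity >= MAX_ENT || rounds < 1 || m->locked[entity])
        return -1;
    for (i = 0; i < MAX_REQ; ++i) {
        if (m->reqs[i].state == REQ_FREE) {
            m->reqs[i].state = REQ_PENDING;
            m->reqs[i].entity = entity;
            m->reqs[i].rounds_left = rounds;
            m->locked[entity] = 1;
            return i;
        }
    }
    return -1;
}

/* C: non-blocking poll. Advances each pending request by one round and
 * returns how many requests completed during this call. */
int poll_requests(Migrator *m)
{
    int i, completed = 0;
    assert(m != NULL);
    for (i = 0; i < MAX_REQ; ++i) {
        Request *r = &m->reqs[i];
        if (r->state != REQ_PENDING)
            continue;
        if (--r->rounds_left == 0) {
            r->state = REQ_DONE;
            m->locked[r->entity] = 0;   /* safe to modify locally again */
            ++completed;
        }
    }
    return completed;
}

int pending_requests(const Migrator *m)
{
    int i, n = 0;
    assert(m != NULL);
    for (i = 0; i < MAX_REQ; ++i)
        if (m->reqs[i].state == REQ_PENDING)
            ++n;
    return n;
}

/* D: blocking call; returns only once no request is still pending. */
void finish_migration(Migrator *m)
{
    while (pending_requests(m) > 0)
        poll_requests(m);
}
```

Here the poll reports completion through a count plus per-request state. Returning entity handles instead, the other option mentioned under C, would change only the return type of poll_requests.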
>> There are also several things that we probably need but are going to 
>> be a bit trickier to specify.
>>
>>
>> E.  Replace entities.  This could be on the part bdry or ghosts (that 
>> just changed via swapping, for instance).  Passing in a list of local 
>> entities that are defunct and replaced with other local entities, so 
>> that info can be passed abroad, is more or less the idea.  But I can 
>> see challenges in defining such a function clearly, either for 
>> updating what a remote part thinks about neighbors, or for changes on 
>> the part bdry. The latter case also introduces the awkwardness of 
>> things like creating remote vertices (or matching up verts that were 
>> created independently with each other).  I recognize that RPI has put 
>> forward functions for this, which basically amount to acknowledging 
>> matching between independently created ents, and maybe this is the way 
>> to go here.  I'm not sure I see any answers I really like on this one, 
>> at least not yet.
>>
>>
>> Okay, having said all that, I'm sure this will create several 
>> firestorms in discussion :-), both for things I've included and things 
>> I've omitted.  If nothing else, I'm pretty sure there are going to be 
>> requests we'll need here that I haven't thought of yet.
>>
>> Talk to you all in the morning,
>> Carl
>>
>