itaps-parallel Micro-migration update

Mark Shephard shephard at scorec.rpi.edu
Thu Mar 6 12:12:41 CST 2008


A couple of quick comments.

Carl Ollivier-Gooch wrote:
> Tim Tautges wrote:
>> Ok, so I'm still mulling over how to express exchanging tag data 
>> between parts, and in doing so I've gone back and reviewed Carl's 
>> email below. In thinking about migration, the following occurs to me:
>>
>> 1. We already need the ability to send ghost entities between parts, 
>> and, I'd argue, the ability to remove ghost entities from a part
>> 2. We should have the ability for an application to change the 
>> ownership of a given entity (at least, when it's shared between the 
>> before and after owning parts)
>>
>> So, why not implement migration out of its constituent parts, e.g. 
>> first create ghosts on the destination processor, then change the 
>> ownership?
> 
> Tim, I think your question is motivated by thinking about the push 
> migration, right?  In that scenario, I can see doing two calls as you suggest:
> 
> 1.  Push this entity as a ghost (A: sends to B; B: confirms local handle 
> to A)
> 
> 2.  Push ownership to B (A: sends ownership to B)
> 
> That's one message more than you'd get with a push migration, but maybe 
> that's okay, especially in bulk.  So maybe for things like load balance 
> / re-partition this is a reasonable choice (though I'm pretty sure it'll 
> be slower and I -think- it actually increases the number of functions in 
> the interface...  ghosting and migration can be distinguished by a flag...)
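
For concreteness, a minimal sketch of the two-call push version, written
against the prototypes proposed below.  prefix_createGhost is item F;
prefix_pushOwnership is hypothetical (no ownership-transfer call is actually
proposed in this thread), and the handle variables are illustrative:

  int err;
  /* Step 1: push the entity to part B as a ghost.  Push-and-confirm,
   * so part A learns B's local handle for it. */
  prefix_createGhost(instance, partition, part_B, entity, &err);
  /* Step 2 (hypothetical call): transfer ownership of the now-shared
   * entity to part B.  Push-and-forget. */
  prefix_pushOwnership(instance, partition, part_B, entity, &err);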
> 
> But in any case, I see the most common scenario for micro-migration as 
> being a pull:  a process wants to operate on something it owns but needs 
> to own a star around it as well.  In this scenario, if you don't already 
> -have- the ghost, you don't know what to request, so the process would 
> look something like:
> 
> 1.  Request a ghost star (A: sends to B; B: sends data; A: confirms 
> local handles to B)
> 
> 2.  Request ownership change (A: requests; B: confirms (necessary in 
> case some other part beats A to it...)).

This is correct - it is a pull.

> 
> Whether or not you agree that my outline is capable of handling complex 
> scenarios, I don't see how the two-request version is going to be 
> competitive with the one-request version for pull-migration.  So the 
> syntax I propose below doesn't include this.

Considering that mesh migration is communication-dominated even on a Blue 
Gene, doubling the number of communication steps is a really bad thing.

> 
> Note:  I haven't tried to come up with good, compact names at this 
> point, nor am I yet proposing syntax for array versions.
> 
> A.  Request in-migration of an entity (this is a pull migration).  This
> entity must be on the part bdry and is identified by local handle, and
> the implementation handles the rest.  If include_upward_adj is true,
> then stuff on the remote part also gets migrates (-all-
> higher-dimensional entities).  This operation will require multiple
> rounds of communication, and at some times certain entities may be
> locked (unavailable for local modification) while info about their
> remote copies is still in question.
> 
> void prefix_migrateEntity(iMesh_Instance instance,
>                const prefix_PartitionHandle partition_handle,
>                const entity_handle local_entity_handle,
>                bool include_upward_adj, int *err);
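
A quick usage sketch for A (the handle variables are illustrative and
assumed to come from the usual iMesh/iMeshP queries):

  int err;
  /* Pull bdry_ent, which must already lie on the part boundary, onto
   * the local part along with all higher-dimensional entities adjacent
   * to it on the remote part. */
  prefix_migrateEntity(instance, partition, bdry_ent,
                       /* include_upward_adj = */ 1, &err);
  /* bdry_ent may be locked until its remote-copy info settles; poll
   * (C below) or synchronize (D below) before modifying it locally. */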
> 
> B.  Update vertex coordinates.  One could argue that we could overload
> the setVtxCoords function to do this, and maybe we should.  But that
> obfuscates when communication could occur.  The communication here is
> push-and-forget.
> 
> void prefix_updateVtxCoords(iMesh_Instance instance,
>                const prefix_PartitionHandle partition_handle,
>                const entity_handle local_vertex_handle,
>                int *err);
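
A usage sketch for B (vtx is an illustrative shared-vertex handle):

  int err;
  /* The new coordinates are assumed to have been set locally first, via
   * the existing iMesh coordinate-setting call; this pushes them to the
   * other parts holding copies of vtx, with no reply expected. */
  prefix_updateVtxCoords(instance, partition, vtx, &err);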
> 
> C.  Poll for messages.  The internals of this function are going to have
> to cover a lot of ground.  The array in the return is there as a
> placeholder to tell the application that something interesting / useful
> has been done to a handle.  This might indicate successful in-migration,
> a recent change in vertex location, or successful completion of handle
> matching.
> 
> void prefix_pollForRequests(iMesh_Instance instance,
>                const prefix_PartitionHandle partition_handle,
>                entity_handle **handles_available,
>                int *handles_allocated,
>                int *handles_size,
>                int *err);
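
A usage sketch for C, assuming the usual iMesh array-out convention
(pass a NULL pointer with allocated == 0 and the implementation mallocs
the array, which the caller then frees):

  entity_handle *handles = NULL;
  int handles_alloc = 0, handles_size = 0, err;
  prefix_pollForRequests(instance, partition,
                         &handles, &handles_alloc, &handles_size, &err);
  for (int i = 0; i < handles_size; i++) {
    /* Something happened to handles[i]: an in-migration finished, its
     * vertex coordinates changed, or handle matching completed.
     * Distinguishing which presumably needs a companion query. */
  }
  free(handles);
  /* Once this round of modification is over, prefix_synchParts (D below)
   * is the blocking call that drains any remaining traffic. */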
> 
> D.  Done with micro-migration.  This is a blocking call, to get
> everything up-to-date and back in synch.  Essentially, it waits for all
> message traffic to clear, as well as (possibly) rebuilding a bunch of
> ghost info that was allowed to go obsolete.
> 
> void prefix_synchParts(iMesh_Instance instance,
>                const prefix_PartitionHandle partition_handle,
>                int *err);
>        
> E.  Replace entities.  This refers to changes on the part bdry where the
> application/service is responsible for ensuring that things are done
> identically on both sides and that the args are passed in an order that
> can be matched.  (Specifically, matching new entities should appear in
> the same order in the call array.)  Communication here could be a
> two-way push-and-forget, or some variant on push-and-confirm.
> 
> void prefix_replaceOnPartBdry(iMesh_Instance instance,
>                const prefix_PartitionHandle partition_handle,
>                const entity_handle *old_entities,
>                const int old_entities_size,
>                const entity_handle *new_entities,
>                const int new_entities_size,
>                int *err);
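
A usage sketch for E, where both parts split the same part-bdry edge
(the split itself is assumed to have already been performed identically
on each side; the handle variables are illustrative):

  int err;
  entity_handle old_ents[1] = { shared_edge };
  entity_handle new_ents[2] = { half_edge_1, half_edge_2 };
  /* The matching call on the other part must list its copies of the new
   * half-edges in the same order, so the implementation can pair them. */
  prefix_replaceOnPartBdry(instance, partition,
                           old_ents, 1, new_ents, 2, &err);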
> 
> F.  As Tim suggests, the ability to create and delete ghosts is likely
> to be useful, even though for common topologically-based cases, ghost
> maintenance can (and IMO should) be handled automagically, either at
> migration time or during prefix_synchParts.  Communication here is
> push-and-confirm for creation (so that the original knows IDs of the
> ghosts), push-and-forget for deletion.  I'm assuming here that the
> closure of a new ghost will be pushed automatically as part of the
> underlying communication, and that the remote part will clean up the
> closure as appropriate during deletion.  Finally, note that createGhost 
> could easily be tweaked to handle a micro-push migration: change the 
> name and add a flag.
> 
> void prefix_createGhost(iMesh_Instance instance,
>                const prefix_PartitionHandle partition_handle,
>                const prefix_PartHandle target_part,
>                const entity_handle ghost_to_push,
>                int *err);
> 
> void prefix_deleteGhostOf(iMesh_Instance instance,
>                const prefix_PartitionHandle partition_handle,
>                const prefix_PartHandle target_part,
>                const entity_handle ghost_to_purge,
>                int *err);
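
A usage sketch for F (neighbor_part and ent are illustrative; per the
assumption above, the closure of ent travels with it automatically):

  int err;
  /* Push ent to a neighboring part as a ghost ... */
  prefix_createGhost(instance, partition, neighbor_part, ent, &err);
  /* ... and later purge that part's copy, closure cleanup included. */
  prefix_deleteGhostOf(instance, partition, neighbor_part, ent, &err);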
> 
> I think that'll do for now.  At least we've got a starting point for 
> iteration. :-)
> 
> Carl



