itaps-parallel Micro-migration update

Tim Tautges tautges at mcs.anl.gov
Thu Mar 6 08:41:25 CST 2008


Ok, so I'm still mulling over how to express exchanging tag data between 
parts, and in doing so I've gone back and reviewed Carl's email below. 
In thinking about migration, the following occurs to me:

1. We already need the ability to send ghost entities between parts and, 
I'd argue, the ability to remove ghost entities from a part.
2. We should have the ability for an application to change the ownership 
of a given entity (at least, when it's shared between the before and 
after owning parts)

So why not build migration out of these constituent operations, e.g., first 
create ghosts on the destination processor, then change the ownership?
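A minimal sketch of this composition in C. It assumes a toy in-memory model, not any real ITAPS interface: each entity records its owning part and a bitmask of the parts holding a copy. The names `ghost_entity`, `set_owner`, `remove_ghost`, and `migrate_entity` are hypothetical, and no communication is modeled. The sketch only shows that migration reduces to ghosting, then an ownership transfer, then optionally dropping the old copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not an ITAPS interface: one record per entity,
   with its owning part and a bitmask of every part holding a copy. */
#define MAX_PARTS 32

typedef struct {
    int owner;         /* part that owns the entity */
    uint32_t copies;   /* bit p set => part p holds a copy (owned or ghost) */
} Entity;

static bool has_copy(const Entity *e, int part) {
    assert(part >= 0 && part < MAX_PARTS);
    return (e->copies >> part) & 1u;
}

/* Step 1: send a ghost copy to the destination part. */
static void ghost_entity(Entity *e, int dest) {
    assert(dest >= 0 && dest < MAX_PARTS);
    e->copies |= (uint32_t)1 << dest;
}

/* Step 2: transfer ownership; only allowed to a part that already holds
   a copy, i.e. the entity is shared between old and new owners. */
static bool set_owner(Entity *e, int new_owner) {
    if (!has_copy(e, new_owner))
        return false;
    e->owner = new_owner;
    return true;
}

/* Optional step 3: remove a non-owned copy from a part. */
static bool remove_ghost(Entity *e, int part) {
    if (part == e->owner || !has_copy(e, part))
        return false;
    e->copies &= ~((uint32_t)1 << part);
    return true;
}

/* Migration composed from the three primitives above. */
static bool migrate_entity(Entity *e, int dest, bool keep_old_copy) {
    int old_owner = e->owner;
    ghost_entity(e, dest);
    if (!set_owner(e, dest))
        return false;
    if (!keep_old_copy && old_owner != dest)
        return remove_ghost(e, old_owner);
    return true;
}
```

Passing keep_old_copy as true leaves the old owner holding a ghost, which matches the shared-entity case in point 2 above.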

- tim


Carl Ollivier-Gooch wrote:
> Hello, all.
> 
> First, my apologies for another eleventh-hour email.  On Thursday and 
> Friday I had a midterm to set as well as various other unpleasant but 
> urgent bits of paperwork to do...
> 
> In the end, I've concluded that obsessing about detailed examples, 
> beyond the proof-of-concept stage, is probably not productive at this 
> point.  So instead I've tried to jump back up one level, conceptually.
> 
> As a reprise, before I talk about what things I currently think we 
> should be supporting and syntax sketches for that, let me quote from a 
> previous email I sent on the topic:
> 
> <quote>
> My basic premises are these:
> 
> 1.  Communication about mesh entities in parallel should be handled by 
> the parallel mesh interface, not by the application.  That is, the 
> application shouldn't be talking between processes about mesh-related 
> things (like adjacency or migration or whatever) without going through 
> the interface.
> 
> 2.  We want to avoid communication blocks as much as possible, and to 
> facilitate communication latency hiding as much as possible.  This means 
> I'm working in a paradigm where messages fly back and forth, and each 
> process has to poll for newly-received messages from time to time.  This 
> implies some mechanism for the implementation to inform the application 
> when some request has been completed, or when some entity isn't 
> available for local modification (i.e., locked) because communication is 
> ongoing.
> 
> 3.  All mesh entities on part bdrys will know the remote IDs (part + 
> handle) of all other copies of that entity.
> 
> 4.  All ghosted mesh entities will know the remote ID of their "master" 
> copy (the one that is actually owned by some other part), and that 
> "master" copy will know the remote IDs of all ghost copies.  I don't 
> (at present, anyway) see any need for a ghost copy to distinguish 
> between having a master copy that's on a part bdry versus in the 
> interior of a part.
> 
> 5.  Ghosting rules are known to the implementation, and can be described 
> as first- or second-order adjacencies of entities on the part bdry. More 
> complex communication patterns will likely prove difficult or impossible 
> to maintain during mesh modification, but even the simple ones will be 
> pretty painful.  We can return to this later.
> 
> </quote>
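Premises 3 and 4 imply a small amount of per-entity bookkeeping. One possible layout is sketched below in C, under assumptions that are not in the email: a fixed cap of MAX_COPIES remote copies, a long standing in for an implementation's opaque handle, and the hypothetical names `RemoteId`, `RemoteCopyInfo`, `add_remote_copy`, `remote_handle_on`, and `ghost_master`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a "remote ID" is the (part, handle) pair of premise 3,
   with a long standing in for an implementation's opaque handle. */
typedef struct {
    int part;
    long handle;   /* meaningful only on that part */
} RemoteId;

typedef enum { ENT_INTERIOR, ENT_PART_BDRY, ENT_GHOST } EntKind;

#define MAX_COPIES 8

typedef struct {
    EntKind kind;
    RemoteId master;               /* premise 4: set only for ghosts */
    RemoteId copies[MAX_COPIES];   /* premise 3 bdry copies; for an owned
                                      entity, also its ghost copies */
    size_t num_copies;
} RemoteCopyInfo;

/* Record one remote copy; returns 0 on success, -1 if the table is full. */
static int add_remote_copy(RemoteCopyInfo *info, int part, long handle) {
    if (info->num_copies >= MAX_COPIES)
        return -1;
    info->copies[info->num_copies].part = part;
    info->copies[info->num_copies].handle = handle;
    info->num_copies++;
    return 0;
}

/* Handle of this entity's copy on the given part, or -1 if none. */
static long remote_handle_on(const RemoteCopyInfo *info, int part) {
    for (size_t i = 0; i < info->num_copies; i++)
        if (info->copies[i].part == part)
            return info->copies[i].handle;
    return -1;
}

/* A ghost reaches its owned ("master") copy directly. */
static RemoteId ghost_master(const RemoteCopyInfo *info) {
    assert(info->kind == ENT_GHOST);
    return info->master;
}
```

As premise 4 suggests, a ghost does not record whether its master sits on a part bdry or in a part interior; it only stores the master's remote ID.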
> 
> Basically, what I see is a situation where an application makes a 
> request for migration, etc, through the interface, and the 
> implementation handles that by sending a stream of messages back and 
> forth.  The content, type, and number of messages is up to the 
> implementation, but blocking communication should be (read: we'd really, 
> really like it to be) initiated only on some specific request by the 
> application.  So here are some requests that I'm confident we need and 
> that I'm reasonably confident I understand how to specify syntax for:
> 
> A.  Request in-migration of an entity.  This entity must be on the part 
> bdry and is identified by local handle, and the implementation handles 
> the rest.  If include_upward_adj is true, then stuff on the remote part 
> also gets migrated (-all- higher-dimensional entities).  This operation 
> will require multiple rounds of communication, and at some times certain 
> entities may be locked (unavailable for local modification) while info 
> about their remote copies is still in question.
> 
> prefix_migrateEntity(mesh_instance, partition_handle, 
> local_entity_handle, bool include_upward_adj)
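As a rough illustration of the locking in request A, here is a C sketch of what the remote part might do when it receives an in-migration request for one of its part-bdry entities. The toy mesh, the lock flag, and all function names are hypothetical. With include_upward_adj set, the sketch collects and locks every higher-dimensional entity reachable through upward adjacencies, and it releases the locks only when the transfer completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mesh, hypothetical: entities are array indices and each lists its
   upward adjacencies one dimension higher (vertex -> edges -> faces). */
#define MAX_ENTS 64
#define MAX_UP 8

typedef struct {
    int dim;
    bool on_part_bdry;
    bool locked;        /* unavailable for local modification */
    int up[MAX_UP];     /* indices of upward-adjacent entities */
    int num_up;
} ToyEntity;

typedef struct {
    ToyEntity ents[MAX_ENTS];
    int num_ents;
} ToyMesh;

typedef struct {
    int ents[MAX_ENTS]; /* entities covered by this request */
    int num_ents;
} MigrationRequest;

/* Lock e and, if recurse_up, everything above it. Already-locked
   entities are skipped, so shared upward adjacencies are collected once. */
static void lock_and_collect(ToyMesh *m, int e, bool recurse_up,
                             MigrationRequest *req) {
    if (m->ents[e].locked)
        return;
    m->ents[e].locked = true;
    assert(req->num_ents < MAX_ENTS);
    req->ents[req->num_ents++] = e;
    if (!recurse_up)
        return;
    for (int i = 0; i < m->ents[e].num_up; i++)
        lock_and_collect(m, m->ents[e].up[i], true, req);
}

/* Hypothetical handler on the remote part for an in-migration request.
   Returns false if e is not on the part bdry or is already locked. */
static bool handle_in_migration(ToyMesh *m, int e, bool include_upward_adj,
                                MigrationRequest *req) {
    if (!m->ents[e].on_part_bdry || m->ents[e].locked)
        return false;
    req->num_ents = 0;
    lock_and_collect(m, e, include_upward_adj, req);
    return true;
}

/* Once the transfer is acknowledged, release every lock it took. */
static void complete_migration(ToyMesh *m, const MigrationRequest *req) {
    for (int i = 0; i < req->num_ents; i++)
        m->ents[req->ents[i]].locked = false;
}
```

Because locked entities are skipped, a face reached through two edges is collected once. In this toy, the same rule also silently leaves out an entity held by an earlier in-flight request, which a real implementation would need to report.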
> 
> B.  Update vertex coordinates.  One could argue that we could overload 
> the setVtxCoords function to do this, and maybe we should.  But that 
> obfuscates when communication could occur.  The communication here is 
> push-and-forget.
> 
> prefix_updateVtxCoords(mesh_instance, partition_handle, 
> local_vertex_handle)
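The following C sketch shows the push-and-forget pattern in request B. The sender queues one message per remote copy, addressed by that copy's remote handle, and returns without waiting for any reply. The message struct, the outgoing queue, and `update_vtx_coords` are hypothetical stand-ins for whatever transport an implementation actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where each copy of the vertex lives. */
typedef struct {
    int part;
    long handle;   /* the copy's handle on that part */
} RemoteCopy;

/* Hypothetical message: the receiver finds its copy by remote_handle. */
typedef struct {
    int dest_part;
    long remote_handle;
    double xyz[3];
} CoordUpdateMsg;

#define MAX_MSGS 128

typedef struct {
    CoordUpdateMsg msgs[MAX_MSGS];
    size_t count;
} OutQueue;      /* stands in for a real non-blocking send layer */

/* Push-and-forget: queue one message per remote copy and return at once;
   no reply is expected. Returns messages queued, or -1 on overflow. */
static int update_vtx_coords(OutQueue *q, const double xyz[3],
                             const RemoteCopy *copies, size_t num_copies) {
    assert(copies != NULL || num_copies == 0);
    if (num_copies > MAX_MSGS - q->count)
        return -1;
    for (size_t i = 0; i < num_copies; i++) {
        CoordUpdateMsg *msg = &q->msgs[q->count++];
        msg->dest_part = copies[i].part;
        msg->remote_handle = copies[i].handle;
        for (int k = 0; k < 3; k++)
            msg->xyz[k] = xyz[k];
    }
    return (int)num_copies;
}
```

Keeping this separate from setVtxCoords makes the point at which communication is queued explicit in the calling code.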
> 
> C.  Poll for messages.  The internals of this function are going to have 
> to cover a lot of ground.  Not sure what the arg list should be, 
> exactly, but there should probably be return data to indicate that a 
> request has failed or succeeded.  Whether this is via a request handle 
> or simply returning entity handles is up in the air, as far as I'm 
> concerned.
> 
> prefix_pollForRequests(mesh_instance, partition_handle)
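One way the request-handle option for C might behave is sketched below in C. Each request gets an integer handle, and received completion notices sit in an inbox. Each non-blocking poll drains the inbox, updates request status, and reports which requests finished. The types and names (`ReqStatus`, `Comm`, `post_request`, `poll_for_requests`) are hypothetical, and the inbox stands in for messages a real implementation would receive from other parts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request-handle scheme: each request gets an integer
   handle whose status the poll updates. */
typedef enum { REQ_PENDING, REQ_SUCCEEDED, REQ_FAILED } ReqStatus;

typedef struct {
    int request;
    bool ok;
} Notice;              /* completion notice received from another part */

#define MAX_REQS 32

typedef struct {
    ReqStatus status[MAX_REQS];
    int num_reqs;
    Notice inbox[MAX_REQS];   /* stands in for messages already received */
    int inbox_len;
} Comm;

static int post_request(Comm *c) {
    assert(c->num_reqs < MAX_REQS);
    c->status[c->num_reqs] = REQ_PENDING;
    return c->num_reqs++;
}

/* Non-blocking poll: process what has arrived, copy the handles of newly
   completed requests into done[], and return how many there were. */
static int poll_for_requests(Comm *c, int done[], int max_done) {
    int n = 0, i = 0;
    for (; i < c->inbox_len && n < max_done; i++) {
        Notice nt = c->inbox[i];
        assert(nt.request >= 0 && nt.request < c->num_reqs);
        c->status[nt.request] = nt.ok ? REQ_SUCCEEDED : REQ_FAILED;
        done[n++] = nt.request;
    }
    /* Notices that did not fit in done[] wait for the next poll. */
    int remaining = c->inbox_len - i;
    for (int j = 0; j < remaining; j++)
        c->inbox[j] = c->inbox[i + j];
    c->inbox_len = remaining;
    return n;
}
```

The alternative Carl mentions, returning entity handles directly, would replace the request table with a list of affected entities. The polling structure would be the same.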
> 
> D.  Done with micro-migration.  This is a blocking call, to get 
> everything up-to-date and back in synch.  Essentially, waits for all 
> message traffic to clear, as well as (possibly) rebuilding a bunch of 
> ghost info that was allowed to go obsolete.
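Request D can be built on top of the non-blocking poll: spin until local traffic clears, then rebuild stale ghost info. The C sketch below models only the local side, using a hypothetical countdown per outstanding request. In a real parallel code, waiting for all message traffic to clear also needs a collective check that no part still has messages in flight, for example a reduction over per-part pending counts. The names `Traffic`, `poll_once`, `rebuild_ghosts`, and `finish_micro_migration` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for in-flight traffic: each outstanding request needs a
   number of further polls before its last message arrives. */
#define MAX_OUT 16

typedef struct {
    int polls_left[MAX_OUT];
    int num_out;
    bool ghosts_stale;     /* ghost info allowed to go obsolete meanwhile */
} Traffic;

/* One non-blocking poll: advance every request; return how many remain. */
static int poll_once(Traffic *t) {
    int still_pending = 0;
    for (int i = 0; i < t->num_out; i++) {
        if (t->polls_left[i] > 0)
            t->polls_left[i]--;
        if (t->polls_left[i] > 0)
            still_pending++;
    }
    return still_pending;
}

static void rebuild_ghosts(Traffic *t) {
    assert(t->ghosts_stale);
    t->ghosts_stale = false;   /* placeholder for the real ghost rebuild */
}

/* Blocking completion step: repeat the non-blocking poll until local
   traffic clears, then rebuild ghost info. Returns the polls taken. */
static int finish_micro_migration(Traffic *t) {
    int polls = 0;
    do {
        polls++;
    } while (poll_once(t) > 0);
    if (t->ghosts_stale)
        rebuild_ghosts(t);
    t->num_out = 0;
    return polls;
}
```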
> 
> There are also several things that we probably need but are going to be 
> a bit trickier to specify.
> 
> 
> E.  Replace entities.  This could be on the part bdry or ghosts (that 
> just changed via swapping, for instance).  Passing in a list of local 
> entities that are defunct and replaced with other local entities, so 
> that info can be passed abroad, is more or less the idea.  But I can see 
> challenges in defining such a function clearly, either for updating what 
> a remote part thinks about neighbors, or for changes on the part bdry. 
> The latter case also introduces the awkwardness of things like creating 
> remote vertices (or matching up verts that were created independently 
> with each other).  I recognize that RPI has put forward functions for 
> this, which basically amount to acknowledging matching between 
> independently created ents, and maybe this is the way to go here.  I'm 
> not sure I see any answers I really like on this one, at least not yet.
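One simple approach applies to the vertex-matching case; it is not RPI's functions, which the email does not detail. Vertices created independently on two parts, such as the midpoint of a split bdry edge, can be paired by comparing coordinates within a tolerance. Below is a brute-force C sketch with hypothetical names and integer stand-ins for handles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for a vertex created during local modification. */
typedef struct {
    long handle;     /* non-negative handle on the creating part */
    double xyz[3];
} NewVertex;

static double dist2(const double a[3], const double b[3]) {
    double s = 0.0;
    for (int k = 0; k < 3; k++) {
        double d = a[k] - b[k];
        s += d * d;
    }
    return s;
}

/* For each local new vertex, record in match[i] the handle of the closest
   remote new vertex within tol, or -1 if none. Returns the match count.
   Brute force O(n*m); a spatial hash would scale better. */
static size_t match_new_vertices(const NewVertex *local, size_t n,
                                 const NewVertex *remote, size_t m,
                                 double tol, long match[]) {
    assert(tol >= 0.0);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        double best = tol * tol;
        match[i] = -1;
        for (size_t j = 0; j < m; j++) {
            double d = dist2(local[i].xyz, remote[j].xyz);
            if (d <= best) {
                best = d;
                match[i] = remote[j].handle;
            }
        }
        if (match[i] != -1)
            found++;
    }
    return found;
}
```

Two local vertices could in principle match the same remote vertex if both lie within tol of it. A real scheme would need to rule that out or break ties.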
> 
> 
> Okay, having said all that, I'm sure this will create several firestorms 
> in discussion :-), both for things I've included and things I've 
> omitted.  If nothing else, I'm pretty sure there are going to be 
> requests we'll need here that I haven't thought of yet.
> 
> Talk to you all in the morning,
> Carl
> 

-- 
================================================================
"You will keep in perfect peace him whose mind is
   steadfast, because he trusts in you."               Isaiah 26:3

              Tim Tautges            Argonne National Laboratory
          (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
          phone: (608) 263-8485      1500 Engineering Dr.
            fax: (608) 263-4499      Madison, WI 53706



