itaps-parallel One particularly difficult thing in one of the use cases...

Wed Jun 2 07:30:00 CDT 2010

On 06/01/2010 11:32 PM, Carl Ollivier-Gooch wrote:
> On 10-06-01 08:52 PM, Tim Tautges wrote:
>> Hi all,
>> Here, I attempt to describe one of the particularly difficult parts of
>> one of the use cases. I think this at least exposes a potential problem
>> in the currently specified iMeshP; I think it also demonstrates the
>> problem with entities always having to be in parts, but that's more of a
>> subjective statement.
>>
>> The use case is radiation transport. In a nutshell, consider processes
>> arranged in a 2d array, with each column representing an angle, and each
>> row a subdomain in the spatial partition. The spatial partition of the
>> mesh will be distributed across all processes in a column; a given
>> spatial subdomain is copied onto each process in a row.
>
> Oh, now I understand (at least sort of) what's going on with this case...
>
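For concreteness, the process layout described above can be sketched as plain rank arithmetic. This is only an illustration: the row-major rank numbering and the function names (`grid_coords`, `column_group`, `row_group`) are assumptions, not anything defined by iMeshP or by the application.

```python
def grid_coords(rank, n_cols):
    """Map a linear rank to (row, col), assuming row-major numbering.

    row = spatial subdomain index, col = angle index.
    """
    return divmod(rank, n_cols)


def column_group(rank, n_rows, n_cols):
    """Ranks in the same column: together they hold the whole spatial
    partition for one angle (one subdomain per process)."""
    _, col = grid_coords(rank, n_cols)
    return [r * n_cols + col for r in range(n_rows)]


def row_group(rank, n_cols):
    """Ranks in the same row: each holds a copy of the same spatial
    subdomain, one copy per angle."""
    row, _ = grid_coords(rank, n_cols)
    return [row * n_cols + c for c in range(n_cols)]


# Hypothetical sizes: 3 spatial subdomains (rows) x 4 angles (columns).
N_ROWS, N_COLS = 3, 4
print(grid_coords(6, N_COLS))           # (row, col) of rank 6
print(column_group(6, N_ROWS, N_COLS))  # ranks sharing rank 6's angle
print(row_group(6, N_COLS))             # ranks holding copies of rank 6's subdomain
```

In an MPI code the two groups would typically become two communicators, one per column and one per row; that mapping is left out here to keep the sketch self-contained.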
>> Initializing the mesh onto this processor arrangement is done in 3 steps:
>> 1. load the mesh onto the 1st column, in a domain-decomposed fashion
>> typical of other DD-based codes
>> 2. share the mesh from the 1st column across a row of processes
>> 3. establish sharing relations in the other columns
>>
>> It's in step 3 that I see the problem. Here, the mesh for a spatial
>> subdomain is already represented in the iMesh Instance on each process,
>> but the Partition representing the column hasn't been created yet. After
>> you create that Partition, and a Part in that Partition, you need to
>> assign the mesh from the Part in the row-based Partition into that Part
>> in the column-based Partition. How do you do that? The function
>> iMeshP_exchEntArrToPartsAll is a collective call, and implies that
>> you're moving entities from one Part to another. But, the entities we're
>> talking about here aren't part of any Part in that column Partition. I'd
>> prefer to use iMesh_addEntToSet, but I'm guessing that would break other
>> implementations.
>
> In fact, as I read it, the spec says specifically that iMesh_addEntToSet
> gets used here, with a part handle passed as the set. This is no
> different, as I see it, from any other situation in which you're
> creating a partition rather than reading one. And yes, it's true that in
> the transition of setting up this (or any similar) partition, there's
> necessarily a limbo period in which entities haven't yet been assigned
> to parts. IMO, this is different than creating entities with no part
> assignment and leaving them in that state.
>

So this limbo period is allowed in your world?  This is subtly different from the stricter requirement that there be no 
time when an entity is not in a Part.  Mark S, are you sure you agree with this?
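To make the limbo period concrete, here is a toy model in plain Python. It does not use the iMeshP API; the `Partition` class and its methods are hypothetical stand-ins, with a part modeled as nothing more than a set of entity handles. It shows entities that already sit in a Part of the row-based Partition but are unassigned in the newly created column-based Partition until a local add puts them there.

```python
class Partition:
    """Toy stand-in for a partition: maps part id -> set of entity handles."""

    def __init__(self, name):
        self.name = name
        self.parts = {}

    def create_part(self, part_id):
        self.parts[part_id] = set()

    def add_to_part(self, part_id, entities):
        # Purely local operation, similar in spirit to adding entities to a set.
        self.parts[part_id].update(entities)

    def unassigned(self, local_entities):
        """Entities on this process that are in no part of this partition."""
        assigned = set().union(*self.parts.values())
        return set(local_entities) - assigned


# Entities already present in the local mesh instance (hypothetical handles).
local_entities = {"e0", "e1", "e2"}

# After step 2, the row-based partition already holds them in a part.
row = Partition("row")
row.create_part(0)
row.add_to_part(0, local_entities)

# Step 3: the column-based partition and its part are created empty...
col = Partition("column")
col.create_part(0)
limbo_before = col.unassigned(local_entities)  # every entity is in limbo here

# ...and only then are the entities assigned, with no migration involved.
col.add_to_part(0, row.parts[0])
limbo_after = col.unassigned(local_entities)
```

The point of the sketch is that the entities are never partless with respect to the row Partition, yet they are necessarily partless with respect to the column Partition between its creation and the add.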

> This is a common enough scenario that I'm sure FMDB and Simmetrix have
> ways of handling this, even though new entities created once a partition
> is established are created in a part. (For the GRUMMP implementation
> that's on the drawing board, there's no problem here, either.)
>
> My issue with the whole "entities must belong to a part" thing is a
> consistency argument. I expect that, if we don't require all entities to
> be in parts, at least at the end of parallel services, some subsequent
> parallel operations will not be properly defined. I don't see that
> outcome as reasonable. Consider:
>
> Premise 1 (written in the spec): An entity -must- have an owner if it's
> going to be modified. (Ownership == right to modify)
>
> Premise 2: It's never safe for a service to assume that no other service
> will ever modify the mesh after it's done, nor that other services
> (whether or not they modify the mesh) can tolerate partless entities,
> even if they can be identified as such. (Ghosting patterns could change,
> for instance, making it so that entities that weren't ghosted now are,
> and need an owner for that reason.)
>
> If those two premises hold up (and I obviously think they do), any
> parallel service that wants to work and play well with others needs to
> assign all entities to parts before completion.

Well, any services that modify mesh, anyway.  There are one or two services working with mesh that don't modify it, you 
know.  In making this interface work for the mesh modifying services, we shouldn't cripple those other ones too badly.

> Yes, I think this even
> applies to the parallel meshing scenario we discussed in the telecon:
> the parallel version of this must, IMO, at the very least post-process
> the mesh that the parallel-unaware mesher produced to get stuff into
> parts. This would be a wrapper, not a mod to the mesher itself.
>

I do agree with most of the above points.  But, that just imposes requirements on the state of the mesh after a 
parallel-aware service is done.

> As an aside, if an entity isn't explicitly assigned to a part, what
> should a function like getEntOwnerPart return? And does it pass the
> giggle test for this to -not- return the one part on a process when
> there is only one? And if unowned entities get an inferred value in this
> function, how does one do that in the presence of multiple parts per
> process and still avoid giggling?
>

That brings up an excellent point: what does getEntOwnerPartArr return for entities not owned by any Part *in a
Partition*? It looks like we need either a stricter requirement or a notion of some Part ID that's considered
invalid. Otherwise, it's invalid to create a Partition and Parts and then add entities to those Parts in a subsequent operation.
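The two options can be sketched side by side. This is a toy in plain Python, not the iMeshP interface: `INVALID_PART_ID`, `owner_parts`, and `owner_parts_strict` are hypothetical names, and ownership is modeled as a simple dictionary from entity handle to Part ID within one Partition.

```python
# Hypothetical sentinel meaning "not owned by any Part in this Partition".
INVALID_PART_ID = -1


def owner_parts(owner_of, entities):
    """Lenient variant: unassigned entities map to the sentinel."""
    return [owner_of.get(ent, INVALID_PART_ID) for ent in entities]


def owner_parts_strict(owner_of, entities):
    """Strict variant: any unassigned entity is treated as an error."""
    missing = [ent for ent in entities if ent not in owner_of]
    if missing:
        raise LookupError(f"entities with no owning part: {missing}")
    return [owner_of[ent] for ent in entities]


# Hypothetical state partway through populating a new Partition:
# e0 has been assigned to part 3, e1 has not been assigned yet.
owner_of = {"e0": 3}
result = owner_parts(owner_of, ["e0", "e1"])
```

With the sentinel, a caller can query ownership while a Partition is still being populated; with the strict variant, the same query is an error until every entity has been assigned.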

- tim

> From a (planned) implementation point of view, I don't think I'm really
> going to care whether entities are assigned to the right part initially,
> assigned to an arbitrary part and moved later, or not assigned to a part
> initially and assigned later. These should all be cheap in time (O(1)
> with a small constant) and space (no more than an int per entity).
>
>> This example also demonstrates the need either for another function, to
>> negotiate shared entities between Parts, or to expand the definition of
>> iMeshP_createGhostEntsAll to include the functionality (the latter would
>> be most natural for MOAB, since the same function is used in MOAB's
>> parallel stuff to do either; I distinguish by allowing the # layers
>> specified to be zero, in which case you're requesting the resolution of
>> shared entities at an interface).
>
> I agree this functionality is needed when any mesh is partitioned
> instead of being read in parallel.
>
> Carl
>
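The shared-entity negotiation discussed above (the zero-layer case) amounts to finding entities that appear in more than one Part. The sketch below is a serial toy with a global view of all Parts and globally unique handles, both of which are assumptions; it is not MOAB's algorithm, and a real implementation would need communication between processes. The name `resolve_shared` is hypothetical.

```python
from collections import defaultdict


def resolve_shared(part_entities):
    """Return {entity: [part ids]} for entities held by more than one part.

    part_entities maps part id -> set of globally unique entity handles.
    """
    holders = defaultdict(list)
    for part_id in sorted(part_entities):
        for ent in part_entities[part_id]:
            holders[ent].append(part_id)
    return {ent: parts for ent, parts in holders.items() if len(parts) > 1}


# Hypothetical vertices on three parts; v1 and v3 lie on part interfaces.
parts = {
    0: {"v1", "v2", "v3"},
    1: {"v3", "v4"},
    2: {"v1", "v3"},
}
shared = resolve_shared(parts)
```

Entities that end up in `shared` are the ones for which sharing relations (and an owner) would have to be established across Parts.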

-- 
Tim Tautges, Argonne National Laboratory