itaps-parallel comments on today's meeting
Mark Miller
miller86 at llnl.gov
Fri Nov 16 02:21:51 CST 2007
Karen,
I wanted to mention a few things about the parallel functionality model
that we discussed briefly today.
There is no particular order to these ramblings...
I think the term "MPI Process" is used most often in the community
to describe a complete parallel application running on many
processors, and not an individual process of that application.
Wherever you use "MPI Process", I think the community tends to use
"MPI Task".
I think it is important whenever we talk about "mesh" to distinguish
between a piece of a mesh that has been carved out, somehow, from an
aggregate (global) whole to be distributed to processors and the
aggregate (global) mesh itself. I am unaware of any community wide
terminology for this. We're already using terms like 'part' for related
(but different) purposes. I think 'local-mesh' and 'global-mesh'
might help to make things clearer.
On a philosophical note, I think it is important to provide a high
enough level of abstraction in the data model that even the fact that
a given mesh is decomposed for parallel processing is itself really an
'implementation' detail. I say this to emphasize that I think the meaning
of the word "mesh" ought to always mean the single, global, aggregate
whole thing.
In the terminology section, I inventoried the different names (nouns)
of stuff I saw there and wound up with...
MPI-process (task)
mesh
mesh instance
partition
partition instance (iPartition)
global IDs
MPI communicator
part
part ID
part handle
Objects
collection of objects
entity
group of entities
entity sets, (mesh/es <==> partitions/parts)
Lower-dimension entities
root set
I list these only because it can be, at times, a bit difficult to
juggle them all. I am particularly concerned about having to
maintain separate notions for 'ID' and 'handle', as well as
'objects', 'collections of objects', 'group of entities', and maybe
even 'entity sets'. It is not clear to me that these are, or need
to be, different things.
Also, the use of 'MPI' here is a bit too implementation-specific in
my opinion. Will we never support OpenMP applications? How about
PVM? Or a mixture of OpenMP and MPI? That mixture is very likely on
the new multi-core chips that might have 32 or 64 cores on a single
chip: OpenMP on chip, MPI across chips. I think if we stick to the
term "parallel task" as a name for a wholly independent piece of
executing application code, then we can talk about the number of
these tasks as well as the "rank" of any one task without
specifically mentioning MPI. We can also talk about the "mappings"
between partitions and parallel tasks without specifically invoking
the notion of an MPI communicator. I think there is value in trying
to abstract away from MPI.
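To make this concrete, here is a rough sketch, in C, of what such an
implementation-neutral set of queries might look like. All of the
names and signatures below are hypothetical; they are not part of any
agreed ITAPS interface, just an illustration of the kind of
abstraction I mean.

    /* Hypothetical, implementation-neutral "parallel task" queries.
     * An implementation could back these with MPI, PVM, OpenMP
     * threads, or a hybrid, without the application ever naming MPI. */
    typedef struct iParallel_TaskGroup_ *iParallel_TaskGroup;

    /* total number of parallel tasks cooperating on the mesh */
    void iParallel_getNumTasks(iParallel_TaskGroup group,
                               int *num_tasks, int *err);

    /* rank of the calling task, 0 <= rank < num_tasks */
    void iParallel_getTaskRank(iParallel_TaskGroup group,
                               int *rank, int *err);

    /* mapping between parts of the active partition and tasks,
     * without ever mentioning an MPI communicator */
    void iParallel_getTaskOfPart(iParallel_TaskGroup group, int part_id,
                                 int *owning_task_rank, int *err);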
I think it is important to understand in detail Tim's insistence
that there should be only one iMesh_instance object per 'parallel
task', even to the point of agglomerating multiple different meshes
of an application into that iMesh_instance and then distinguishing
between them via a multiplexer. I think I understand some of his
rationale for this, but not all of it. And I am sure I don't
understand how multiplexing helps, as my understanding is that the
multiplexer is the iMesh instance (at least in the example Jason
sent us).
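To check my own understanding of the 'one instance per task' idea,
here is a minimal sketch of how I imagine it working, assuming the
current iMesh C binding; the file names are made up and error
checking is omitted. I may well be misreading Tim's and Jason's
intent, so treat this as a question as much as an example.

    #include "iMesh.h"

    int main(void)
    {
      iMesh_Instance mesh;
      iBase_EntitySetHandle root, fluid_set, solid_set;
      int err;

      /* one iMesh instance for this parallel task */
      iMesh_newMesh("", &mesh, &err, 0);
      iMesh_getRootSet(mesh, &root, &err);

      /* one entity set per logical mesh the application works with */
      iMesh_createEntSet(mesh, 0, &fluid_set, &err);
      iMesh_createEntSet(mesh, 0, &solid_set, &err);

      /* hypothetical file names */
      iMesh_load(mesh, fluid_set, "fluid.vtk", "", &err, 9, 0);
      iMesh_load(mesh, solid_set, "solid.vtk", "", &err, 9, 0);

      /* later calls pass fluid_set or solid_set instead of root, so
       * the single instance plus entity sets act as the multiplexer */
      return 0;
    }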
I'll re-iterate here my metaphor for distinguishing between the
'active' partition of a mesh and the various alternate partitions
that might be available. I suggested thinking of a 2D mesh as a
sheet of paper onto which you could trace three different sets of
lines (or curves) in red, green and then blue. Taken individually,
each set of colored lines divides the paper into pieces that could
be distributed to processors. However, at any one moment in the
execution of a parallel application, that sheet of paper is divided
into pieces and instantiated in the parallel tasks in ONLY ONE of
the three ways. It should be possible, somehow, to change from say
the red way to the blue way, and that means that, somehow, the
ITAPS software must maintain knowledge of the red, blue and green
ways. In addition, it should be possible to create, on the fly, a
new way to divide the sheet of paper, tell ITAPS about that new
way, and switch to it as well. The notion of the 'active' partition
I've heard some participants use refers to the instantiated state
of the sheet of paper in a running application. The notion of
supporting 'multiple' partitions I've also heard refers to that
(minimum) knowledge necessary to change from one instantiated state
to another. It is not even clear to me that ITAPS has to be
responsible for maintaining that knowledge. As long as an
application can give ITAPS the essential bootstrap (that minimum
knowledge) to change the instantiated state of the mesh, that might
be all that is necessary.
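To say the same thing in data-structure terms, here is a
hypothetical sketch (in C, with names invented purely for
illustration) of the minimum bookkeeping I have in mind; whether
that bookkeeping lives inside ITAPS or in the application is
exactly the open question.

    /* One "color" from the paper metaphor: a way of dividing the mesh. */
    typedef struct {
      const char *name;          /* e.g. "red", "green", "blue"         */
      int         num_parts;     /* how many pieces this way produces   */
      const int  *part_to_task;  /* the bootstrap: which task owns each */
    } PartitionDescription;      /*   part, indexed by part id          */

    /* Everything known about how the sheet of paper could be divided. */
    typedef struct {
      PartitionDescription *known;     /* all ways registered so far    */
      int                   num_known;
      int                   active;    /* index of the one way the mesh */
    } PartitionRegistry;               /*   is actually instantiated in */

    /* Switching from the red way to the blue way means migrating
     * entities so the mesh matches known[new_active], then updating
     * 'active'. Registering a new way on the fly is just appending
     * another PartitionDescription to 'known'. */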
We discussed whether or not, for example, the call to ask which
'parts' are assigned to which 'parallel tasks' would be collective.
The ability to have one processor query ITAPS software and obtain
knowledge of parts, partitions, entities, whatever, that have been
created, destroyed or otherwise manipulated by other processors
implies one of two things: either the query to obtain that
information is a collective, global call to the ITAPS libraries
which all processors must agree to call, or there is a process by
which the information necessary to service such calls can be
'synced up' across processors, making later queries local
operations. There are advantages and disadvantages to both. Jason
made the excellent argument that if the information necessary to
service a given query is really large, say size O(N) where 'N' is
the number of processors, then the sync approach would force us to
cache this potentially huge data on each processor whether or not
the application will ever actually query it. I think you made the
observation that there are probably some queries an application
might legitimately expect to be able to make locally, without
engaging in global collective communication, akin to the
MPI_Comm_size() and MPI_Comm_rank() queries of MPI. So, I think we
agreed that potentially both approaches would be necessary.
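Purely to illustrate the trade-off (and in MPI terms, since that was
the analogy we used), here are the two styles side by side for one
hypothetical query, "which task owns part P"; the names and
signatures are mine, not a proposed API.

    #include <mpi.h>

    /* Style 1: collective query. Every task must call this; the owner
     * is discovered on the fly with a reduction, nothing is cached. */
    void getPartOwner_collective(MPI_Comm comm, int part_id,
                                 const int my_parts[], int my_num_parts,
                                 int *owner)
    {
      int rank, have_it = -1;
      MPI_Comm_rank(comm, &rank);
      for (int i = 0; i < my_num_parts; i++)
        if (my_parts[i] == part_id) have_it = rank;
      MPI_Allreduce(&have_it, owner, 1, MPI_INT, MPI_MAX, comm);
    }

    /* Style 2: sync once, query locally. A one-time collective builds
     * an O(total_parts) table on every task; afterwards the query is a
     * purely local lookup, but the table costs memory on every task
     * and must be kept up to date as parts migrate. */
    void syncPartOwners(MPI_Comm comm, const int my_parts[],
                        int my_num_parts, int owner_of_part[],
                        int total_parts)
    {
      int rank;
      MPI_Comm_rank(comm, &rank);
      for (int p = 0; p < total_parts; p++) owner_of_part[p] = -1;
      for (int i = 0; i < my_num_parts; i++)
        owner_of_part[my_parts[i]] = rank;
      MPI_Allreduce(MPI_IN_PLACE, owner_of_part, total_parts,
                    MPI_INT, MPI_MAX, comm);
    }

    int getPartOwner_local(const int owner_of_part[], int part_id)
    {
      return owner_of_part[part_id];  /* no communication at all */
    }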
After we concluded this discussion, I found that there were many
queries in your Mesh functionality section that fit into one of
these two kinds but that we neglected to discuss. I would like to
ensure we touch base on that in a near-future meeting.
I observed that we appear to have inverse and forward mapping
information in the part/partition context as well as the
entity/entity-set context. Maybe in iGeom too. So, we discussed the
utility of something like ForwardQueryHint() and InverseQueryHint()
functions that an implementation may honor to build up, internally,
the information necessary to service queries in one direction or
the other.
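For concreteness, here is a sketch of what I have in mind for those
two calls; the argument lists are invented purely for illustration,
and the contract would be advisory: an implementation may use the
hint to precompute a mapping, or ignore it entirely.

    /* Hint: the application will mostly traverse the forward
     * direction, e.g. part handle -> entities it contains (or set ->
     * entities), so any lazily built forward map is worth
     * constructing up front. */
    void ForwardQueryHint(void *itaps_instance, int *err);

    /* Hint: the application will mostly traverse the inverse
     * direction, e.g. entity -> the part(s) or set(s) containing it,
     * so the implementation may want to build the reverse index now. */
    void InverseQueryHint(void *itaps_instance, int *err);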
I would like to suggest that even if we don't have clearly defined
APIs, we could actually start writing pseudo-code for a couple of
use cases using the 'functionalities' you have defined. We could
start with Mark S.'s simple use case and try to write pseudo-code
for that (again using your functionality definitions) to see what
we can learn. I think such an exercise would be enlightening.
Mark