itaps-parallel Phone conf today
Mark Miller
miller86 at llnl.gov
Tue Jan 8 12:58:11 CST 2008
Hi Mark,
I'll chime in here because I've spent a lot of time myself struggling with
issues of data model versus API in other projects and have written some
about it (an intro to another project I once worked on is attached for more
detailed information if so desired). An excerpt addressing 'data model'
follows below.
I include the excerpt below because a) I think ITAPS defines a single data model,
b) I think being vigilant about how the choices we make either affect
the data model or are affected by the data model is very important,
c) it's NOT my impression that different players in the ITAPS game have different
data models, but instead that they are employing the same data model differently to
achieve a similar end (e.g. one uses tags on entities to imbue entities
with various semantics while another uses sets and set membership to
imbue entities with various semantics), and d) to the extent possible, I'd
like to see ITAPS develop the essential, minimum, core functionality on top
of which all other operations can run with acceptable storage and execution
performance.
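To make point (c) concrete, here is a minimal sketch, in plain Python with made-up
names (this is NOT iMesh/ITAPS code), of the two mechanisms I mentioned: tags on
entities versus sets and set membership. Both imbue entities with the same semantic
("this face carries boundary condition 7") and answer the same query, which is why
I'd say they reflect the same data model used differently:

```python
# Hypothetical sketch (not actual iMesh/ITAPS code): two ways to imbue
# mesh entities with the semantic "carries boundary condition 7",
# using plain Python structures to stand in for a mesh implementation.

faces = ["f0", "f1", "f2", "f3"]

# Mechanism 1: tags on entities -- each entity carries (name, value) pairs.
tags = {"f0": {"BC": 7}, "f2": {"BC": 7}, "f3": {"BC": 4}}

def faces_with_bc_by_tag(bc):
    """Scan entities and keep those whose BC tag has the given value."""
    return [f for f in faces if tags.get(f, {}).get("BC") == bc]

# Mechanism 2: sets and set membership -- a named set holds the entities.
entity_sets = {"BC_7": {"f0", "f2"}, "BC_4": {"f3"}}

def faces_with_bc_by_set(bc):
    """Look up the named set and return its members."""
    return sorted(entity_sets.get("BC_%d" % bc, set()))

# Both mechanisms answer the same question with the same result, so they
# express the same underlying semantics with different primitives.
assert faces_with_bc_by_tag(7) == faces_with_bc_by_set(7) == ["f0", "f2"]
```

The trade-off between the two is storage and query cost, not expressiveness, which
is why I think the interesting design question is which primitives belong in the
minimal core.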
BEGIN EXCERPT
An important feature of a scientific data support library is whether it is
'model-oriented' or 'menu-oriented.' A menu-oriented system, such as Silo or Exodus II,
offers what amounts to a menu of objects. From among the objects on the menu,
a user chooses the object that most closely matches her data. If the user cannot
find an appropriate object, her choices are to ask the "chef" to cook up a new
object (that is, extend the system) or eat at a different restaurant (that is,
for example, switch from Exodus II to Silo).
[I'll interject here that most of the APIs I am familiar with that deal with
either geometric modeling and/or discrete mesh representations are
menu-oriented, which is why there are so many 'restaurants' now to eat
at, and this is a key barrier to large-scale integration of scientific
computing software.]
There are unique API calls to handle each object in a menu-oriented system.
Therefore, as a menu-oriented system is employed to support a wider variety
of applications and reach larger and larger scales of integration, its API
and system complexity grow right along with its menu. This is a key
consequence of a menu-oriented approach.
By contrast, a model-oriented system offers a small set of primitive building
blocks. A user assembles these building blocks to literally model her data.
The same set of building blocks can be assembled in a variety of ways to model
a wide variety of data. As a model-oriented system is employed to reach larger
and larger scales of integration, its key advantage over a menu-oriented system
is that the API and system complexity remain fixed.
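The contrast can be sketched in code. Below is a hypothetical illustration (the
function and class names are invented for this sketch, not taken from any of the
libraries named above): a menu-oriented API grows one entry point per object type,
while a model-oriented API keeps a small fixed set of primitives that the user
assembles:

```python
# Hypothetical sketch contrasting the two styles.

# Menu-oriented: each new kind of data needs a new, dedicated entry point,
# so the API surface grows along with the menu.
def write_structured_mesh(dims, coords): ...
def write_unstructured_mesh(nodes, connectivity): ...
def write_point_mesh(points): ...   # ...and the menu keeps growing.

# Model-oriented: a few primitives (here, collections and relations) that
# the user composes to model each of the above, and more, with no new API.
class Collection:
    """A named set of abstract entities (nodes, zones, ...)."""
    def __init__(self, name, members):
        self.name, self.members = name, list(members)

class Relation:
    """A mapping from entities of one collection to entities of another."""
    def __init__(self, source, target, mapping):
        self.source, self.target, self.mapping = source, target, mapping

# An unstructured mesh becomes an assembly of primitives rather than a
# dedicated object: nodes, zones, and a connectivity relation between them.
nodes = Collection("nodes", range(4))
zones = Collection("zones", range(1))
connectivity = Relation(zones, nodes, {0: [0, 1, 2, 3]})
```

A structured or point mesh would be modeled with the same two primitives, which is
the sense in which the API complexity stays fixed as the variety of data grows.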
On the other hand, a model-oriented system does have some drawbacks.
Learning to use a menu-oriented system is relatively simple. It involves
browsing the menu (e.g. the API reference manual) for an object that most
closely matches the data in an application and then learning the handful
of API calls (often just one) to marshal that object through the API.
However, learning to use a model-oriented system is more involved.
First, the primitive building blocks of a model-oriented system are
more abstract and less familiar, and they require more time to learn to
use than the objects in a menu-oriented system. Second, it is often the
case that a user cannot get by with learning only a portion of a
model-oriented API. To do anything useful in a model-oriented system,
a user needs to become familiar with a majority of the API. Third, the
user needs to learn how to apply the modeling primitives to build up
'expressions' for his or her data in terms of assemblies of these
primitives. In other words, the user has to become a data modeler.
Furthermore, each time the user encounters new data s/he has not dealt
with before, s/he has to re-engage in the modeling activity to decide
how best to represent it with the given primitives.
For these reasons, using a model-oriented
system is not anything like using a menu-oriented system such as Silo,
Exodus II, or similar products.
Another useful way of thinking about the difference between using menu-
and model-oriented systems is that using menu-oriented systems is like
answering multiple choice questions while using model-oriented systems
is like answering essay questions. In fact, a good way to think about
a model-oriented system is that it is like a 'grammar' designed specifically
for descriptions of scientific data. We call that 'grammar' a data model.
The design of a model-oriented system is usually decomposed into a few key
pieces: its data model specification, its API specification, and its implementation.
The data model specification is a white-paper, natural-language specification of
the various model primitives and their attributes. The data model specification
is highly technical in nature and defines the foundations of the data model.
I believe chapter 3 of the TSTTM interface document describes the data model.
The API specification is a list of functions that generally map 1:1 to modeling
primitives and/or attributes. I believe the iMesh.h file is the API specification.
The implementation is, of course, the source code that implements the specification.
The reason for explaining the distinction between these parts of a design is that
the data model specification should always lead the API specification which,
in turn, should always lead the implementation. This has two consequences. First, if
new functionality outside the scope of the current design becomes necessary, it is
first addressed in the data model, then in the API and finally in the implementation.
Second, it can be the case that the data model suggests functionality the current
API spec. and implementation do not support. So, a would-be user should not be surprised
if upon learning the data model, s/he encounters limitations in the current API or
implementation s/he did not expect. Likewise, the would-be user should not be surprised
if the API specification includes functionality that is not available in the current
implementation.
END EXCERPT
Mark Shephard wrote:
>
> In yesterday's phone call we yet again went back to the discussion on
> data model vs API. As near as I can tell iMesh, iGeom, etc. are APIs
> where there are agreed-to functions that take specific information as
> input and provide specific information as answers to the question. For
> example, the one we keep discussing is "give me the group of mesh faces
> that meet two conditions" (one example, but only one example, of which is
> the mesh faces that are in a specific part and have a specific BC). This can be
> discussed at the API level quite independently of whether one does it by
> carrying out Booleans of sets or does something else underneath to
> provide the answer to the question.
>
> The geometric modeling tools have done this well for many years by focusing
> on the API end of it, and no one has problems with it.
>
> Since it is quite clear that different ITAPS groups have very different
> data models for how they do things, which are actually driven by valid
> concepts that have also been discussed many times before, can we please
> focus on the API level and quit telling each other we have to adopt a
> specific data model.
>
> Mark
>
> Devine, Karen D. wrote:
> > My calendar shows that we have a phone conference scheduled for today at
> > 1:30pm PST. Please let me know if you expect to miss this meeting (or if
> > you think my calendar is incorrect).
> >
> > There has been lots of good discussion since our last meeting. Please take
> > time to review it before the phone conference.
> >
> > Below are topics for discussion.
> >
> > - Tim asked that we clarify the questions we are addressing. He listed the
> > following:
> >
> > 1) Do we want to do booleans on query results below the interface level?
> > An example query would be "What are all regions in some set in a
> > particular part?"
> >
> > 2) How do we express and query sets spread across processors?
> >
> > 3) How do we refer to parts and partitions in the interface, and which
> > functions do we use to get information about them?
> >
> > So far, our efforts have been addressing (3) and, more recently, (1). With
> > respect to (3), we have agreed to provide helper-functions that expose
> > partition and part information through the interface. These
> > helper-functions are natural for an application to use and allow the
> > greatest flexibility for the implementations. With respect to (1), we
> > identified the need to answer queries such as the example Tim provided.
> > Several options were proposed to answer these queries (see below). Tim
> > noted that similar capability could be useful for intersections of multiple
> > entity sets. We have not deeply explored question (2) yet, and unless there
> > is a reason to do so immediately, I'd like to resolve (3) and (1) before we
> > start into (2).
> >
> > - I'd like to try to reach agreement on a solution to question (1) above.
> > Our original idea of overloading the mesh instance argument is insufficient
> > for answering these queries, especially in a multiplexing environment. Carl
> > summarized three options; I'd like to discuss the pros/cons of each.
> >
> >> a) Adding args to about 10 functions, for both serial and parallel.
> >> Requires: universal agreement about what a null handle is, or else
> >> definition of an implementation-specific global variable that apps can
> >> use to find out what the null handle is for that impl. Also, for every
> >> call, the function has to identify which args it was passed and dispatch
> >> accordingly (creating more functions internal to the interface,
> >> probably, but not user-visible). Changes to serial code.
> >>
> >> b) Leaving those 10 functions alone in serial, and adding parallel
> >> versions. Requires: adding probably one function for each of the
> >> originals in the parallel interface. My guess from the current document
> >> on parallel is that we'll be looking at >100 functions already, so this
> >> may or may not be considered significant. Serial code left untouched.
> >>
> >> c) Adding some magic to squish three args into one in parallel.
> >> Requires: A function to squish (and unsquish, internally) the args;
> >> therefore, extra effort for the app and impl for each of those 10
> >> functions every time they're called (even in serial, probably).
> >> Depending on how the squishing is done, risk of handle collision.
> >> Changes to serial code.
> >
> > - Finally, I'd like to get clarity on how decisions are made in ITAPS.
> > When the iMesh interface was defined, for example, was unanimous agreement
> > needed on a function before it was accepted? Or was a vote taken (as I
> > believe CCA does)?
> >
> >
--
Mark C. Miller, Lawrence Livermore National Laboratory
email: mailto:miller86 at llnl.gov
(M/T/W) (925)-423-5901 (!!LLNL BUSINESS ONLY!!)
(Th/F) (530)-753-8511 (!!LLNL BUSINESS ONLY!!)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: intro_to_saf.pdf
Type: application/pdf
Size: 86865 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/itaps-parallel/attachments/20080108/df829804/attachment.pdf>