[Petsc-trilinos-discussion] Scope and requirements

Fri Nov 22 10:06:06 CST 2013

Sorry for the long email ...

In Trilinos, the basic concept is that all objects should print to an arbitrary Teuchos::FancyOStream object (which takes any arbitrary std::ostream object).  The underlying std::ostream object has an abstract stream buffer object that can be overridden to send output anywhere.  The class Teuchos::FancyOStream contains some utilities to improve the formatting of the output (like adding indentation in a nice way, adding prefixes for the procID, etc.) and has other little features (but has no real dependence on MPI or anything else).  But the real flexibility is the underlying std::ostream object with its abstract std::basic_streambuf object.  You can define subclasses of std::basic_streambuf to send output anywhere you want.  You can send output to a file, you can communicate it, you can send it to a C function or a Fortran function, etc.  We have not exercised all of these options yet in Trilinos, but because you can define the underlying std::basic_streambuf to send formatted char strings anywhere you want, there is no limit to what can be done with the output.

Does PETSc allow the user to provide a polymorphic object (i.e. a struct with void* and a set of function pointers) to allow the user to send output anywhere?  You would basically provide a virtual version of printf(...) for PETSc objects to use.  In the standard use case this would just be a fall-through call to printf(...).  That might already be supported in PETSc but I would have to look and have it demonstrated.
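
To make the "struct with void* and a set of function pointers" idea concrete, here is a minimal sketch written in the C-compatible subset of C++.  All of the names (OutputSink, sinkPrintf, reportResidual, etc.) are hypothetical and made up for illustration; none of this is an existing PETSc or Trilinos interface.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical C-style "polymorphic" output object: an opaque context pointer
// plus a function pointer that plays the role of a virtual printf(...).
// These names are illustrative only, not part of PETSc or Trilinos.
struct OutputSink {
  void* ctx;                                              // user data
  int (*vprint)(void* ctx, const char* fmt, va_list ap);  // "virtual" printf
};

// Default behavior: fall through to the C library and print on stdout.
inline int stdoutVPrint(void* /*ctx*/, const char* fmt, va_list ap) {
  return std::vprintf(fmt, ap);
}

// Alternative behavior: append the formatted text to a std::string.
inline int stringVPrint(void* ctx, const char* fmt, va_list ap) {
  va_list ap2;
  va_copy(ap2, ap);  // first pass only measures the formatted length
  const int n = std::vsnprintf(nullptr, 0, fmt, ap2);
  va_end(ap2);
  if (n < 0) return n;
  std::string buf(static_cast<std::size_t>(n) + 1, '\0');
  std::vsnprintf(&buf[0], buf.size(), fmt, ap);
  buf.resize(static_cast<std::size_t>(n));  // drop the trailing NUL
  static_cast<std::string*>(ctx)->append(buf);
  return n;
}

// What library code would call instead of printf(...).
inline int sinkPrintf(const OutputSink& sink, const char* fmt, ...) {
  assert(sink.vprint != nullptr);
  va_list ap;
  va_start(ap, fmt);
  const int n = sink.vprint(sink.ctx, fmt, ap);
  va_end(ap);
  return n;
}

// Stand-in for a library routine that reports progress through the sink.
inline void reportResidual(const OutputSink& sink, int iter, double rnorm) {
  sinkPrintf(sink, "iter %d: ||r|| = %.2e\n", iter, rnorm);
}
```

With OutputSink{nullptr, &stdoutVPrint} the library code behaves as if it called printf(...) directly; with OutputSink{&someString, &stringVPrint} the same library code writes into a string owned by the caller, and the library never needs to know the difference.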

The hard part is getting the std::ostream object into the objects that you want and telling them how verbose you want them to be.  For that purpose, Teuchos defines the base class interface Teuchos::VerboseObject.  However, getting access to the object to get at its VerboseObject members is very hard to do and takes a lot of design work in the calling code.  I have spent days doing this in Trilinos packages like Rythmos, MOOCHO, etc.  It is a lot of work even when you control the code.
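
To show why getting the stream into nested objects takes design work, here is a small sketch.  The Verbose mixin below is a hypothetical stand-in written for this example; it is NOT the actual Teuchos::VerboseObject interface, and the solver classes are toys.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

enum class Verbosity { None, Low, High };

// Hypothetical mixin, loosely modeled on the idea described above.
// This is NOT the Teuchos::VerboseObject interface.
class Verbose {
public:
  virtual ~Verbose() = default;
  virtual void setOutput(std::shared_ptr<std::ostream> out, Verbosity level) {
    out_ = std::move(out);
    level_ = level;
  }
protected:
  bool shouldPrint(Verbosity needed) const { return out_ && level_ >= needed; }
  std::ostream& out() const { assert(out_); return *out_; }
private:
  std::shared_ptr<std::ostream> out_;
  Verbosity level_ = Verbosity::None;
};

class LinearSolver : public Verbose {
public:
  explicit LinearSolver(std::string name) : name_(std::move(name)) {}
  void solve() {
    if (shouldPrint(Verbosity::Low)) out() << name_ << ": converged\n";
    if (shouldPrint(Verbosity::High)) out() << name_ << ": detailed history\n";
  }
private:
  std::string name_;
};

// The hard part: the outer object has to forward the stream and verbosity
// to every object it owns, and every layer in between has to cooperate.
class NonlinearSolver : public Verbose {
public:
  NonlinearSolver() : inner_("GMRES") {}
  void setOutput(std::shared_ptr<std::ostream> out, Verbosity level) override {
    Verbose::setOutput(out, level);
    inner_.setOutput(out, level);  // omit this and the inner solver is silent
  }
  void solve() {
    if (shouldPrint(Verbosity::Low)) out() << "Newton: step 1\n";
    inner_.solve();
  }
private:
  LinearSolver inner_;
};

// Usage sketch: capture all output in memory and return it.
inline std::string runDemo() {
  auto log = std::make_shared<std::ostringstream>();
  NonlinearSolver newton;
  newton.setOutput(log, Verbosity::Low);
  newton.solve();
  return log->str();
}
```

The forwarding call inside NonlinearSolver::setOutput is the kind of plumbing that has to be repeated at every level of a real code, which is where the days of work go.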

For a little more info on this approach, see GCG 16 "Always send output to some general std::ostream object; Never send output directly to std::cout or std::cerr; Never print output with print(...) or printf(...)" in:

    http://web.ornl.gov/~8vt/TrilinosCodingDocGuidelines.pdf

You can see the class descriptions for these classes in the "output support" section at:

     http://trilinos.sandia.gov/packages/docs/dev/packages/teuchos/doc/html/index.html#TeuchosCore_src

It would be a huge amount of work to find a way to create useful comprehensible output from a multi-physics code sent directly to one stream (i.e. STDOUT).  In the case of CASL VERA Tiamat, the different physics codes actually run in parallel with each other in a block Jacobi black-box solve so even the output from the root rank of each physics would be (and currently is) jumbled together.  I suspect that for complex multi-physics APPs like CASL VERA Tiamat, the best overall approach would be to get each physics module to send all of its output to independent files and then just print a nice short summary/table to STDOUT.  That is, separate files for COBRA-TF, MPACT, Insilico, and MOOSE/Peregrine (and in the future MAMBA) would be used, each written to on the root process of the cluster for that module.

Even this will be hard to implement because, for example, one would need to set up MOOSE/Peregrine to redirect all of its objects and code that it calls to output to a single std::ostream object which is given to it by Tiamat (which in this case will actually be an std::ofstream object).  This std::ostream object needs to be created by the driver Tiamat and passed into MOOSE/Peregrine and then MOOSE/Peregrine needs to find a way to make *all* output created on its behalf go to that std::ostream object, including all PETSc objects it creates and calls.  That means that PETSc needs to allow users to provide an arbitrary output object that could in turn send its output to the right std::ostream object.  The same goes for Fortran code creating output (but would be harder to set up obviously).  Also, the outputting for PETSc could not be redirected globally because in the same process we might have MOOSE, COBRA-TF, and MPACT using PETSc objects running.  I know how to do this for the basic Trilinos objects but I don't know how to do that for PETSc objects (and even if I knew, I don't know how to get MOOSE to comply; this is not our code).
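
The per-module file idea can be sketched from the driver side as follows.  This is only an illustration under stated assumptions: ModuleStreams, runModule, and runCoupled are hypothetical names, the "modules" are stand-in functions, and nothing here reflects how Tiamat is actually written.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical driver-side registry: one output stream per physics module.
class ModuleStreams {
public:
  // Register the stream that a module must use for all of its output.
  void add(const std::string& module, std::shared_ptr<std::ostream> out) {
    streams_[module] = std::move(out);
  }
  std::ostream& get(const std::string& module) const {
    auto it = streams_.find(module);
    assert(it != streams_.end());
    return *it->second;
  }
private:
  std::map<std::string, std::shared_ptr<std::ostream>> streams_;
};

// Stand-in for a physics module: it only writes to the stream it is given
// and never touches std::cout itself.
inline bool runModule(const std::string& name, std::ostream& out) {
  out << name << ": lots of detailed solver output\n";
  return true;
}

// Driver: detailed output goes to the per-module streams; only a short
// table goes to the summary stream.
inline void runCoupled(const ModuleStreams& streams,
                       const std::vector<std::string>& modules,
                       std::ostream& summary) {
  for (const auto& m : modules) {
    const bool ok = runModule(m, streams.get(m));
    summary << m << " : " << (ok ? "ok" : "FAILED") << "\n";
  }
}

// Usage sketch with in-memory streams; returns the summary table.
inline std::string demoSummary() {
  ModuleStreams streams;
  streams.add("COBRA-TF", std::make_shared<std::ostringstream>());
  streams.add("MPACT", std::make_shared<std::ostringstream>());
  std::ostringstream summary;
  runCoupled(streams, {"COBRA-TF", "MPACT"}, summary);
  return summary.str();
}
```

In a real driver the registered streams would be std::ofstream objects opened on the root process of each module's cluster (for example std::make_shared<std::ofstream>("cobra-tf.log"), a made-up file name) and the summary stream would be std::cout.  The sketch only works because runModule cooperates; the whole difficulty described above is getting real modules, and the libraries they call, to do the same.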

Just getting these Fortran codes to redirect their output to a general function that we could override to send output to the underlying std::ostream object for their physics code will be a challenge in itself.  But since we control a big part of the funding for all the Fortran codes, that would be possible.  MOOSE would be a different story because we have no control over MOOSE.  Therefore, MOOSE might be a lost cause and we might just have to turn off all its outputting if we want a clean STDOUT.

For now, because of all of the challenges, I think we are just going to give up and let all these codes print incomprehensible output to STDOUT and just create nice independent summary output and send it to a file.  We would write a driver script to hide this that would basically do:

  $ tiamat [other options] --summary-file=tiamat-summary.log &> incomprehensible.log &
  $ tail -f tiamat-summary.log
  $ [wait for tiamat to finish]

This would not be the end of the world but if a user ever actually looked at tiamat-summary.log they would wonder what types of clowns would write software that produced such worthless and confusing output (and we would say how naive these silly users are to think that printing output was easy :-)).

I would say that trying to coordinate output (both setting verbosity levels and redirecting sinks for output) between Trilinos and PETSc would be a good place to start.  The use cases that we would need to code up, demonstrate, and then protect with integration testing would be:

1) A C++ driver code sets up std::ostream objects and all Trilinos and PETSc objects that are created redirect output to those objects.

2) A C driver code that calls Trilinos and PETSc (perhaps indirectly) redirects all output through printf(...), sprintf(...), fprintf(...) or some other function defined on the C side.

3) A Fortran code that calls Trilinos and PETSc (perhaps indirectly) redirects all output to some arbitrary Fortran print facilities.

Note that the use cases for #2 and #3 are facilitated through C call-back functions that just pass formatted char strings back to C or Fortran where they will be printed.  I know how to do that for the Trilinos approach.
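
As a rough sketch of the call-back mechanism for #2 and #3, the stream buffer below collects characters and hands each completed line to a plain function pointer.  The LinePrinter signature and the class name CallbackBuf are assumptions made for this example, not an existing Trilinos or PETSc interface; a C or Fortran (bind(C)) routine with a matching signature could be plugged in the same way as the sample call-back shown here.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical call-back signature: formatted text, its length, and an
// opaque user pointer.  A C or Fortran bind(C) routine could match this.
using LinePrinter = void (*)(const char* text, int length, void* user);

// Stream buffer that collects characters and passes each completed line
// to the call-back.  A partial line is delivered when the stream is flushed.
class CallbackBuf : public std::streambuf {
public:
  CallbackBuf(LinePrinter printer, void* user) : printer_(printer), user_(user) {
    assert(printer_ != nullptr);
  }
protected:
  // No put area is set, so every character written arrives here.
  int_type overflow(int_type ch) override {
    if (traits_type::eq_int_type(ch, traits_type::eof()))
      return traits_type::not_eof(ch);
    line_.push_back(traits_type::to_char_type(ch));
    if (line_.back() == '\n') flushLine();
    return ch;
  }
  int sync() override {
    flushLine();
    return 0;
  }
private:
  void flushLine() {
    if (line_.empty()) return;
    printer_(line_.c_str(), static_cast<int>(line_.size()), user_);
    line_.clear();
  }
  LinePrinter printer_;
  void* user_;
  std::string line_;
};

// Sample call-back: appends the text to a std::string passed through `user`.
inline void appendToString(const char* text, int length, void* user) {
  static_cast<std::string*>(user)->append(text, static_cast<std::size_t>(length));
}
```

Any object that prints to a std::ostream can then be pointed at std::ostream out(&buf), where buf is a CallbackBuf, without knowing that the characters end up on the C or Fortran side.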

Issues of multi-process printing and communication can be handled in a variety of ways and I am not too worried about that.  The most important thing is arbitrary output redirection.

However, unless someone is willing to set up and support the infrastructure to maintain the above examples with at least the main-line development versions of PETSc and Trilinos, there is no point in doing anything because it will be lost.  That is the first issue to address.  DOE program managers say that they may want better coordination but are they willing to pay for it?  CASL can get by as described above with the status quo (which is a mess) and my guess is that they would not want to invest a lot of money in Trilinos/PETSc compatibility, even just outputting clean-up.  Who is going to actually pay for this, which includes the infrastructure to maintain it?  Again, if it is not worth maintaining and providing a reasonable delivery mechanism to users, it is not worth implementing in the first place.  That is where the lifecycle issues including regulated backward compatibility become critical.  If we can't come to some understanding of how that works and the broad lifecycle issues, there is little point in trying to push forward on interoperability (because it will not work when some poor user actually pulls versions of Trilinos and PETSc and tries to use this stuff).

Cheers,

-Ross

From: Matthew Knepley [mailto:knepley at gmail.com]
Sent: Friday, November 22, 2013 5:38 AM
To: Barry Smith
Cc: Bartlett, Roscoe A.; petsc-trilinos-discussion at lists.mcs.anl.gov
Subject: Re: [Petsc-trilinos-discussion] Scope and requirements

On Thu, Nov 21, 2013 at 7:00 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

   One of the issues Roscoe brings up below is in a complicated solver how does one visualize or display what solvers are being used when and where. This is a great issue and I'll continue that specific thread here.

1)   Even when using a single package with different solvers on different subsets of processes this is already a difficult issue, because of the issue of efficiently coordinating the output from the asynchronously running collections of processes (sending everything to rank 0 and having that manage the  coordination is not great).

2)   With multiple packages if each package just prints to stdout as it runs to display solver configuration you get a useless mess.

   It sounds like Trilinos has an abstraction for output, could someone explain it briefly?  PETSc also has an abstraction for output called a PetscViewer object; the final output can be ASCII text, simple binary output, HDF5, graphics, JSON for web browsers, etc. based on the PetscViewer subclass being used.  Currently in PETSc the MPI_Comm for a PetscViewer object has to match the MPI_Comm for the PETSc object being viewed; this is troublesome, of course, when you want to coordinate the output from two objects with overlapping but different MPI communicators.  I don't know a good way to generalize this yet in PETSc.

Viewer already has multiple dispatch, in that the output depends on both the type of viewer and the type of input object. I think
the easiest thing to do is ensure that objects viewed have comms which are subcomms of the viewer comm. So far, all the comm
relations in our solvers are hierarchical, so we can draw a tree for the relation. The final output of the tree can be organized by
the Viewer and sent out over some stream (say to your webserver). The JSON is naturally tree-like, so I would like a nice collapsing
browser for this (same for input). We are working on just this for PyLith.
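
As a toy illustration of the tree-to-JSON idea (not PETSc's PetscViewer code, and ignoring MPI communicators entirely), a hierarchy of solver descriptions can be rendered like this.  SolverNode and toJson are hypothetical names, and the sketch assumes solver names need no JSON escaping.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one solver in a hierarchy.
struct SolverNode {
  std::string name;
  std::vector<SolverNode> children;
};

// Render the tree as compact JSON.  Names are assumed to contain no
// characters that would need escaping.
inline std::string toJson(const SolverNode& node) {
  std::string out = "{\"name\":\"" + node.name + "\",\"children\":[";
  for (std::size_t i = 0; i < node.children.size(); ++i) {
    if (i > 0) out += ",";
    out += toJson(node.children[i]);
  }
  out += "]}";
  return out;
}
```

Because each nested solver appears as a child of the solver that owns it, a browser-side viewer can collapse or expand any subtree.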

    Matt

  The PETSc abstraction of stdout is a PetscViewer of ASCII type. Other packages can "stream" ASCII to the stdout viewer and it will appear in the same order as inputted. As mentioned above this is limited to a single MPI_Comm to coordinate output.

   I'm interested in models that support coordinating the output from objects that live on more complicated subsets of processes.

   Given that stdout is a crazy thing on big batch machines anyways, I am in favor of an abstraction for it that actually does not use real stdout at all, but perhaps a web server or something else.

   Barry

On Nov 21, 2013, at 11:05 AM, Bartlett, Roscoe A. <bartlettra at ornl.gov> wrote:

> Okay, so I have some concrete use cases from the CASL VERA effort that I am involved with that has issues with PETSc and Trilinos.  We have a situation where we need to couple together multiple Fortran and C++ codes into single executables that use mixes of PETSc and Trilinos and it is a mess and not because of any interfacing issues really.  Here is why ...
>
> The LANL code Hydra-TH is using PETSc and some old version of ML.  Therefore we can't even link Hydra-TH in with code that uses current versions of Trilinos.  We have not even tried to use the up-to-date version of ML under PETSc and to do so would create a nasty circular dependency in our build process the way it is defined now.  More on this issue below.
>
> The INL code MOOSE with APP Peregrine (PNNL) uses PETSc with HYPRE.
>
> We have two parallel Fortran codes using PETSc, MPACT (Univ. of Mich.) and COBRA-TF (Penn. State), that have overlapping sets of processes and they can't currently figure out how to initialize PETSc to work in these cases (they may ask for some help actually).  Also, what about nested PETSc solves from different applications?  What does that output look like if you could even get it to run (which they have not yet)?
>
> The ORNL code  Insilico (part of the Exnihilo repo that includes Denovo) uses up-to-date solvers in Trilinos.
>
> CASL VERA has a top level driver code called Tiamat that couples together COBRA-TF (which uses PETSc), Peregrine/MOOSE (which also uses PETSc/HYPRE), and Insilico (which uses up-to-date Trilinos).  In addition, it runs these codes in different clusters of processes that may or may not overlap (that is a runtime decision on startup).  The output from this coupled code dumped to STDOUT is currently incomprehensible.
>
> As of right now, it is completely impractical to couple Hydra-TH into Insilico because of their use of PETSc and an old version of ML from Trilinos.  As a matter of fact, it is agreed by all parties that before that happens, Hydra-TH needs to either use only PETSc/HYPRE or Trilinos but no mixing the two at all.  It is recognized that changing Hydra-TH to use PETSc/HYPRE or all Trilinos will be a huge job because it will change all of their tests.  It will be a large amount of work to re-verify all of their Exodus-based gold-standard regression tests for a change like this (and so they have put this off for a year!).
>
> Therefore, before there is any discussion of new fancy interfaces, we have to resolve the following issues first:
>
> 1) The current version of PETSc must be compatible with the current version of Trilinos so they can be linked in a single executable and we must remove link order dependencies.  Also, users need to know ranges of versions of Trilinos and PETSc that are link and runtime compatible.  The better that the mature capabilities in Trilinos and PETSc can maintain backward compatibility over longer ranges of time/versions, the easier this gets.  The ideas for doing this are described in the section "Regulated Backward Compatibility" in http://web.ornl.gov/~8vt/TribitsLifecycleModel_v1.0.pdf .  Also, PETSc should support dependency inversion and dependency injection so that people can add ML support into PETSc without having to directly have ML upstream from PETSc.  We can do this already with Stratimikos in Trilinos so we could add a PETSc solver or preconditioner (or any other preconditioner or solver) in a downstream library.  This is already being used in production code in CASL in  Insilico.  There is a little interface issue here but not much.
>
> 2) The user needs to have complete control of where output goes on an object-by-object basis on each process, period.   Otherwise, multiphysics codes (either all PETSc or Trilinos or mixes) create incomprehensible output.  This also applies to nested solves (i.e. what does output from GMRES nested inside of GMRES look like?).  We have suggested standards for this in Trilinos that if every code followed would solve this problem (see GCG 18 in http://web.ornl.gov/~8vt/TrilinosCodingDocGuidelines.pdf ).  While this is not an official standard in Trilinos I think it is followed pretty well by the more modern basic linear solvers and preconditioners.  How do you allow users to customize how PETSc, Trilinos, and their own objects create and intermix output such that it is useful for them?  In a complex multi-physics application, this is very hard.
>
> From the standpoint of CASL, if all these physics codes used the more modern Trilinos preconditioners and solvers, all of the above problems would go away.  But that is just not feasible right now for many of the reasons listed above and below.
>
> NOTE: The reason the Fortran codes use PETSc is because Trilinos has no acceptable Fortran interface and you need to be a C++ programmer to write a customized Fortran to C++ interface to Trilinos.  And in our experience, if a programming team knows C++ in addition to Fortran, they would be writing a lot of their coordination code in C++ in the first place where they could just directly be using Trilinos from C++ (and therefore no need for a Fortran interface for Trilinos).  The lack of a general portable Fortran interface to basic Trilinos data structures and other facilities makes every Fortran-only team go to PETSc.  That is a no-brainer.  But herein we have the current status quo and why it is not feasible to switch all of CASL VERA codes over to Trilinos.
>
> Therefore, I would say that before there is any talk of more detailed interfaces or interoperability between Trilinos and PETSc that we first solve the basic problems of version compatibility, dependency injection, and outputting control.  While these problems are much easier than the more challenging interfacing work, they will still require a lot of ongoing efforts.
>
> Cheers,
>
> -Ross
>
>> -----Original Message-----
>> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
>> Sent: Wednesday, November 20, 2013 11:16 PM
>> To: Jed Brown
>> Cc: Bartlett, Roscoe A.; petsc-trilinos-discussion at lists.mcs.anl.gov
>> Subject: Re: [Petsc-trilinos-discussion] Scope and requirements
>>
>>
>>   Hmm, the PETSc wrapper for ML is rather clunky and fragile (meaning it is
>> difficult to change safely, that is without breaking something else in the
>> process). This could be for several reasons
>>
>>    1) ML has evolved and doesn't have the clearest documentation
>>    2) there isn't a great match between the abstractions/code organization in
>> ML/Trilinos and PETSc
>>    3) the interface was done "by hand" as needed each time to get a bit more
>> functionality across and hence is a bit ad hoc.
>>
>>   I hate to think of having many of these fragile ad hoc interfaces hanging
>> around. So how could this be done in scalable maintainable way? If we
>> understand the fundamental design principles of the two packages to
>> determine commonalities (and possibly troublesome huge differences) we
>> may be able to see what changes could be made to make it easier to mate
>> plugins from the two sides. So a brief discussion of PETSc object life cycles for
>> the Trilinos folks. If they can produce something similar for Trilinos that would
>> help us see the sticky points.
>>
>>     Most important PETSc objects have the following life cycles (I'll use Mat as
>> the example class, same thing holds for other classes, like nonlinear solves,
>> preconditioners....)
>>
>>     Mat mat = MatCreate(MPI_Comm)
>>     MatSetXXX()           // can set some generic properties of the matrix,
>>                           // like size
>>     MatSetType()          // instantiate the actual class like compressed
>>                           // sparse row, or matrix free or ....
>>     MatSetYYY()           // set generic properties of the matrix or ones
>>                           // specific to the actual class instantiated
>>     MatSetFromOptions()   // allow setting options from the command line etc
>>                           // for the matrix
>>     MatSetUp()            // "set up" the matrix so that actual methods may
>>                           // be called on the matrix
>>
>>         In some sense all of the steps above are part of the basic constructor of
>> the object (note at this point we still don't have any entries in the matrix)
>>         Also at this point the "size" and parallel layout of the matrix (for solvers
>> the size of the vectors and matrices it uses) is set in stone and cannot be
>> changed
>>         (without an XXXReset()).
>>
>>      MatSetValues()
>>      MatAssemblyBegin/End()    // phases to put values into matrices with
>>                                // explicit storage of entries
>>
>>         Once this is done one can perform operations with the matrix
>>
>>      MatMult()
>>      etc
>>
>>      MatDestroy() or
>>      MatReset()         // cleans out the object of everything related to the
>>                         // size/parallel layout of the vectors/matrices but
>>                         // leaves the type and options that have been set;
>>                         // this is to allow one to use the same (solver)
>>                         // object again for a different size problem (due to
>>                         // grid refinement or whatever) without needing to
>>                         // recreate the object from scratch.
>>
>>     There are a few other things like serialization but I don't think they matter
>> in this discussion. There is reference counting so you can pass objects into
>> other objects etc and the objects will be kept around automatically until the
>> reference counts go down to zero. If you have a class that provides these
>> various stages then it is not terribly difficult to wrap them up to look like
>> PETSc objects (what Jed called a plugin). In fact for ML we have a
>> PCCreate_ML() PCSetUp_ML() PCDestroy_ML() etc.
>>
>>      So one way to program with PETSc is to write code that manages the life
>> cycles of whatever objects are needed. For example you create a linear
>> solver object, a matrix object, some vector objects, set the right options, fill
>> them up appropriately, call the solver and then cleanup. Same with nonlinear
>> solvers, ODE integrators, eigensolvers.
>>
>>      For "straightforward" applications this model can often be fine. When one
>> wants to do more complicated problems or use algorithms that require more
>> information about the problem such as geometric multigrid and "block"
>> solvers (here I mean block solvers like for Stokes equation and "multi
>> physics" problems not block Jacobi or multiple right hand side "block solvers")
>> requiring the user to manage the life cycles of all the vectors, matrices,
>> auxiliary vector and matrices (creating them, giving them appropriate
>> dimensions, hooking them together, filling them with appropriate values,
>> and finally destroying them) is asking too much of the users.  PETSc has the
>> DM object, which can be thought of as a "factory" for the correctly sized sub
>> vectors and matrices needed for the particular problem being solved. The
>> DM is given to the solver then the solver queries the DM to create whatever
>> of those objects it needs in the solution process. For example with geometric
>> multigrid the PCMG asks the DM for each coarser grid matrix. For a Stokes
>> problem PCFIELDSPLIT can ask for (0,0) part of the operator, or the (1,1) etc
>> and build up the solver from the objects provided by the DM.  Thus a typical
>> PETSc program creates an appropriate solver object (linear, nonlinear, ODE,
>> eigen), creates a DM for the problem, passes the DM to the solver and
>> during the solver set up it obtains all the information it needs from the DM
>> and solves the problem. Obviously the DM needs information about the
>> mesh and PDE being solved.
>>
>>   I am not writing this to advocate making Trilinos follow the same model
>> (though I think you should :-)) but instead to develop a common
>> understanding of what is common in our approaches (and can be leveraged)
>> and what is fundamentally different (and could be troublesome). For
>> example, the fact that you have a different approach to creating Trilinos
>> objects (using C++ factories) is superficially very different from what PETSc
>> does, but that may not really matter.
>>
>>   Barry
>>
>>
>>
>> On Nov 20, 2013, at 5:16 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>
>>> Barry Smith <bsmith at mcs.anl.gov> writes:
>>>>  One "suggestion" is, could there be a "higher-level" interface that
>>>>  people could use that incorporated either PETSc or Trilinos
>>>>  underneath? A difficulty I see with that approach is that design
>>>>  decisions (either in C or C++) at one level of the stack permeate
>>>>  the entire stack and users have to see more than just the top level
>>>>  of the stack from our libraries. For example,
>>>
>>> I think that if the program managers want better integration and an
>>> easier transition between packages, that instead of creating a new
>>> high-level interface, we should build out our cross-package plugin
>>> support.  For example, ML is a popular solver among PETSc users, but we
>>> could add support for more Trilinos preconditioners and solvers, and
>>> even support assembling directly into a [TE]petra matrix (PETSc matrices
>>> are more dynamic, but I still think this can be done reasonably).  We
>>> probably need to talk some about representing block systems, but I don't
>>> think the issues are insurmountable.
>>>
>>> Then applications can freely choose whichever package they find more
>>> convenient (based on experience, types of solvers, implementation
>>> language, etc) with confidence that they will be able to access any
>>> unique features of the other package.  When coupling multiple
>>> applications using different solver packages, the coupler should be able
>>> to choose either package to define the outer solve, with the same code
>>> assembling either a monolithic matrix or a split/nested matrix with
>>> native preconditioners within blocks.
>>>
>>> As I see it, there is a fair amount of duplicate effort by packages
>>> (such as libMesh, Deal.II, FEniCS) that ostensibly support both Trilinos
>>> and PETSc, but were not written by "solvers people" and are managing the
>>> imperfect correspondence themselves.  The unified interfaces that these
>>> projects have built are generally less capable than had they committed
>>> entirely to either one of our interfaces.
>>>
>>>
>>> It will take some effort to implement this interoperability and it's
>>> hard to sneak in under "basic research" like a lot of other software
>>> maintenance tasks, but I think that providing it within our own plugin
>>> systems is less work and can be supported better than any alternative.
>


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener