[Petsc-trilinos-discussion] Scope and requirements

Thu Nov 21 11:05:17 CST 2013

Okay, so I have some concrete use cases from the CASL VERA effort that I am involved with that has issues with PETSc and Trilinos.  We have a situation where we need to couple together multiple Fortran and C++ codes into single executables that use mixes of PETSc and Trilinos and it is a mess and not because of any interfacing issues really.  Here is why ...

The LANL code Hydra-TH is using PETSc and some old version of ML.  Therefore we can't even link Hydra-TH in with code that uses current versions of Trilinos.  We have not even tried to use the up-to-date version of ML under PETSc and to do so would create a nasty circular dependency in out build process the way it is defined now.  More on this issue below.

The INL code MOOSE with APP Peregrine (PNNL) uses PETSc with HYPRE.

We have two parallel Fortran codes using PETSc, MPACT (Univ. of Mich.) and COBRA-TF (Penn. State), that have overlapping sets of processes and they can't currently figure out how to initialize PETSc to work in these cases (they may ask for some help actually).  Also, what about nested PETSc solves from different applications?  What does that output look like if you could even get it to run (which they have not yet)?

The ORNL code  Insilico (part of the Exnihilo repo that includes Denovo) uses up-to-date solvers in Trilinos.

CASL VERA has a top level driver code called Tiamat that couples together COBRA-TF (which uses PETSc), Peregrine/MOOSE (which also uses PETSc/HYPRE), and Insilico (which uses up-to-date Trilinos).  In addition, it runs these codes in different clusters of processes that may or may not overlap (that is a runtime decision on startup).  The output from this coupled code dumped to STDOUT is currently incompressible.

As of right now, it is completely impractical to couple Hydra-TH into Insilico because of their use of PETSc and an old version of ML from Trilinos.  As a matter of fact, it is agreed by all parties that before that happens, Hydra-TH needs to either use only PETSc/HYPRE or Trilinos but no mixing the two at all.  It is recognized that changing Hydra-TH to use PETSc/HYPRE or all Trilinos will be a huge job because it will change all of their tests.  It will be a large amount of work to re-verify all of their Exodus-based gold-standard regression tests for a change like this (and so they have put this off for a year!).

Therefore, before there is any discussion of new fancy interfaces, we have to resolve the following issues first:

1) The current version of PETSc must be compatible with the current version of Trilinos so they can be linked in a single executable and we must remove link order dependencies.  Also, users need to know ranges of versions of Trilinos and PETSc that are link and runtime compatible.  The better that the mature capabilities in Trilinos and PETSc can maintain backward compatibility over longer ranges of time/versions, the easier this gets.  The ideas for doing this are described in the section "Regulated Backward Compatibility" in http://web.ornl.gov/~8vt/TribitsLifecycleModel_v1.0.pdf .  Also, PETSc should support dependency inversion and dependency injection so that people can add ML support into PETSc without having to directly have ML upstream from PETSc.  We can do this already with Stratimikos in Trilinos so we could add a PETSc solver or preconditioner (or any other preconditioner or solver) in a downstream library.  This is already being used in production code in CASL in  Insilico.  There is a little interface issue here but not much.

2) The user needs to have complete control of where output goes on an object-by-object based on each process, period.   Otherwise, multiphysics codes (either all PETSc or Trilinos or mixes) create incomprehensible output.  This also applies to nested solves (i.e. how does output form GMRES nested inside of GMRES output looks like?).  We have suggested standards for this in Trilinos that if every code followed would solve this problem (see GCG 18 in http://web.ornl.gov/~8vt/TrilinosCodingDocGuidelines.pdf ).  While this is not an official standard in Trilinos I think it is followed pretty well by more or less by the more modern basic linear solvers and preconditioners.  How do you allow users to customize how PETSc, Trilinos, and their own objects create and intermix output such that it is useful for them?  In a complex multi-physics application, this is very hard.

>From the standpoint of CASL, if all these physics codes used the more modern Trilinos preconditioners and solvers, all of the above problems would go away.  But that is just not feasible right now for many of the reasons listed above and below.

NOT: The reason the Fortran codes use PETSC is because Trilinos has no acceptable Fortran interface and you need to be a C++ programmer to write a customized Fortran to C++ interface to Trilinos.  And in our experience, if a programming team knows C++ in addition to Fortran, they would be writing a lot of their coordination code C++ in the first place where they could just directly be using Trilinos from C++ (and therefore no need for a Fortran interface for Trilinos).  The lack of a general portable Fortran interface to basic Trilinos data structures and other facilities makes every Fortran-only team go to PETSc.  That is a no-brainer.  But here-in we have the current status quo and why it is not feasible to switch all of CASL VERA codes over to Trilinos.

Therefore, I would say that before there is any talk of more detailed interfaces or interoperability between Trilinos and PETSc that we first solve the basic problems of version compatibility, dependency injection, and outputting control.  While these problems are much easier than the more challenging interfacing work, they will still require a lot of ongoing efforts.

Cheers,

-Ross

> -----Original Message-----
> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
> Sent: Wednesday, November 20, 2013 11:16 PM
> To: Jed Brown
> Cc: Bartlett, Roscoe A.; petsc-trilinos-discussion at lists.mcs.anl.gov
> Subject: Re: [Petsc-trilinos-discussion] Scope and requirements
>
>
>    Hmm, the PETSc wrapper for ML is rather clunky and fragile (meaning it is
> difficult to change safely, that is without breaking something else in the
> process). This could be for several reasons
>
>     1) ML has evolved and doesn't have the clearest documentation
>     2) there isn't a great match between the abstractions/code organization in
> ML/Trilinos and PETSc
>     3) the interface was done "by hand" as needed each time to get a bit more
> functionality across and hence is a bit ad hoc.
>
>    I hate to think of having many of these fragile ad hoc interfaces hanging
> around. So how could this be done in scalable maintainable way? If we
> understand the fundamental design principles of the two packages to
> determine commonalities (and possibly troublesome huge differences) we
> may be able to see what changes could be made to make it easier to mate
> plugins from the two sides. So a brief discussion of PETSc object life cycles for
> the Trilinos folks. If they can produce something similar for Trilinos that would
> help us see the sticky points.
>
>      Most important PETSc objects have the following life cycles (I'll use Mat as
> the example class, same thing holds for other classes, like nonlinear solves,
> preconditioners....)
>
>      Mat mat = MatCreate(MPI_Comm)
>      MatSetXXX()                                         // can set come generic properties of the
> matrix, like size
>      MatSetType()    // instantiate the actual class like compressed sparse row,
> or matrix free or ....
>      MatSetYYY()      // set generic properties of the matrix or ones specific to
> the actual class instantiated
>      MatSetFromOptions()   // allow setting options from the command line etc
> for the matrix
>      MatSetUp()        //  "setup" the up the matrix so that actual methods may
> be called on the matrix
>
>          In some sense all of the steps above are part of the basic constructor of
> the object (note at this point we still don't have any entries in the matrix)
>          Also at this point the "size" and parallel layout of the matrix (for solvers
> the size of the vectors and matrices it uses) is set in stone and cannot be
> changed
>          (without an XXXReset()).
>
>       MatSetValues()
>       MatAssemblyBegin/End()    // phases to put values into matrices with
> explicit storage of entries
>
>          Once this is done one can perform operations with the matrix
>
>       MatMult()
>       etc
>
>       MatDestroy() or
>       MatReset()         // cleans out the object of everything related to the
> size/parallel layout of the vectors/matrices but leaves the type and options
> that have been
>                                     set, this is to allow one to use the same (solver) object
> again for a different size problem (due to grid refinement or whatever)
> without
>                                     needing to recreate the object from scratch.
>
>      There are a few other things like serialization but I don't think they matter
> in this discussion. There is reference counting so you can pass objects into
> other objects etc and the objects will be kept around automatically until the
> reference counts go down to zero. If you have a class that provides these
> various stages then it is not terribly difficult to wrap them up to look like
> PETSc objects (what Jed called a plugin). In fact for ML we have a
> PCCreate_ML() PCSetUp_ML() PCDestroy_ML() etc.
>
>       So one way to program with PETSc is to write code that manages the life
> cycles of whatever objects are needed. For example you create a linear
> solver object, a matrix object, some vector objects, set the right options, fill
> them up appropriately, call the solver and then cleanup. Same with nonlinear
> solvers, ODE integrators, eigensolvers.
>
>       For "straightforward" applications this model can often be fine. When one
> wants to do more complicated problems or use algorithms that require more
> information about the problem such as geometric multigrid and "block"
> solvers (here I mean block solvers like for Stokes equation and "multi
> physics" problems not block Jacobi or multiple right hand side "block solvers")
> requiring the user to mange the life cycles of all the vectors, matrices,
> auxiliary vector and matrices (creating them, giving them appropriate
> dimensions, hooking them together, filling them with appropriate values,
> and finally destroying them) is asking too much of the users.  PETSc has the
> DM object, which can be thought of as a "factory" for the correctly sized sub
> vectors and matrices needed for the particular problem being solved. The
> DM is given to the solver then the solver queries the DM to create whatever
> of those objects it needs in the solution process. For example with geometric
> multigrid the PCMG asks the DM for each coarser grid matrix. For a Stokes
> problem PCFIELDSPLIT can ask for (0,0) part of the operator, or the (1,1) etc
> and build up the solver from the objects provided by the DM.  Thus a typical
> PETSc program creates an appropriate solver object (linear, nonlinear, ODE,
> eigen), creates a DM for the problem, passes the DM to the solver and
> during the solver set up it obtains all the information it needs from the DM
> and solves the problem. Obviously the DM needs information about the
> mesh and PDE being solved.
>
>    I am not writing this to advocate making Trilinos follow the same model
> (though I think you should :-)) but instead to develop a common
> understanding of what is common in our approaches (and can be leveraged)
> and what is fundamentally different (and could be troublesome). For
> example, the fact that you have a different approach to creating Trilinos
> objects (using C++ factories) is superficially very different from what PETSc
> does, but that may not really matter.
>
>    Barry
>
>
>
> On Nov 20, 2013, at 5:16 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
> > Barry Smith <bsmith at mcs.anl.gov> writes:
> >>   One "suggestion" is, could there be a "higher-level" interface that
> >>   people could use that incorporated either PETSc or Trilinos
> >>   underneath? A difficulty I see with that approach is that design
> >>   decisions (either in C or C++) at one level of the stack permeate
> >>   the entire stack and users have to see more then just the top level
> >>   of the stack from our libraries. For example,
> >
> > I think that if the program managers want better integration and an
> > easier transition between packages, that instead of creating a new
> > high-level interface, we should build out our cross-package plugin
> > support.  For example, ML is a popular solver among PETSc users, but we
> > could add support for more Trilinos preconditioners and solvers, and
> > even support assembling directly into a [TE]petra matrix (PETSc matrices
> > are more dynamic, but I still think this can be done reasonably).  We
> > probably need to talk some about representing block systems, but I don't
> > think the issues are insurmountable.
> >
> > Then applications can freely choose whichever package they find more
> > convenient (based on experience, types of solvers, implementation
> > language, etc) with confidence that they will be able to access any
> > unique features of the other package.  When coupling multiple
> > applications using different solver packages, the coupler should be able
> > to choose either package to define the outer solve, with the same code
> > assembling either a monolithic matrix or a split/nested matrix with
> > native preconditioners within blocks.
> >
> > As I see it, there is a fair amount of duplicate effort by packages
> > (such as libMesh, Deal.II, FEniCS) that ostensibly support both Trilinos
> > and PETSc, but were not written by "solvers people" and are managing the
> > imperfect correspondence themselves.  The unified interfaces that these
> > projects have built are generally less capable than had they committed
> > entirely to either one of our interfaces.
> >
> >
> > It will take some effort to implement this interoperability and it's
> > hard to sneak in under "basic research" like a lot of other software
> > maintenance tasks, but I think that providing it within our own plugin
> > systems is less work and can be supported better than any alternative.