since developing object oriented software is so cumbersome in C and we are all resistent to doing it in C++

Matthew Knepley knepley at
Mon Dec 7 15:17:34 CST 2009

On Mon, Dec 7, 2009 at 3:05 PM, Chris Kees <
christopher.e.kees at> wrote:

> This whole thread has been really interesting. I'm interested in  Barry's
> suggestion, and I know a handful of people not on the list who would also be
> interested.   I've got a couple of questions/responses to the MatSetValues,
> point-wise physics, and "support tools" issues that were raised.
> Did you guys settle the issue that Jed brought up about about the Python
> Global Interpreter Lock (GIL)?  There was some discussion of it at SciPy
> this summer, but I didn't come away feeling like anybody had an immediate
> solution or that Guido was in favor of eliminating the GIL anytime soon.
>  Are you guys assuming that the "worker threads"  for the multiple
> cores/GPU's are going to be set up by C code (hand- or machine-generated)?

I think this is going to be solved be first, rearranging to the code to
segregate reductions or conflicts 2) using atomics and then 3) intelligent
reordering to minimize the
conflicts. I have never seen this be a big problem, but some code is always
going to have the bad tradeoff I guess.

> It seems to me that the MatSetValues discussion moved on from the threading
> issue (since you can't call MatSetValues efficiently in python anyway) to
> the more general issue of how PETSc is going to explicitly support shared
> memory parallelism. I interpret the last interchange as implying that the
> solution is another level of domain decomposition that would be hidden from
> the mat assembly loop level (i.e. users aren't going to have to write their
> code like the "thread part" is their parallel subdomain). Is that a correct
> interpretation? Something along the lines of additional args to MatCreate
> that indicates how many threads to use per subdomain and some OpenMPI style
> instructions to parallelize the assembly loop?

I am not advocating that. I want a CUDA-like solution, but of course you can
probably convert any one to the other with a
tremendous amount of pain. I wish for all matrix-free applications and
vector assemblies, because then I think the problem
is very simple. I have not seen any compelling argument against this
approach, and all the best codes I know use it.

> On the issue of the point-wise physics (e.g. quadrature point routines,
> Riemann solvers, ...). It seems like this is a long-standing bottleneck for
> object-orient approaches to PDE's going back to the OO Fem research in the
> 80s and 90s.  The SciPy group that met to talk about PDE's spent some time
> on it, partly related to femhub project.  I agree that a code generation
> approach seems to be the only way forward.  Ondrej Certik from the femhub
> project has written a symbolic math module for python, and we talked some
> about using this to represent the physics in a way that can be used for code
> generation.  I'm not sure what progress he's made. At the end of the day
> you've got to allow people to write code for physics without knowledge of
> the assembly processes while still injecting that code into the inner loop,
> otherwise you'll approach the speed of monolithic fortran codes.

This solution has been "obvious" for some time, but needs a solution even a
Fortran programmer could love. Or at least a bad C programmer.

> One issue that you might also want to consider with multi-language
> approaches is the configuration and build tools.   I agree that he debugging
> support is a major issue, but what has taken far more of my time is the lack
> of cross-platform configuration and build tools for multi-language software.
>  We have looked at CMake and Scons but have had to stick with a pretty ugly
> mixture of python scripts of python and make. I feel like the other
> scientific computing packages like Sage and EPD are successfully growing
> because they distribute binaries that include all dependencies, but I don't
> see that being feasible for HPC software anytime soon.  It seems like to
> really be successful you may have to rely on a full featured and fully
> documented tool like Scons is aiming at being, otherwise you'll end up with
> an approach that works for you (because you wrote it) but not very well for
> others trying to use the new PETSc.

I think we do a better job than anyone with PETSc right now, but we will
have to work a little harder to get everything right.

> We have a small group at that US Army ERDC that has  been developing a
> python FEM framework for multiscale/multiphysics modeling that tries to
> break coupling between the physics and the FEM methods (i.e. we try to
> support CG,DG, and non-conforming elements with variable order and element
> topologies running from the same description of the physics). The code uses
> our own wrappers to petsc and mpi, but only because we started before mpi4py
> and petsc4py existed.  We have a memory intensive  physics API based on
> computing and storing quadrature point values of PDE coefficients and
> numerical fluxes and a residual and jacobian assembly processes along the
> lines of Matt's first approach of storing element residuals and jacobians.
>  We also have handwritten "optimized" codes that collapse all the loops and
> eliminate storage for direct residual and jacobian assembly. We did that
> last step out of necessity and as a precursor to working  on a code
> generation approach (it's what we want the code generator to write).   It
> also has the side effect of providing a low level API for people who want to
> write traditional monolithic fortran codes for residual and jacobian
> assembly.

You should also look at Lisandro's fluids code.

> Our approach for the "user"  or more correctly the "physics developer" is
> 1) prototype physics in python 2) "vectorize" the physics by rewriting in
> C/C++/Fortran using swig, f2py, or hand written wrappers and 3) take the
> inner loop of 2 and build monolithic residual and jacobian routines.  This
> is obviously not satisfactory and not maintainable in the long-run but we
> are solving "real" nonlinear multi-physics problems in 3D with it (e.g.
> incompressible two-phase flow, two-phase flow in porous media, shallow water
> equations, and a few other models). Recent profiles have put about 95% of
> the compute time inside KSPSolve, but we've got a lot of work to do on
> better preconditioning and tolerance selection before that number provides
> reliable guidance.  Anyway, If you guys go forward with Barry's suggestion
> I'd be interested in seeing if there's some way to work out some
> inter-agency collaboration in addition to the good academic collaboration
> that you already have.

I would say an improvement to this would be CUDA kernels for your inner


> Chris
> On Dec 6, 2009, at 9:43 AM, Jed Brown wrote:
>  On Sat, 5 Dec 2009 14:23:45 -0600, Matthew Knepley <knepley at>
>> wrote:
>>> We do impose some of our process on users already, such as CHKERRQ and
>>> PetscMalloc.
>> CHKERRQ is still optional, it's just recommended in order to get a
>> decent trace.  It can't be used, for example, for top-level calls within
>> my VisIt plugin because those functions necessarily return something
>> else (I have an alternative which checks the return code, generates
>> debugging messages, possibly tries to deallocate local resources, and
>> throws the appropriate exception).  In C99 projects, I use an alternate
>> macro that uses __func__ instead of __FUNCT__.
>> There is PetscMallocSet if the user doesn't want to use PetscMalloc, but
>> users are rarely required to directly interact with
>> PetscMalloc/PetscFree anyway since most arrays are copied or managed by
>> an object.
>> Jed

What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the petsc-dev mailing list