since developing object oriented software is so cumbersome in C and we are all resistent to doing it in C++
christopher.e.kees at usace.army.mil
Mon Dec 7 15:05:15 CST 2009
This whole thread has been really interesting. I'm interested in
Barry's suggestion, and I know a handful of people not on the list who
would also be interested. I've got a couple of questions/responses
to the MatSetValues, point-wise physics, and "support tools" issues
that were raised.
Did you guys settle the issue that Jed brought up about about the
Python Global Interpreter Lock (GIL)? There was some discussion of it
at SciPy this summer, but I didn't come away feeling like anybody had
an immediate solution or that Guido was in favor of eliminating the
GIL anytime soon. Are you guys assuming that the "worker threads"
for the multiple cores/GPU's are going to be set up by C code (hand-
It seems to me that the MatSetValues discussion moved on from the
threading issue (since you can't call MatSetValues efficiently in
python anyway) to the more general issue of how PETSc is going to
explicitly support shared memory parallelism. I interpret the last
interchange as implying that the solution is another level of domain
decomposition that would be hidden from the mat assembly loop level
(i.e. users aren't going to have to write their code like the "thread
part" is their parallel subdomain). Is that a correct interpretation?
Something along the lines of additional args to MatCreate that
indicates how many threads to use per subdomain and some OpenMPI style
instructions to parallelize the assembly loop?
On the issue of the point-wise physics (e.g. quadrature point
routines, Riemann solvers, ...). It seems like this is a long-standing
bottleneck for object-orient approaches to PDE's going back to the OO
Fem research in the 80s and 90s. The SciPy group that met to talk
about PDE's spent some time on it, partly related to femhub project.
I agree that a code generation approach seems to be the only way
forward. Ondrej Certik from the femhub project has written a symbolic
math module for python, and we talked some about using this to
represent the physics in a way that can be used for code generation.
I'm not sure what progress he's made. At the end of the day you've got
to allow people to write code for physics without knowledge of the
assembly processes while still injecting that code into the inner
loop, otherwise you'll approach the speed of monolithic fortran codes.
One issue that you might also want to consider with multi-language
approaches is the configuration and build tools. I agree that he
debugging support is a major issue, but what has taken far more of my
time is the lack of cross-platform configuration and build tools for
multi-language software. We have looked at CMake and Scons but have
had to stick with a pretty ugly mixture of python scripts of python
and make. I feel like the other scientific computing packages like
Sage and EPD are successfully growing because they distribute binaries
that include all dependencies, but I don't see that being feasible for
HPC software anytime soon. It seems like to really be successful you
may have to rely on a full featured and fully documented tool like
Scons is aiming at being, otherwise you'll end up with an approach
that works for you (because you wrote it) but not very well for others
trying to use the new PETSc.
We have a small group at that US Army ERDC that has been developing a
python FEM framework for multiscale/multiphysics modeling that tries
to break coupling between the physics and the FEM methods (i.e. we try
to support CG,DG, and non-conforming elements with variable order and
element topologies running from the same description of the physics).
The code uses our own wrappers to petsc and mpi, but only because we
started before mpi4py and petsc4py existed. We have a memory
intensive physics API based on computing and storing quadrature point
values of PDE coefficients and numerical fluxes and a residual and
jacobian assembly processes along the lines of Matt's first approach
of storing element residuals and jacobians. We also have handwritten
"optimized" codes that collapse all the loops and eliminate storage
for direct residual and jacobian assembly. We did that last step out
of necessity and as a precursor to working on a code generation
approach (it's what we want the code generator to write). It also
has the side effect of providing a low level API for people who want
to write traditional monolithic fortran codes for residual and
Our approach for the "user" or more correctly the "physics developer"
is 1) prototype physics in python 2) "vectorize" the physics by
rewriting in C/C++/Fortran using swig, f2py, or hand written wrappers
and 3) take the inner loop of 2 and build monolithic residual and
jacobian routines. This is obviously not satisfactory and not
maintainable in the long-run but we are solving "real" nonlinear multi-
physics problems in 3D with it (e.g. incompressible two-phase flow,
two-phase flow in porous media, shallow water equations, and a few
other models). Recent profiles have put about 95% of the compute time
inside KSPSolve, but we've got a lot of work to do on better
preconditioning and tolerance selection before that number provides
reliable guidance. Anyway, If you guys go forward with Barry's
suggestion I'd be interested in seeing if there's some way to work out
some inter-agency collaboration in addition to the good academic
collaboration that you already have.
On Dec 6, 2009, at 9:43 AM, Jed Brown wrote:
> On Sat, 5 Dec 2009 14:23:45 -0600, Matthew Knepley
> <knepley at gmail.com> wrote:
>> We do impose some of our process on users already, such as CHKERRQ
> CHKERRQ is still optional, it's just recommended in order to get a
> decent trace. It can't be used, for example, for top-level calls
> my VisIt plugin because those functions necessarily return something
> else (I have an alternative which checks the return code, generates
> debugging messages, possibly tries to deallocate local resources, and
> throws the appropriate exception). In C99 projects, I use an
> macro that uses __func__ instead of __FUNCT__.
> There is PetscMallocSet if the user doesn't want to use PetscMalloc,
> users are rarely required to directly interact with
> PetscMalloc/PetscFree anyway since most arrays are copied or managed
> an object.
More information about the petsc-dev