since developing object oriented software is so cumbersome in C and we are all resistent to doing it in C++

Mon Dec 7 15:05:15 CST 2009

This whole thread has been really interesting. I'm interested in   
Barry's suggestion, and I know a handful of people not on the list who  
would also be interested.   I've got a couple of questions/responses  
to the MatSetValues, point-wise physics, and "support tools" issues  
that were raised.

Did you guys settle the issue that Jed brought up about about the  
Python Global Interpreter Lock (GIL)?  There was some discussion of it  
at SciPy this summer, but I didn't come away feeling like anybody had  
an immediate solution or that Guido was in favor of eliminating the  
GIL anytime soon.  Are you guys assuming that the "worker threads"   
for the multiple cores/GPU's are going to be set up by C code (hand-  
or machine-generated)?

It seems to me that the MatSetValues discussion moved on from the  
threading issue (since you can't call MatSetValues efficiently in  
python anyway) to the more general issue of how PETSc is going to  
explicitly support shared memory parallelism. I interpret the last  
interchange as implying that the solution is another level of domain  
decomposition that would be hidden from the mat assembly loop level  
(i.e. users aren't going to have to write their code like the "thread  
part" is their parallel subdomain). Is that a correct interpretation?  
Something along the lines of additional args to MatCreate that  
indicates how many threads to use per subdomain and some OpenMPI style  
instructions to parallelize the assembly loop?

On the issue of the point-wise physics (e.g. quadrature point  
routines, Riemann solvers, ...). It seems like this is a long-standing  
bottleneck for object-orient approaches to PDE's going back to the OO  
Fem research in the 80s and 90s.  The SciPy group that met to talk  
about PDE's spent some time on it, partly related to femhub project.   
I agree that a code generation approach seems to be the only way  
forward.  Ondrej Certik from the femhub project has written a symbolic  
math module for python, and we talked some about using this to  
represent the physics in a way that can be used for code generation.   
I'm not sure what progress he's made. At the end of the day you've got  
to allow people to write code for physics without knowledge of the  
assembly processes while still injecting that code into the inner  
loop, otherwise you'll approach the speed of monolithic fortran codes.

One issue that you might also want to consider with multi-language  
approaches is the configuration and build tools.   I agree that he  
debugging support is a major issue, but what has taken far more of my  
time is the lack of cross-platform configuration and build tools for  
multi-language software.  We have looked at CMake and Scons but have  
had to stick with a pretty ugly mixture of python scripts of python  
and make. I feel like the other scientific computing packages like  
Sage and EPD are successfully growing because they distribute binaries  
that include all dependencies, but I don't see that being feasible for  
HPC software anytime soon.  It seems like to really be successful you  
may have to rely on a full featured and fully documented tool like  
Scons is aiming at being, otherwise you'll end up with an approach  
that works for you (because you wrote it) but not very well for others  
trying to use the new PETSc.

We have a small group at that US Army ERDC that has  been developing a  
python FEM framework for multiscale/multiphysics modeling that tries  
to break coupling between the physics and the FEM methods (i.e. we try  
to support CG,DG, and non-conforming elements with variable order and  
element topologies running from the same description of the physics).  
The code uses our own wrappers to petsc and mpi, but only because we  
started before mpi4py and petsc4py existed.  We have a memory  
intensive  physics API based on computing and storing quadrature point  
values of PDE coefficients and numerical fluxes and a residual and  
jacobian assembly processes along the lines of Matt's first approach  
of storing element residuals and jacobians.  We also have handwritten  
"optimized" codes that collapse all the loops and eliminate storage  
for direct residual and jacobian assembly. We did that last step out  
of necessity and as a precursor to working  on a code generation  
approach (it's what we want the code generator to write).   It also  
has the side effect of providing a low level API for people who want  
to write traditional monolithic fortran codes for residual and  
jacobian assembly.

Our approach for the "user"  or more correctly the "physics developer"  
is 1) prototype physics in python 2) "vectorize" the physics by  
rewriting in C/C++/Fortran using swig, f2py, or hand written wrappers  
and 3) take the inner loop of 2 and build monolithic residual and  
jacobian routines.  This is obviously not satisfactory and not  
maintainable in the long-run but we are solving "real" nonlinear multi- 
physics problems in 3D with it (e.g. incompressible two-phase flow,  
two-phase flow in porous media, shallow water equations, and a few  
other models). Recent profiles have put about 95% of the compute time  
inside KSPSolve, but we've got a lot of work to do on better  
preconditioning and tolerance selection before that number provides  
reliable guidance.  Anyway, If you guys go forward with Barry's  
suggestion I'd be interested in seeing if there's some way to work out  
some inter-agency collaboration in addition to the good academic  
collaboration that you already have.

Chris

On Dec 6, 2009, at 9:43 AM, Jed Brown wrote:

> On Sat, 5 Dec 2009 14:23:45 -0600, Matthew Knepley  
> <knepley at gmail.com> wrote:
>> We do impose some of our process on users already, such as CHKERRQ  
>> and
>> PetscMalloc.
>
> CHKERRQ is still optional, it's just recommended in order to get a
> decent trace.  It can't be used, for example, for top-level calls  
> within
> my VisIt plugin because those functions necessarily return something
> else (I have an alternative which checks the return code, generates
> debugging messages, possibly tries to deallocate local resources, and
> throws the appropriate exception).  In C99 projects, I use an  
> alternate
> macro that uses __func__ instead of __FUNCT__.
>
> There is PetscMallocSet if the user doesn't want to use PetscMalloc,  
> but
> users are rarely required to directly interact with
> PetscMalloc/PetscFree anyway since most arrays are copied or managed  
> by
> an object.
>
> Jed