since developing object-oriented software is so cumbersome in C and we are all resistant to doing it in C++

Sat Dec 5 14:03:32 CST 2009

As someone who has a finite-element code built upon PETSc/Sieve with the 
top-level code in Python, I am in favor of Barry's approach.

As Matt mentions, debugging across multiple languages is more complex. 
Unit testing helps solve some of this because tests associated with the 
low-level code involve only one language and find most of the bugs.
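
A minimal sketch of such a single-language test, with Python standing in 
for the low-level language. The routine element_stiffness is hypothetical 
and is not part of PETSc or Sieve; it only illustrates that the test 
never crosses a language boundary.

```python
import unittest


def element_stiffness(h):
    """Stiffness matrix of a 1-D linear element of length h.

    Hypothetical low-level routine used only for illustration.
    """
    k = 1.0 / h
    return [[k, -k], [-k, k]]


class TestElementStiffness(unittest.TestCase):
    def test_values(self):
        # For h = 0.5 the entries are +/- 1/h = +/- 2.
        self.assertEqual(element_stiffness(0.5), [[2.0, -2.0], [-2.0, 2.0]])

    def test_rows_sum_to_zero(self):
        # A constant field produces no strain, so each row sums to zero.
        for row in element_stiffness(0.25):
            self.assertAlmostEqual(sum(row), 0.0)


# Run the tests explicitly so the script does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestElementStiffness)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these exercise the low-level layer on its own, so a failure 
there can be debugged without involving the wrapper or the top-level code.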

We started with manual C++/Python interfaces, then moved to Pyrex, and 
now use SWIG. Because we use C++, the OO support in SWIG results in a 
much simpler, cleaner interface between Python and C++ than what 
is possible with Pyrex or Cython. SWIG has eliminated 95% of the effort 
to interface Python and C++ compared to Pyrex.

Brad

Matthew Knepley wrote:
> On Fri, Dec 4, 2009 at 10:42 PM, Barry Smith <bsmith at mcs.anl.gov> 
> wrote:
> 
> 
>       Suggestion:
> 
>     1) Discard PETSc
>     2) Develop a general Py{CL, CUDA, OpenMP-C} system that dispatches
>     "tasks" onto GPUs and multi-core systems (generally we would have
>     one python process per compute node and local parallelism would be
>     done via the low-level kernels to the cores and/or GPUs.)
>     3) Write a new PETSc using MPI4py and 2) written purely in Python
>     3000 using all its cool class etc features
>     4) Use other packages (like f2py) to generate bindings so that 3)
>     may be called from Fortran90/2003 and C++ (these probably suck since
>     people usually think the other way around; calling Fortran/C++ from
>     Python, but maybe we shouldn't care; we and our friends can just be
>     10 times more efficient developing apps in Python).
> 
>       enjoy coding much better than today.
> 
>      What is wrong with Python 3000 that would make this approach not be
>     great?
> 
> 
> I am a very big fan of this approach. Let me restate it:
> 
>   a) Write the initial code in Python for correctness checking, but 
> develop a performance model that will allow a transition to an accelerator
> 
>   b) Move key pieces to a faster platform using
> 
>       i) Cython
> 
>       ii) PyCUDA
> 
>   c) Coordinate loose collection of processes with MPI for large problems
> 
> A few comments. Notice that for many people c) is unnecessary if you can 
> coordinate several GPUs from one CPU. The
> key piece here is a dispatch system. Felipe, Rio, and I are getting this 
> done now. Second, we can leverage all of petsc4py
> in step b.
> 
> My past attempts at this development model have always 
> floundered on inner loops or iterations. These cannot be
> done in Python (too slow) and cannot be wrapped (too much overhead). 
> However, now we have a way to do this, namely
> RunTime Code Generation (like PyCUDA). I think this will get us over the 
> hump, but we have to rethink how we code things,
> especially traversals, which now become lists of scheduled tasks as in 
> FLAME (from van de Geijn).
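
The run-time code generation idea quoted above can be sketched in pure 
Python. This is only a stand-in under stated assumptions: PyCUDA generates 
and compiles CUDA C source, whereas this sketch generates Python source and 
compiles it with the built-in compile(). The names generate_axpy_kernel and 
run are hypothetical. The traversal is expressed as a list of scheduled 
tasks, each a (kernel, arguments) pair.

```python
def generate_axpy_kernel(alpha):
    """Generate and compile source for y[i] += alpha * x[i].

    The constant alpha is baked into the generated source, which is the
    essence of run-time code generation. Hypothetical helper.
    """
    source = (
        "def kernel(x, y):\n"
        "    for i in range(len(x)):\n"
        f"        y[i] += {alpha!r} * x[i]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["kernel"], source


def run(tasks):
    """Execute a list of scheduled tasks in order."""
    for kernel, args in tasks:
        kernel(*args)


x = [1.0, 2.0, 3.0]
y = [0.0, 0.0, 0.0]

double, double_src = generate_axpy_kernel(2.0)
half, half_src = generate_axpy_kernel(0.5)

# The traversal becomes data: a list of (kernel, arguments) pairs.
tasks = [(double, (x, y)), (half, (x, y))]
run(tasks)
```

Because the task list is plain data, a dispatch system is free to reorder, 
batch, or ship the tasks elsewhere before running them; here run() simply 
executes them in order.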
> 
>   Matt
>  
> 
> 
>       Barry
> 
>     When coding a new numerical algorithm for PETSc we would just code
>     in Python, then, once it is tested and we are happy with it,
>     reimplement it in Py{CL, CUDA, OpenMP-C}.
> 
>     The other choice is designing and implementing our own cool/great OO
>     language with the flexibility and power we want, but I fear that is
>     way too hard, so why not leverage Python instead?
> 
> 
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener