[petsc-dev] PETSc programming model for multi-core systems
Barry Smith
bsmith at mcs.anl.gov
Thu Nov 11 21:08:38 CST 2010
On Nov 11, 2010, at 8:24 PM, Mark F. Adams wrote:
> This is a great technical discussion of the very vexing question of future programming models.
>
> In addition to these issues there are facts on the ground. My limited view of this elephant, if you will, is that OpenMP seems to be getting a certain critical mass, for better or worse. We may not see as homogeneous a world in the future as we had, say, 10-15 years ago, when MPI + C/FORTRAN was dominant and supported well (with some exceptions) everywhere, but I think we could see some coalescence around MPI + OpenMP + ? + C/F.
>
> Anecdotally, I can just say that I've had several discussions about potential development with PETSc where I've had to say "well, we can just add threads to PETSc ourselves, I know where the loops are". As a big PETSc fan this is a bit awkward; it would be nice to be able to say something like "PETSc has some support for threads ...".
I need a good student or equivalent to work with to do this; as Mark notes, it is not that hard, but it needs someone focused on it. Anybody have someone available?
Barry
>
> Simple OpenMP will not survive for long because of the data locality issues, but perhaps there is a path vis-a-vis Aron's comments on "MPI-like" constructs.
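To make the data locality concern concrete, here is a minimal sketch, assuming Linux-style first-touch page placement and pinned threads; alloc_local() is a hypothetical helper, not PETSc API. Each thread first-touches the pages it will later compute on, using the same schedule as the later compute loops, so those pages land on that thread's NUMA node.

  #include <stdlib.h>
  #include <omp.h>

  /* Hypothetical helper: allocate and first-touch an array in parallel so that,
     under first-touch NUMA placement with pinned threads, each thread's pages
     end up on its own memory node. */
  double *alloc_local(size_t n)
  {
    double *x = (double *)malloc(n * sizeof(*x));
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) x[i] = 0.0;
    return x;
  }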
>
> Anyway, I have only anecdotal experience and no quantitative data on this, but OpenMP may get too big to ignore, as C++ has: not by being particularly good but by being used a lot.
>
> Mark
>
> On Nov 11, 2010, at 8:34 PM, Barry Smith wrote:
>
>>
>> On Nov 11, 2010, at 7:22 PM, Jed Brown wrote:
>>
>>> On Fri, Nov 12, 2010 at 02:18, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> How do you get adaptive load balancing (across the cores inside a process) if you have the OpenMP compiler decide the partitioning/parallelism? This was Bill's point about why not to use OpenMP. For example, if you give each core the same amount of work up front, they will not all finish at the same time, so you have wasted cycles.
>>>
>>> Hmm, I think this issue is largely subordinate to the memory locality issue (for the sort of work we usually care about), but OpenMP could be more dynamic about distributing work. That is, this could be an OpenMP implementation or tuning issue, but I don't see it as a fundamental disadvantage of that programming model. I could be wrong.
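For example, a minimal sketch of "more dynamic about distributing work" using OpenMP's loop scheduling; work_on_row() and n are placeholders, not PETSc calls. Idle threads pull the next chunk on demand instead of being stuck with a fixed up-front partition.

  #include <omp.h>

  void work_on_row(int i);   /* placeholder for the per-row kernel */

  void sweep(int n)
  {
    /* schedule(dynamic,64): chunks of 64 iterations are handed out on demand,
       so threads that finish early keep pulling work rather than idling. */
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; i++) work_on_row(i);
  }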
>>
>> You are probably right; your previous explanation was better. Here is something related that Bill and I discussed: static load balancing has lower overhead, while dynamic load balancing has more. Static load balancing, however, will end up with some imbalance. Thus one could do an up-front static balancing of most of the data, and then, when the first cores run out of their static work, they finish the rest of the work with dynamic balancing.
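A minimal sketch of that hybrid idea, assuming a single loop over n units of work and a hypothetical work_on_row() kernel: most of the range is partitioned statically (low overhead), and the tail is handed out dynamically so the cores that finish their static share early soak up the remainder.

  #include <omp.h>

  void work_on_row(int i);   /* placeholder for the per-row kernel */

  void sweep_hybrid(int n)
  {
    int nstatic = (int)(0.9 * n);   /* bulk of the work, statically balanced */
    #pragma omp parallel
    {
      /* cheap static partition of most of the work; nowait lets early
         finishers move on immediately instead of waiting at a barrier */
      #pragma omp for schedule(static) nowait
      for (int i = 0; i < nstatic; i++) work_on_row(i);

      /* leftover work is handed out dynamically to whichever threads are free */
      #pragma omp for schedule(dynamic, 16)
      for (int i = nstatic; i < n; i++) work_on_row(i);
    }
  }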
>>
>> Barry
>>
>>>
>>> Jed
>>
>>
>