[petsc-dev] PETSc programming model for multi-core systems

Thu Nov 11 21:21:26 CST 2010

I've got to agree with Mark's last statement on OpenMP: it's not particularly 
good, but it appears to be used a lot.  It seems like a lot of the major codes 
running on Jaguar at NCCS are moving towards OpenMP.

I've always disliked OpenMP because of the various issues already mentioned in 
this thread, but I note that there are several OpenMP folks who have been 
thinking about how to address these issues.  One that comes to mind is Barbara 
Chapman at U. Houston (publications at 
https://sites.google.com/site/ambantesting2/home/publications), whom I saw 
give a seminar at NCCS some time ago.

--Richard

On 11/11/2010 9:24 PM, Mark F. Adams wrote:
> This is a great technical discussion of the very vexing question of future 
> programming models.
>
> In addition to these issues there are facts-on-the-ground.  My limited view 
> of this elephant, if you will, is that OpenMP seems to be getting a certain 
> critical mass, for better or worse.  We may not see as homogeneous a world in 
> the future as we had, say, 10-15 years ago when MPI + C/FORTRAN was dominant 
> and supported well (with some exceptions) everywhere, but I think we could 
> see some coalescence around MPI + OpenMP + ? + C/F.
>
> Anecdotally, I can just say that I've had several discussions about potential 
> development with PETSc where I've had to say "well, we can just add threads 
> to PETSc ourselves, I know where the loops are".  As a big PETSc fan this is 
> a bit awkward; it would be nice to say something like "PETSc has some 
> support for threads ...".
>
> Simple OpenMP will not survive for long because of the data locality issues, 
> but perhaps there is a path vis-a-vis Aron's comments on "MPI-like" constructs.
>
> Anyway, I have only anecdotal experience and no quantitative data on this 
> but OpenMP may get too big to ignore, as C++ has: not by being particularly 
> good but by being used a lot.
>
> Mark
>
> On Nov 11, 2010, at 8:34 PM, Barry Smith wrote:
>
>>
>> On Nov 11, 2010, at 7:22 PM, Jed Brown wrote:
>>
>>> On Fri, Nov 12, 2010 at 02:18, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> How do you get adaptive load balancing (across the cores inside a process) 
>>> if you have OpenMP compiler decide the partitioning/parallelism? This was 
>>> Bill's point in why not to use OpenMP. For example if you give each core 
>>> the same amount of work up front they will end up not finishing at the same 
>>> time, so you have wasted cycles.
>>>
>>> Hmm, I think this issue is largely subordinate to the memory locality (for 
>>> the sort of work we usually care about), but OpenMP could be more 
>>> dynamic about distributing work.  I.e. this could be an OpenMP 
>>> implementation or tuning issue, but I don't see it as a fundamental 
>>> disadvantage of that programming model.  I could be wrong.
>>
>>   You are probably right, your previous explanation was better.  Here is 
>> something related that Bill and I discussed: static load balancing has lower 
>> overhead, while dynamic load balancing has more. Static load balancing, 
>> however, will end up with some imbalance. Thus one could do an upfront static 
>> load balancing of most of the data; then, when the first cores run out of 
>> their static work, they do the rest of the work with dynamic balancing.
>>
>>   Barry
>>
>>>
>>> Jed
>>
>>
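
For what it's worth, the hybrid scheme Barry describes in the quoted text
(a static split for most of the work, then dynamic balancing for the rest)
might be sketched with two OpenMP worksharing loops, roughly as below.  This
is only an illustration under stated assumptions, not anything taken from
PETSc: the names hybrid_sum, work_item, static_fraction, and chunk are all
hypothetical, and work_item merely stands in for a real loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-iteration work; a stand-in for a real loop body
 * (for example, one row of a sparse matrix-vector product). */
static double work_item(size_t i)
{
    return 0.5 * (double)i;
}

/* Sum work_item(i) for i in [0, n) using a hybrid schedule:
 *   - the first static_fraction of the iterations is split statically
 *     (low scheduling overhead);
 *   - the remaining iterations are handed out dynamically, chunk at a
 *     time, to whichever threads finish their static share first.
 * static_fraction and chunk are tuning knobs assumed for this sketch. */
double hybrid_sum(size_t n, double static_fraction, int chunk)
{
    assert(static_fraction >= 0.0 && static_fraction <= 1.0);
    assert(chunk > 0);

    long n_total  = (long)n;
    long n_static = (long)((double)n * static_fraction);
    double total  = 0.0;

    #pragma omp parallel reduction(+:total)
    {
        /* Phase 1: static partition.  "nowait" removes the barrier at
         * the end of this loop, so early finishers move straight on. */
        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n_static; i++)
            total += work_item((size_t)i);

        /* Phase 2: leftover iterations, balanced dynamically. */
        #pragma omp for schedule(dynamic, chunk)
        for (long i = n_static; i < n_total; i++)
            total += work_item((size_t)i);
    }
    return total;
}
```

With GCC or Clang this needs -fopenmp to run in parallel; without that flag
the pragmas are ignored and the loops run serially with the same result.
Note that this sketch does nothing about the memory locality issue Jed
raises, which may well matter more than the scheduling.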

-- 
Richard Tran Mills, Ph.D.            |   E-mail: rmills at climate.ornl.gov
Computational Scientist              |   Phone:  (865) 241-3198
Computational Earth Sciences Group   |   Fax:    (865) 574-0405
Oak Ridge National Laboratory        |   http://climate.ornl.gov/~rmills