[petsc-dev] PETSc programming model for multi-core systems

Fri Nov 12 07:32:41 CST 2010

Hi Rodrigo,

These are interesting results.  It looks like you were bound by a
speedup of about 2, which suggests you might have been seeing cache
capacity/conflict problems.  Did you do any further analysis on why
you weren't able to get better performance?

A

On Fri, Nov 12, 2010 at 8:26 AM, Rodrigo R. Paz
<rodrigop at intec.unl.edu.ar> wrote:
> Hi all,
> Please find attached a plot with some speedup results that we obtained some
> time ago with some hacks we introduced into PETSc so that it could be used on
> hybrid architectures with OpenMP.
> The tests were done on a set of 6 Xeon nodes with 8 cores each. Results are
> for the MatMult op in KSP in the context of the solution of
> advection-diffusion-reaction equations by means of SUPG stabilized FEM.
>
> Rodrigo
>
> --
> Rodrigo Paz
> National Council for Scientific Research CONICET
> CIMEC-INTEC-CONICET-UNL.
> Güemes 3450. 3000, Santa Fe, Argentina.
> Tel/Fax: +54-342-4511594, Fax: +54-342-4511169
>
>
> On Thu, Nov 11, 2010 at 10:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>> On Nov 11, 2010, at 7:22 PM, Jed Brown wrote:
>>
>> > On Fri, Nov 12, 2010 at 02:18, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> > How do you get adaptive load balancing (across the cores inside a
>> > process) if you have the OpenMP compiler decide the partitioning/parallelism?
>> > This was Bill's point about why not to use OpenMP. For example, if you give
>> > each core the same amount of work up front, they will not all finish at the
>> > same time, so you have wasted cycles.
>> >
>> > Hmm, I think this issue is largely subordinate to the memory locality
>> > (for the sort of work we usually care about), but OpenMP could be more
>> > dynamic about distributing work.  I.e. this could be an OpenMP
>> > implementation or tuning issue, but I don't see it as a fundamental
>> > disadvantage of that programming model.  I could be wrong.
>>
>>   You are probably right; your previous explanation was better.  Here is
>> something related that Bill and I discussed: static load balancing has lower
>> overhead, while dynamic has more. Static load balancing, however, will
>> end up with some imbalance. Thus one could do an up-front static load
>> balancing of most of the data; then, when the first cores run out of their
>> static work, they do the rest of the work with dynamic balancing.
>>
>>   Barry
>>
>> >
>> > Jed
>>
>
>
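
The static-then-dynamic scheme Barry describes above could be sketched with
OpenMP roughly as follows. This is only an illustration, not code from PETSc
or from Rodrigo's patches: the function name csr_matvec_hybrid, the CSR
argument layout, and the static_fraction and chunk parameters are all
hypothetical. The first loop hands out most rows statically; because of the
nowait clause, a thread that finishes its static share moves straight on to
the second loop and pulls the remaining rows in small dynamic chunks.

```c
#include <assert.h>

/* Hypothetical sketch: y = A*x for a CSR matrix, with the first
 * static_fraction of the rows scheduled statically and the rest
 * scheduled dynamically in blocks of `chunk` rows. */
void csr_matvec_hybrid(int nrows, const int *rowptr, const int *colidx,
                       const double *val, const double *x, double *y,
                       double static_fraction, int chunk)
{
    assert(nrows >= 0 && chunk >= 1);

    /* Number of rows handled in the static phase, clamped to [0, nrows]. */
    int nstatic = (int)(static_fraction * nrows);
    if (nstatic < 0) nstatic = 0;
    if (nstatic > nrows) nstatic = nrows;

    #pragma omp parallel
    {
        /* Static phase: low scheduling overhead. nowait removes the
         * barrier, so early finishers do not sit idle here. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < nstatic; i++) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                sum += val[k] * x[colidx[k]];
            y[i] = sum;
        }

        /* Dynamic phase: threads grab leftover rows as they become free,
         * which evens out the imbalance left by the static phase. */
        #pragma omp for schedule(dynamic, chunk)
        for (int i = nstatic; i < nrows; i++) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                sum += val[k] * x[colidx[k]];
            y[i] = sum;
        }
    }
}
```

Built with OpenMP enabled (e.g. -fopenmp with GCC) the two loops run in
parallel; without it the pragmas are ignored and the routine runs serially
with the same result. How large static_fraction and chunk should be is a
tuning question, and this sketch does nothing about the memory-locality
concern Jed raises.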