[petsc-users] Using OpenMP threads with PETSc

Lucas Clemente Vella lvella at gmail.com
Thu Apr 9 18:23:06 CDT 2015


>> Mainly because there is no need to copy buffers between processes
>> (with MPI calls) when it is already fast to use them shared, inside
>> the same NUMA node.
>
> What about cache efficiency when your working set is not contiguous or
> is poorly shaped?

It will be no worse than separate MPI processes competing for the same
cache slot. At least with threads there is a chance that different
tasks will hit the same cached memory. I do believe there should be a
smart way to control and optimize thread work proximity in OpenMP for
loops, i.e.:

#pragma omp parallel for
for (size_t i = 0; i < size; ++i) {
    // something not dependent on previous steps
}

Two threads working on adjacent ranges should run on the two
hyperthreads of the same core, to maximize cache reuse. Given this
usage pattern of OpenMP, it already seems unlikely that two threads
will end up working too far from each other, but if I wanted that
level of control, I would have to hack some OpenMP implementation
and/or the kernel.
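
For what it's worth, OpenMP 4.0 does expose part of this control
through its affinity support. Here is a minimal sketch of my own
(the file name and array are hypothetical, and it assumes a runtime
that honors proc_bind together with OMP_PLACES=threads):

/* affinity.c -- build with: gcc -std=c99 -fopenmp affinity.c -o affinity
 * run with:   OMP_PLACES=threads OMP_NUM_THREADS=4 ./affinity */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size = 1u << 20;
    double *a = malloc(size * sizeof *a);
    if (!a) return 1;

    /* proc_bind(close) packs threads onto adjacent places; with
     * OMP_PLACES=threads, threads 0 and 1 land on the two hyperthreads
     * of one core. schedule(static) hands adjacent index chunks to
     * adjacent thread ids, so threads sharing a core also touch
     * neighboring cache lines at their chunk boundary. */
    #pragma omp parallel for proc_bind(close) schedule(static)
    for (size_t i = 0; i < size; ++i)
        a[i] = 2.0 * (double)i;

    printf("a[last] = %f\n", a[size - 1]);
    free(a);
    return 0;
}

Whether this actually pairs hyperthreads depends on how the runtime
numbers its places, so it is a best-effort hint rather than a
guarantee.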

On the other hand, the pthread interface favors more loosely coupled
tasks, which may yield worse cache reuse, but I confess I didn't take
the time to look inside PETSc at how each of the threading libraries
is used.

-- 
Lucas Clemente Vella
lvella at gmail.com

