[petsc-users] Using OpenMP threads with PETSc

Lucas Clemente Vella lvella at gmail.com
Thu Apr 9 17:52:02 CDT 2015


2015-04-09 19:33 GMT-03:00 Jed Brown <jed at jedbrown.org>:
> Lucas Clemente Vella <lvella at gmail.com> writes:
>> I suspect the optimal setup is to have one process for each NUMA node,
>> one thread per logical core,
>
> Why?  Are you packing buffers in parallel (extra openmp overhead) or
> serial (Amdahl's law limitations)?  The NIC most likely supports as many
> hardware contexts as cores, so there is a shorter critical path when
> using flat MPI.

Mainly because there is no need to copy buffers between processes
(with MPI calls) when they can simply be shared within the same NUMA
node. For processes running on other NUMA nodes, the extra copy
incurred by MPI pays off through faster local memory access. I was
under the impression that MPI implementations don't use the NIC to
communicate within the same compute node, but use shared memory
instead.
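
To illustrate what I mean by sharing instead of copying, here is a
minimal sketch using the MPI-3 shared-memory window API (generic MPI,
not PETSc code): ranks placed on the same node can read each other's
buffers directly, with no send/recv copy in between.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm  nodecomm;
  MPI_Win   win;
  double   *buf;
  int       noderank, nodesize;

  MPI_Init(&argc, &argv);

  /* Communicator of the ranks that can share memory (i.e. the same node). */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &noderank);
  MPI_Comm_size(nodecomm, &nodesize);

  /* Every rank contributes a segment that all node-local ranks can address. */
  MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                          MPI_INFO_NULL, nodecomm, &buf, &win);

  buf[0] = (double)noderank;
  MPI_Win_fence(0, win);              /* crude synchronization for this sketch */

  if (noderank == 0 && nodesize > 1) {
    double  *peer;
    MPI_Aint size;
    int      disp;
    /* Read node-rank 1's segment directly: no send/recv, no extra copy. */
    MPI_Win_shared_query(win, 1, &size, &disp, &peer);
    printf("value written by node-rank 1: %g\n", peer[0]);
  }
  MPI_Win_fence(0, win);

  MPI_Win_free(&win);
  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}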

By the way, there is already a preprocessor macro to disable CPU
affinity: PETSC_HAVE_SCHED_CPU_SET_T.
How can I disable it at configure time? For reference, the kind of
pinning I am talking about looks roughly like the sketch below.
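
This is a minimal illustration of the Linux sched.h affinity interface
that cpu_set_t enables (not PETSc's actual code), pinning each OpenMP
thread to one logical core:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
  #pragma omp parallel
  {
    cpu_set_t set;
    int tid = omp_get_thread_num();

    CPU_ZERO(&set);
    CPU_SET(tid, &set);   /* pin thread i to logical core i */
    if (sched_setaffinity(0, sizeof(cpu_set_t), &set) != 0)
      perror("sched_setaffinity");
    printf("thread %d pinned to core %d\n", tid, tid);
  }
  return 0;
}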

-- 
Lucas Clemente Vella
lvella at gmail.com

