[petsc-dev] Hybrid MPI/OpenMP reflections
rupp at mcs.anl.gov
Thu Aug 8 09:14:46 CDT 2013
> We have recently been trying to re-align our OpenMP fork
> (https://bitbucket.org/ggorman/petsc-3.3-omp) with petsc/master. Much of
> our early work has now been superseded by the threadcomm
> implementations. Nevertheless, there are still a few algorithmic
> differences between the two branches:
> 1) Enforcing MPI latency hiding by using task-based spMV:
> If the MPI implementation used does not actually provide truly
> asynchronous communication in hardware, performance can be increased by
> dedicating a single thread to overlapping MPI communication in PETSc.
> However, this is arguably a vendor-specific fix which requires
> significant code changes (i.e. the parallel section needs to be raised up
> by one level). So perhaps the strategy should be to give guilty vendors
> a hard time rather than messing up the current abstraction.
When using good preconditioners, spMV is essentially never the
bottleneck and hence I don't think a separate communication thread
should be implemented in PETSc. Instead, such a fallback should be part
of a good MPI implementation.
> 2) Nonzero-based thread partitioning:
> Rather than evenly dividing the number of rows among threads, we can
> partition the thread ownership ranges according to the number of
> non-zeros in each row. This balances the work load between threads and
> thus increases strong scalability due to optimised bandwidth
> utilisation. In general, this optimisation should integrate well with
> threadcomms, since it only changes the thread ownership ranges, but it
> does require some structural changes since nnz is currently not passed
> to PetscLayoutSetUp. Any thoughts on whether people regard such a scheme
> as useful would be greatly appreciated.
This is a reasonable optimization; I used a similar strategy for sparse
matrices on the GPU. Others should comment on whether the interface
change to PetscLayoutSetUp is acceptable.
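
For illustration, the nonzero-based partitioning from 2) can be sketched
roughly as follows. This is only a minimal sketch and not code from either
branch: it assumes a CSR row-pointer array, and the helper name
partition_rows_by_nnz and its arguments are hypothetical. It does not touch
PetscLayoutSetUp; it only shows how ownership ranges could be derived from
the cumulative nonzero counts.

```c
#include <assert.h>

/* Hypothetical helper, not part of PETSc: split the nrows rows of a CSR
 * matrix among nthreads threads so that each range holds roughly
 * nnz/nthreads nonzeros.
 *   rowptr: CSR row pointer with nrows+1 entries
 *   start:  output with nthreads+1 entries; thread t owns rows
 *           [start[t], start[t+1])                                    */
static void partition_rows_by_nnz(const int *rowptr, int nrows,
                                  int nthreads, int *start)
{
  assert(rowptr && start && nrows >= 0 && nthreads > 0);
  const long long nnz = (long long)rowptr[nrows] - rowptr[0];
  int row = 0;

  start[0] = 0;
  for (int t = 1; t < nthreads; ++t) {
    /* Cumulative nonzero count at which the range of thread t-1 ends. */
    const long long target = rowptr[0] + (nnz * t) / nthreads;
    /* Advance while the next row still fits below the target. */
    while (row < nrows && rowptr[row + 1] <= target) ++row;
    start[t] = row;
  }
  start[nthreads] = nrows;
}
```

With one dense row followed by eight rows of a single nonzero each, two
threads receive 8 nonzeros apiece (rows [0,1) and [1,9)). An even split by
rows would assign 11 and 5 nonzeros instead. A thread can end up with an
empty range if a single row exceeds its share.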
> 3) MatMult_SeqBAIJ not threaded:
> Is there a reason why MatMult has not been threaded for BAIJ matrices,
> or is somebody already working on this? If not, I would like to prepare
> a pull request for this using the same approach as MatMult_SeqAIJ.
To my knowledge, it 'simply hasn't been implemented yet'. A pull request
would be nice; I'm happy to review it.
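
To make the idea in 3) concrete, a threaded block-row product might look
roughly like the sketch below. It is not the PETSc implementation of
MatMult_SeqBAIJ or MatMult_SeqAIJ, and it uses a plain OpenMP loop rather
than threadcomm. The function name bcsr_matmult, the argument names, and the
assumed column-major layout of each bs x bs block are assumptions made for
this example only.

```c
#include <assert.h>

/* Hypothetical sketch: y = A*x for a block-CSR matrix with square blocks.
 *   mbs: number of block rows          bs: block size
 *   bi:  block-row pointer (mbs+1)     bj: block-column indices
 *   ba:  block values, bs*bs per block, column-major within a block
 * Each block row writes a disjoint slice of y, so the outer loop can be
 * shared among threads without synchronisation.                        */
static void bcsr_matmult(int mbs, int bs, const int *bi, const int *bj,
                         const double *ba, const double *x, double *y)
{
  assert(mbs >= 0 && bs > 0 && bi && bj && ba && x && y);
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
  for (int ib = 0; ib < mbs; ++ib) {
    double *yb = y + (long)ib * bs;
    for (int r = 0; r < bs; ++r) yb[r] = 0.0;
    for (int k = bi[ib]; k < bi[ib + 1]; ++k) {
      const double *blk = ba + (long)k * bs * bs;  /* k-th stored block   */
      const double *xb  = x + (long)bj[k] * bs;    /* matching slice of x */
      for (int c = 0; c < bs; ++c)
        for (int r = 0; r < bs; ++r)
          yb[r] += blk[c * bs + r] * xb[c];
    }
  }
}
```

The static schedule divides block rows evenly among threads. It could be
replaced by explicit ownership ranges computed from the block nonzero counts
if the partitioning from 2) is adopted.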