[petsc-users] Pipelined CG (or Gropp's CG) and communication overlap

Chao Yang chao.yang at Colorado.EDU
Mon Mar 17 04:12:40 CDT 2014


Hi,

The pipelined CG (or Gropp's CG) recently implemented in PETSc is very attractive, since it can hide the collective communication in the vector dot products by overlapping it with the application of the preconditioner and/or the SpMV.
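For reference, a minimal sketch of how one of these solvers might be selected in a PETSc application; it assumes a KSP object has already been created and its operators set elsewhere (the same choice can be made from the command line with -ksp_type groppcg or -ksp_type pipecg):

#include <petscksp.h>

/* Minimal sketch: switch an existing KSP to Gropp's overlapped CG.
 * Assumes ksp was created with KSPCreate() and operators were set. */
PetscErrorCode use_pipelined_cg(KSP ksp)
{
  PetscErrorCode ierr;

  ierr = KSPSetType(ksp, KSPGROPPCG); CHKERRQ(ierr);  /* or KSPPIPECG */
  ierr = KSPSetFromOptions(ksp);      CHKERRQ(ierr);  /* allow -ksp_type to override */
  return 0;
}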

However, there is an issue that may seriously degrade performance. In the pipelined CG, the asynchronous MPI_Iallreduce is started before the application of the preconditioner and/or SpMV and completed afterwards with MPI_Wait. The preconditioner and/or SpMV may themselves require communication (such as halo updates), which I find is often slowed down by the unfinished MPI_Iallreduce running in the background.
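To make the pattern concrete, here is a minimal sketch (not the PETSc implementation itself) of the overlap structure described above; the SpMV/halo-exchange routine and the vector names are placeholders:

#include <mpi.h>

/* Placeholder: SpMV whose halo update issues its own point-to-point messages. */
void spmv_with_halo_exchange(const double *p, double *q, int n, MPI_Comm comm);

void pipelined_step(const double *p, double *q, double local_dot,
                    double *global_dot, int n, MPI_Comm comm)
{
    MPI_Request dot_req;

    /* Start the global reduction for the dot product asynchronously. */
    MPI_Iallreduce(&local_dot, global_dot, 1, MPI_DOUBLE, MPI_SUM,
                   comm, &dot_req);

    /* Apply the preconditioner and/or SpMV while the reduction is in flight.
     * Its halo exchange now shares the network (and the MPI progress engine)
     * with the pending MPI_Iallreduce. */
    spmv_with_halo_exchange(p, q, n, comm);

    /* Complete the reduction; ideally it finished during the SpMV. */
    MPI_Wait(&dot_req, MPI_STATUS_IGNORE);
}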

As far as I know, the current MPI standard does not provide prioritized communication. It is therefore quite possible that the pipelined CG performs even worse than the classic one because of the slowdown of the preconditioner and SpMV. Is there a way to avoid this?

Any suggestion would be highly appreciated. Thanks in advance!

Best wishes,
Chao
