[petsc-users] Pipelined CG (or Gropp's CG) and communication overlap
Chao Yang
chao.yang at Colorado.EDU
Mon Mar 17 04:36:59 CDT 2014
Jed,
Thanks ... don't you sleep at night? ;-)
Chao
> Chao Yang <chao.yang at Colorado.EDU> writes:
>
>> The pipelined CG (or Gropp's CG) recently implemented in PETSc is very
>> attractive since it can hide the collective communication in the
>> vector dot products by overlapping it with the application of the
>> preconditioner and/or SpMV.
>>
>> However, there is an issue that may seriously degrade the
>> performance. In the pipelined CG, the asynchronous MPI_Iallreduce is
>> called before the application of the preconditioner and/or SpMV and
>> completed afterwards by MPI_Wait. The preconditioner and/or SpMV may
>> themselves require communication (such as halo updates), which I find
>> is often slowed down by the unfinished MPI_Iallreduce running in the
>> background.
>>
>> As far as I know, the current MPI doesn't provide prioritized
>> communication.
>
> No, and there is not much interest in adding it because it adds
> complication and tends to create starvation situations in which raising
> the priority actually makes it slower.
>
>> Therefore, the pipelined CG may well perform even worse than the
>> classic one because the preconditioner and SpMV are slowed down. Is
>> there a way to avoid this?
>
> This is an MPI quality-of-implementation issue and there isn't much we
> can do about it. There may be MPI tuning parameters that can help, but
> the nature of these methods is that, in exchange for creating latency
> tolerance in the reduction, the reduction now overlaps with the
> neighbor communication in MatMult/PCApply.
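
For reference, here is a minimal C sketch of the overlap pattern under
discussion, using hypothetical helper names (user_spmv_with_halo,
pipelined_step); it is not the actual KSPPIPECG/KSPGROPPCG source, only an
illustration of starting the reduction, running the SpMV with its halo
exchange while the reduction is outstanding, and then waiting:

#include <mpi.h>

/* Stand-in for a preconditioner/SpMV that performs its own halo
 * exchange on comm; a real implementation would post neighbor
 * sends/receives for ghost values before the local product. */
static void user_spmv_with_halo(MPI_Comm comm, const double *x, double *y, int n)
{
  (void)comm;
  for (int i = 0; i < n; i++) y[i] = 2.0 * x[i]; /* local work only in this stub */
}

static double local_dot(const double *a, const double *b, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++) s += a[i] * b[i];
  return s;
}

/* One pipelined step: start the global reduction, run the SpMV (and its
 * halo exchange) "under" it, then complete the reduction.  Whether the
 * MPI_Iallreduce actually makes progress during the SpMV is up to the
 * MPI implementation, which is the quality-of-implementation issue
 * discussed above. */
static void pipelined_step(MPI_Comm comm, const double *r, const double *u,
                           double *w, int n, double *delta)
{
  double      local = local_dot(r, u, n);
  MPI_Request req;

  MPI_Iallreduce(&local, delta, 1, MPI_DOUBLE, MPI_SUM, comm, &req);
  user_spmv_with_halo(comm, u, w, n);  /* neighbor communication here competes
                                          with the pending reduction for progress */
  MPI_Wait(&req, MPI_STATUS_IGNORE);   /* delta is valid only after this returns */
}

In PETSc these methods are selected with -ksp_type pipecg or
-ksp_type groppcg, and getting real overlap from the outstanding
MPI_Iallreduce typically also requires enabling asynchronous progress in
the MPI library (for example, MPICH_ASYNC_PROGRESS=1 with MPICH).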