[petsc-users] Pipelined CG (or Gropp's CG) and communication overlap
Chao Yang
chao.yang at Colorado.EDU
Mon Mar 17 04:36:59 CDT 2014
Jed,
Thanks ... don't you sleep at night? ;-)
Chao
> Chao Yang <chao.yang at Colorado.EDU> writes:
>
>> The pipelined CG (or Gropp's CG) recently implemented in PETSc is very
>> attractive since it can hide the collective communication in the
>> vector dot products by overlapping it with the application of the
>> preconditioner and/or SpMV.
>>
>> However, there is an issue that may seriously degrade the
>> performance. In the pipelined CG, the asynchronous MPI_Iallreduce is
>> called before the application of the preconditioner and/or SpMV and
>> completed afterwards by MPI_Wait. The preconditioner and/or SpMV may
>> themselves require communication (such as halo updates), which I find
>> is often slowed down by the unfinished MPI_Iallreduce running in the
>> background.
>>
>> As far as I know, the current MPI doesn't provide prioritized
>> communication.
>
> No, and there is not much interest in adding it because it adds
> complication and tends to create starvation situations in which raising
> the priority actually makes it slower.
>
>> Therefore, the pipelined CG may well perform even worse than the
>> classic one because the preconditioner and SpMV are slowed down. Is
>> there a way to avoid this?
>
> This is an MPI quality-of-implementation issue and there isn't much we
> can do about it. There may be MPI tuning parameters that can help, but
> the nature of these methods is that, in exchange for creating latency
> tolerance in the reduction, the reduction now overlaps with the
> neighbor communication in MatMult/PCApply.
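
For reference, here is a minimal C sketch of the overlap pattern under
discussion, using hypothetical helper names (user_spmv_with_halo,
pipelined_step); it is not the actual KSPPIPECG/KSPGROPPCG source, only an
illustration of starting the reduction, running the SpMV with its halo
exchange while the reduction is outstanding, and then waiting:

#include <mpi.h>

/* Stand-in for a preconditioner/SpMV that performs its own halo
 * exchange on comm; a real implementation would post neighbor
 * sends/receives for ghost values before the local product. */
static void user_spmv_with_halo(MPI_Comm comm, const double *x, double *y, int n)
{
  (void)comm;
  for (int i = 0; i < n; i++) y[i] = 2.0 * x[i]; /* local work only in this stub */
}

static double local_dot(const double *a, const double *b, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++) s += a[i] * b[i];
  return s;
}

/* One pipelined step: start the global reduction, run the SpMV (and its
 * halo exchange) "under" it, then complete the reduction.  Whether the
 * MPI_Iallreduce actually makes progress during the SpMV is up to the
 * MPI implementation, which is the quality-of-implementation issue
 * discussed above. */
static void pipelined_step(MPI_Comm comm, const double *r, const double *u,
                           double *w, int n, double *delta)
{
  double      local = local_dot(r, u, n);
  MPI_Request req;

  MPI_Iallreduce(&local, delta, 1, MPI_DOUBLE, MPI_SUM, comm, &req);
  user_spmv_with_halo(comm, u, w, n);  /* neighbor communication here competes
                                          with the pending reduction for progress */
  MPI_Wait(&req, MPI_STATUS_IGNORE);   /* delta is valid only after this returns */
}

In PETSc these methods are selected with -ksp_type pipecg or
-ksp_type groppcg, and getting real overlap from the outstanding
MPI_Iallreduce typically also requires enabling asynchronous progress in
the MPI library (for example, MPICH_ASYNC_PROGRESS=1 with MPICH).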