[petsc-dev] Implementing longer pipelines with VecDotBegin and VecDotEnd

Thu Mar 23 13:05:42 CDT 2017

  Wim,

    VecDotBegin/End() work by accumulating the partial values in a data structure associated with the MPI communicator until a PetscCommSplitReductionBegin() (or a VecXXXEnd()) is seen. Thus, in the current model, only a single collection of reductions can be outstanding at any one time.

   For your needs we will need to extend the functionality so there can be multiple independent sets of outstanding reductions. 

   Jed will likely have better ideas on this, but the simplest extension I can see is to add an additional integer argument to each call that indicates the subcollection of reductions. Thus something like

ierr = VecDotBegin(R,U,&gamma,0); CHKERRQ(ierr);

 ierr = KSP_MatMult(ksp,Amat, ..., ... ); CHKERRQ(ierr);

 ierr = VecDotBegin(W,V,&delta,1); CHKERRQ(ierr);

 ierr = KSP_MatMult(ksp,Amat,M,N); CHKERRQ(ierr);

 ierr = VecDotEnd(R,U,&gamma,0); CHKERRQ(ierr);
 ierr = VecDotBegin(X,Y,&psi,2); CHKERRQ(ierr);
.... 

 ierr = VecDotEnd(W,V,&delta,1); CHKERRQ(ierr);
 ierr = VecDotEnd(X,Y,&psi,2); CHKERRQ(ierr);

The integer would be used internally by the routines to access different PetscSplitReduction objects associated with the MPI_Comm. In user code, once you have called the End routine for every operation begun with a particular integer, you can recycle that integer and use it again for a new set.
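
To make the bookkeeping concrete, here is a toy, single-process model in plain C. It is only a sketch under stated assumptions: every name in it (ToyComm, ToySet, toy_dot_begin, toy_dot_end, toy_pipeline_demo) is hypothetical, nothing in it is PETSc code, and the "reduction" is a no-op because there is only one process. It only illustrates the rule that a set cannot accept a new Begin once its reduction has started and not all of its Ends have been called, and how separate integers would avoid that.

```c
#include <assert.h>

#define TOY_MAX_SETS 4  /* hypothetical cap on concurrent reduction sets */
#define TOY_MAX_OPS  8  /* hypothetical cap on reductions per set */

typedef enum { TOY_IDLE, TOY_FILLING, TOY_IN_FLIGHT } ToyState;

/* One independent collection of reductions. This is a toy stand-in for one
   split-reduction object, not the real PETSc data structure. */
typedef struct {
  ToyState state;
  int      nbegun;             /* values queued by toy_dot_begin() */
  int      nended;             /* results already returned by toy_dot_end() */
  double   value[TOY_MAX_OPS]; /* queued values (identity "reduction") */
} ToySet;

/* All sets attached to one imaginary communicator. */
typedef struct { ToySet set[TOY_MAX_SETS]; } ToyComm;

void toy_comm_init(ToyComm *c)
{
  int i;
  for (i = 0; i < TOY_MAX_SETS; i++) {
    c->set[i].state  = TOY_IDLE;
    c->set[i].nbegun = 0;
    c->set[i].nended = 0;
  }
}

/* Queue a local partial value in set `id`. Returns 0 on success, 1 if the
   set is full or already in flight (some, but not all, Ends were called). */
int toy_dot_begin(ToyComm *c, int id, double partial)
{
  ToySet *s;
  assert(id >= 0 && id < TOY_MAX_SETS);
  s = &c->set[id];
  if (s->state == TOY_IN_FLIGHT) return 1;
  if (s->nbegun == TOY_MAX_OPS) return 1;
  s->value[s->nbegun++] = partial;
  s->state = TOY_FILLING;
  return 0;
}

/* Return the next result of set `id`, in the order the values were queued.
   The first End marks the set as in flight. After the last result has been
   returned the set is idle again, so `id` can be recycled. */
int toy_dot_end(ToyComm *c, int id, double *result)
{
  ToySet *s;
  assert(id >= 0 && id < TOY_MAX_SETS);
  s = &c->set[id];
  if (s->state == TOY_IDLE) return 1; /* nothing was begun in this set */
  s->state = TOY_IN_FLIGHT;
  *result = s->value[s->nended++];
  if (s->nended == s->nbegun) {
    s->nbegun = 0;
    s->nended = 0;
    s->state  = TOY_IDLE;
  }
  return 0;
}

/* Replays the Begin/End ordering from the example above. Returns 0 if every
   call succeeds, otherwise the number of the first failing step. */
int toy_pipeline_demo(int distinct)
{
  ToyComm c;
  double  gamma = 0.0, delta = 0.0, psi = 0.0;
  int     a = 0, b = distinct ? 1 : 0, d = distinct ? 2 : 0;

  toy_comm_init(&c);
  if (toy_dot_begin(&c, a, 1.0)) return 1;
  if (toy_dot_begin(&c, b, 2.0)) return 2;
  if (toy_dot_end(&c, a, &gamma)) return 3;
  if (toy_dot_begin(&c, d, 3.0)) return 4; /* fails when a == b == d */
  if (toy_dot_end(&c, b, &delta)) return 5;
  if (toy_dot_end(&c, d, &psi)) return 6;
  return (gamma == 1.0 && delta == 2.0 && psi == 3.0) ? 0 : 7;
}
```

In this model toy_pipeline_demo(0), which puts everything in one set, stops at step 4 (the Begin placed between two Ends), while toy_pipeline_demo(1), which uses integers 0, 1 and 2, runs to completion. In the second case set 0 is already idle by step 4, so psi could equally well have reused integer 0.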

An alternative to using integers is to hoist the PetscSplitReduction up to be visible to the calling code thus allowing multiple ones associated with different sets of reductions. This approach would result in a larger change to the public API so I would only do it if there is a fatal flaw in the integer approach.

  Jed, how do you suggest solving this?

  Barry

> On Mar 23, 2017, at 9:41 AM, Wim Vanroose <wim at vanroo.se> wrote:
> 
> Dear Petsc-Dev,
> 
> Over the last few years we have contributed several pipelined Krylov solvers, such as KSPPIPECG and, most recently, pipelined BiCGStab (pipebcgs).
> These make use of asynchronous global reductions, using VecDotBegin and VecDotEnd to overlap the calculation of a dot product with the matrix-vector product.
> Experiments by various authors show that these methods can offer better scaling in the extremely large system limit.
> 
> We are now trying to introduce Krylov methods with longer pipelines, such that the dot product can take multiple matrix-vector products to complete.
> 
> Below is a sketch. After the first SpMV we would like to start a VecDotBegin that would only complete two or more SpMVs later.
> After each SpMV we would start such a global reduction.
> <out (1).png>
> While trying to implement a length-l version of pipelined CG in PETSc, we ran into trouble with constructions of the following type,
> which are representative of the problem above. Let R, U, V, W, X and Y be KSP work vectors, and let gamma, delta and psi be PetscScalars:
> 
>  ierr = VecDotBegin(R,U,&gamma); CHKERRQ(ierr);
> 
>  ierr = KSP_MatMult(ksp,Amat, ..., ... ); CHKERRQ(ierr);
> 
>  ierr = VecDotBegin(W,V,&delta); CHKERRQ(ierr);
> 
>  ierr = KSP_MatMult(ksp,Amat,M,N); CHKERRQ(ierr);
> 
>  ierr = VecDotEnd(R,U,&gamma); CHKERRQ(ierr);
>  ierr = VecDotBegin(X,Y,&psi); CHKERRQ(ierr);
> .... 
> 
>  ierr = VecDotEnd(W,V,&delta); CHKERRQ(ierr);
>  ierr = VecDotEnd(X,Y,&psi); CHKERRQ(ierr);
> 
> Maybe this is a trivial remark, but it appears that it is not possible to put a new VecDotBegin (line 7) in between two VecDotEnd's (lines 6 and 8). Do you have any ideas on why this can't be done (is it intrinsic to VecDotBegin?), and whether a work-around for this issue is available?
> 
> Are there other methods in PETSc that we should use? Or are VecDotBegin and VecDotEnd not designed to be used in this way?
> 
> Thanks a lot for the input,
>