[petsc-dev] Implementing longer pipelines with VecDotBegin and VecDotEnd

Barry Smith bsmith at mcs.anl.gov
Thu Mar 23 22:37:01 CDT 2017


> On Mar 23, 2017, at 10:21 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> 
> Barry Smith <bsmith at mcs.anl.gov> writes:
> 
>>> On Mar 23, 2017, at 6:05 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>>> 
>>> Barry Smith <bsmith at mcs.anl.gov> writes:
>>> 
>>>> Wim,
>>>> 
>>>>   VecDotBegin/End() work by accumulating the partial values in a data structure associated with the MPI communicator until a PetscCommSplitReductionBegin() (or a VecXXXEnd()) is seen. Thus, in the current model, only a single collection of reductions can be outstanding at a time.
>>>> 
>>>>  For your needs we will need to extend the functionality so there can be multiple independent sets of outstanding reductions. 
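To make this limitation concrete, here is a minimal C model of a single per-communicator reduction context. It is a sketch under stated assumptions, not PETSc's implementation: `SplitReduction`, `SRBegin` and `SREnd` are hypothetical names, ranks are simulated by arrays of local partial values, and the "communication" is a plain sum standing in for one MPI_Allreduce. Begins are queued until the first End, which reduces everything at once, and no new Begin is accepted until every queued result has been retrieved.

```c
#include <assert.h>

/* Hypothetical model of ONE split-reduction context per communicator.
 * Not PETSc source: NRANKS simulated ranks each hold a local partial
 * value, and "communicating" means summing them (one allreduce). */
#define NRANKS 4
#define MAXOPS 8

typedef enum { STATE_BEGIN, STATE_ENDING } SRState;

typedef struct {
  SRState state;
  int     nbegin;                /* reductions queued by SRBegin */
  int     nend;                  /* results already returned by SREnd */
  double  local[MAXOPS][NRANKS]; /* per-rank partial values */
  double  global[MAXOPS];        /* reduced (summed) results */
  int     nmessages;             /* simulated allreduce count */
} SplitReduction;

/* Queue one reduction and return its slot, or -1 if the context is
 * still handing out results of an earlier reduction (or is full). */
int SRBegin(SplitReduction *sr, const double partial[NRANKS]) {
  if (sr->state != STATE_BEGIN || sr->nbegin == MAXOPS) return -1;
  for (int r = 0; r < NRANKS; r++) sr->local[sr->nbegin][r] = partial[r];
  return sr->nbegin++;
}

/* The first SREnd reduces every queued slot in a single "message";
 * later SREnds only read results. Each slot must be ended once. */
double SREnd(SplitReduction *sr, int slot) {
  double value;
  assert(slot >= 0 && slot < sr->nbegin);
  if (sr->state == STATE_BEGIN) {
    for (int i = 0; i < sr->nbegin; i++) {
      sr->global[i] = 0.0;
      for (int r = 0; r < NRANKS; r++) sr->global[i] += sr->local[i][r];
    }
    sr->nmessages++;
    sr->state = STATE_ENDING;
  }
  value = sr->global[slot];
  if (++sr->nend == sr->nbegin) {  /* all results retrieved: reset */
    sr->nbegin = sr->nend = 0;
    sr->state  = STATE_BEGIN;
  }
  return value;
}
```

Replaying Wim's sequence against this model, the Begin for X,Y issued between the two Ends returns -1, because the context is still handing out results of the first combined reduction; only after the last End does it accept new work.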
>>> 
>>> Instead of this integer, I would prefer to change
>>> PetscSplitReductionGet() to give a request object that can be completed.
>>> If it is necessary to be able to start a new norm or dot product with
>>> the same arguments before completing the last, then
>>> 
>>> VecNormBegin(X,&request);
>>> VecNormEnd(X,request,&nrm);
>>> 
>>> The request above could be a pointer or an integer.
>> 
>>   Jed, how would you handle the chaining of several reductions into a single MPI communication? I don't think that would work; you'd need a wider API, for example
>> 
>>    VecNormBegin(X,&request);
>>    VecNormBeginWithRequest(Y,request);
>>    VecNormEnd(X,request,&nrm);
>>    VecNormEnd(Y,request,&nrm2);
>> 
>>   Ugly.
>> 
>>    Less ugly, you could have something like
>> 
>>    PetscSplitReductionGetRequest(MPI_Comm,&request);
>>    VecNormBegin(X,request);
>>    VecNormBegin(Y,request);
>>    VecNormEnd(X,request,&nrm);
>>    VecNormEnd(Y,request,&nrm2);
>>    PetscSplitReductionRestoreRequest(MPI_Comm,&request);
> 
> Meh,
> 
>  VecNormBegin(X,&request1x);
>  VecNormBegin(Y,&request1y);
>  VecNormEnd(X,request1x,&norm);
>  VecAXPY(Y,-1,X);
>  VecNormBegin(Y,&request2y);
>  VecNormEnd(Y,request2y,&norm2y);
>  VecNormEnd(Y,request1y,&norm1y);

   I don't understand what you are getting at here. You don't seem to understand my use case, where multiple inner products/norms share the same MPI communication (which was the original reason for VecNormBegin/End); see, for example, KSPSolve_CR.

    Are you somehow (incompetently) saying that the first two VecNorms share the same parallel communication (even though they have different request values) while the third norm has its own MPI communication? Please explain how this works. Because an End was done, does the next Begin somehow know to create an entirely new reduction object that it tracks, while the old reduction is kept around (where?) to complete all the first-phase requests?

   I am ok with this model if it can be implemented.
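One way the model in Jed's example could be implemented, as a hypothetical sketch rather than PETSc code: each Begin joins the currently open batch (allocating one if none is open); the first End on a batch performs its single reduction and closes it, so the next Begin opens a fresh batch; and each batch is reference-counted by its outstanding requests, so an old batch lives on the heap until its last request is Ended. `Batch`, `Request`, `ReqBegin` and `ReqEnd` are invented names, ranks are simulated, and the partial values are local contributions (for a 2-norm, local sums of squares, with the square root taken after the End).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of implicit batching with request handles.
 * Not PETSc code: NRANKS simulated ranks, and summing stands in for
 * one MPI allreduce per batch. */
#define NRANKS 4
#define MAXOPS 8

typedef struct {
  int    reduced;                /* 1 once this batch's reduction ran */
  int    nops;                   /* reductions queued in this batch */
  int    refs;                   /* requests not yet Ended */
  double local[MAXOPS][NRANKS];
  double global[MAXOPS];
} Batch;

typedef struct { Batch *open; int nmessages; } Comm;  /* per-communicator */
typedef struct { Batch *batch; int slot; } Request;

/* Join the open batch, or start a new one if none is open (or it is
 * full). Returns 0 on success, -1 if allocation fails. */
int ReqBegin(Comm *c, const double partial[NRANKS], Request *req) {
  Batch *b = c->open;
  if (!b || b->nops == MAXOPS) {
    b = calloc(1, sizeof *b);
    if (!b) return -1;
    c->open = b;
  }
  for (int r = 0; r < NRANKS; r++) b->local[b->nops][r] = partial[r];
  req->batch = b;
  req->slot  = b->nops++;
  b->refs++;
  return 0;
}

/* The first End on a batch reduces all of it in one "message" and
 * closes it, so later Begins go to a new batch. The batch is freed
 * when its last outstanding request is Ended. */
double ReqEnd(Comm *c, Request *req) {
  Batch *b = req->batch;
  double value;
  assert(b && req->slot >= 0 && req->slot < b->nops);
  if (!b->reduced) {
    for (int i = 0; i < b->nops; i++) {
      b->global[i] = 0.0;
      for (int r = 0; r < NRANKS; r++) b->global[i] += b->local[i][r];
    }
    b->reduced = 1;
    c->nmessages++;
    if (c->open == b) c->open = NULL;
  }
  value = b->global[req->slot];
  req->batch = NULL;                 /* a request is Ended only once */
  if (--b->refs == 0) free(b);
  return value;
}
```

In Jed's sequence, the norms of X and Y share one batch, the second norm of Y starts a new batch, and two messages are sent in total; the first batch survives until `request1y` is Ended.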

> 
> Hardwired integers are gross.

  Agreed. I am not married to that model.

> 
>>   Barry
>> 
>>> 
>>>>  Jed will likely have better ideas on this, but the simplest extension I can see is to add an additional integer argument to each call that indicates the subcollection of reductions. Thus something like
>>>> 
>>>> ierr = VecDotBegin(R,U,&gamma,0); CHKERRQ(ierr);
>>>> 
>>>> ierr = KSP_MatMult(ksp,Amat, ..., ... ); CHKERRQ(ierr);
>>>> 
>>>> ierr = VecDotBegin(W,V,&delta,1); CHKERRQ(ierr);
>>>> 
>>>> ierr = KSP_MatMult(ksp,Amat,M,N); CHKERRQ(ierr);
>>>> 
>>>> ierr = VecDotEnd(R,U,&gamma,0); CHKERRQ(ierr);
>>>> ierr = VecDotBegin(X,Y,&psi,2); CHKERRQ(ierr);
>>>> .... 
>>>> 
>>>> ierr = VecDotEnd(W,V,&delta,1); CHKERRQ(ierr);
>>>> ierr = VecDotEnd(X,Y,&psi,2); CHKERRQ(ierr);
>>>> 
>>>> The integer would be used internally by the routines to access different PetscSplitReduction objects associated with the MPI_Comm. In user code once you have completely Ended an operation with a particular integer you can recycle the integer and use it again for a new set.
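A hypothetical sketch of the integer-set proposal, with simulated ranks in place of MPI: each set id selects its own reduction context on the communicator, so ending set 0 does not block a Begin on set 2. `CommReductions`, `SetBegin` and `SetEnd` are illustrative names only, not existing PETSc functions.

```c
#include <assert.h>

/* Hypothetical sketch: NSETS independent reduction contexts attached
 * to one communicator, selected by an integer id. Not PETSc code;
 * summing over NRANKS simulated ranks stands in for an allreduce. */
#define NRANKS 4
#define MAXOPS 8
#define NSETS  3

typedef struct {
  int    ending;                 /* 1 once this set has been reduced */
  int    nbegin, nend;
  double local[MAXOPS][NRANKS];
  double global[MAXOPS];
} ReductionSet;

typedef struct {
  ReductionSet set[NSETS];
  int          nmessages;        /* one per set reduction */
} CommReductions;

/* Queue a reduction in set id; returns its slot, or -1 if that id is
 * still in its ending phase (or full). */
int SetBegin(CommReductions *c, int id, const double partial[NRANKS]) {
  ReductionSet *s;
  assert(id >= 0 && id < NSETS);
  s = &c->set[id];
  if (s->ending || s->nbegin == MAXOPS) return -1;
  for (int r = 0; r < NRANKS; r++) s->local[s->nbegin][r] = partial[r];
  return s->nbegin++;
}

/* The first End on a set reduces only that set; once all its results
 * are retrieved, the id is free to be recycled. */
double SetEnd(CommReductions *c, int id, int slot) {
  ReductionSet *s;
  double value;
  assert(id >= 0 && id < NSETS);
  s = &c->set[id];
  assert(slot >= 0 && slot < s->nbegin);
  if (!s->ending) {
    for (int i = 0; i < s->nbegin; i++) {
      s->global[i] = 0.0;
      for (int r = 0; r < NRANKS; r++) s->global[i] += s->local[i][r];
    }
    c->nmessages++;
    s->ending = 1;
  }
  value = s->global[slot];
  if (++s->nend == s->nbegin) {
    s->nbegin = s->nend = 0;
    s->ending = 0;
  }
  return value;
}
```

Replaying Wim's sequence with ids 0, 1 and 2 succeeds in this model. The trade-off is one message per set rather than one for everything outstanding, and the caller must keep track of which ids are free.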
>>>> 
>>>> An alternative to using integers is to hoist the PetscSplitReduction up to be visible to the calling code thus allowing multiple ones associated with different sets of reductions. This approach would result in a larger change to the public API so I would only do it if there is a fatal flaw in the integer approach.
>>>> 
>>>> Jed, how do you suggest solving this?
>>>> 
>>>> Barry
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Mar 23, 2017, at 9:41 AM, Wim Vanroose <wim at vanroo.se> wrote:
>>>>> 
>>>>> Dear Petsc-Dev,
>>>>> 
>>>>> Over the last few years we have contributed several pipelined Krylov solvers, such as KSPPIPECG and, most recently, pipelined BiCGStab (pipebcgs).
>>>>> These make use of asynchronous global reductions using VecDotBegin and VecDotEnd to overlap the calculation of a dot product with the matrix-vector product.
>>>>> Experiments by various authors show that these methods can offer better scaling in the limit of extremely large systems.
>>>>> 
>>>>> We are now trying to introduce Krylov methods with longer pipelines, in which a dot product can take several matrix-vector products to complete.
>>>>> 
>>>>> Below is a sketch. After the first SpMV we would like to start a VecDotBegin that would only complete two or more SpMVs later.
>>>>> After each SpMV we would start such a global reduction.
>>>>> <out (1).png>
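The bookkeeping for a depth-L pipeline can be sketched with a ring buffer. This is an illustration under assumptions, not code from KSPPIPECG: a reduction is begun after every SpMV and completed L SpMVs later, so up to L reductions are outstanding at once, which a single per-communicator context cannot hold. `Pipeline` and `PipelineStep` are hypothetical names, iterations are assumed to be numbered consecutively from 0, and a real implementation would keep a request handle (for example an MPI_Request) in each slot.

```c
#include <assert.h>

/* Hypothetical ring-buffer bookkeeping for a depth-L pipeline: a
 * reduction is begun after every SpMV and completed L SpMVs later.
 * No MPI here; a real code would keep a request handle in each slot. */
#define MAXDEPTH 16

typedef struct {
  int depth;                  /* L: SpMVs each reduction overlaps */
  int busy[MAXDEPTH];         /* 1 if the slot holds an open reduction */
  int started[MAXDEPTH];      /* iteration that began that reduction */
  int inflight, maxinflight;  /* open reductions now / at peak */
} Pipeline;

/* Call once per iteration i (0, 1, 2, ...), right after its SpMV.
 * "Ends" the reduction begun at iteration i - L (if any), then
 * "Begins" a new one in the freed slot. Returns the iteration whose
 * reduction just completed, or -1 while the pipeline is filling. */
int PipelineStep(Pipeline *p, int i) {
  int slot, done = -1;
  assert(p->depth > 0 && p->depth <= MAXDEPTH && i >= 0);
  slot = i % p->depth;
  if (p->busy[slot]) {
    done = p->started[slot];
    p->inflight--;
  }
  p->started[slot] = i;
  p->busy[slot]    = 1;
  if (++p->inflight > p->maxinflight) p->maxinflight = p->inflight;
  return done;
}
```

With `depth = 3`, the reduction begun at iteration i is completed at iteration i + 3, and three reductions are in flight once the pipeline has filled.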
>>>>> While trying to implement a length-l version of pipelined CG in PETSc, we ran into trouble with the following type of construction,
>>>>> which is representative of the problem above. Let R, U, V, W, X and Y be KSP work vectors, and gamma, delta and psi be PetscScalars:
>>>>> 
>>>>> ierr = VecDotBegin(R,U,&gamma); CHKERRQ(ierr);
>>>>> 
>>>>> ierr = KSP_MatMult(ksp,Amat, ..., ... ); CHKERRQ(ierr);
>>>>> 
>>>>> ierr = VecDotBegin(W,V,&delta); CHKERRQ(ierr);
>>>>> 
>>>>> ierr = KSP_MatMult(ksp,Amat,M,N); CHKERRQ(ierr);
>>>>> 
>>>>> ierr = VecDotEnd(R,U,&gamma); CHKERRQ(ierr);
>>>>> ierr = VecDotBegin(X,Y,&psi); CHKERRQ(ierr);
>>>>> .... 
>>>>> 
>>>>> ierr = VecDotEnd(W,V,&delta); CHKERRQ(ierr);
>>>>> ierr = VecDotEnd(X,Y,&psi); CHKERRQ(ierr);
>>>>> 
>>>>> Maybe this is a trivial remark, but it appears that it is not possible to put a new VecDotBegin (for X,Y) in between two VecDotEnds (for R,U and W,V). Do you have any idea why this can't be done (is it intrinsic to VecDotBegin?), and whether a work-around for this issue is available?
>>>>> 
>>>>> Are there other methods in PETSc that we should use? Or are VecDotBegin and VecDotEnd not designed to be used in this way?
>>>>> 
>>>>> Thanks a lot for the input,
>>>>>


