[petsc-dev] Implementing longer pipelines with VecDotBegin and VecDotEnd

Jed Brown jedbrown at mcs.anl.gov
Thu Mar 23 22:21:55 CDT 2017


Barry Smith <bsmith at mcs.anl.gov> writes:

>> On Mar 23, 2017, at 6:05 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>> 
>> Barry Smith <bsmith at mcs.anl.gov> writes:
>> 
>>>  Wim,
>>> 
>>>    VecDotBegin/End() work by accumulating the partial values in a data structure associated with the MPI communicator until a PetscCommSplitReductionBegin() (or a VecXXXEnd()) is seen. Thus in the current model only a single collection of reductions can be outstanding at the same time.
>>> 
>>>   For your needs we will need to extend the functionality so there can be multiple independent sets of outstanding reductions. 
>> 
>> Instead of this integer, I would prefer to change
>> PetscSplitReductionGet() to give a request object that can be completed.
>> If it is necessary to be able to start a new norm or dot product with
>> the same arguments before completing the last, then
>> 
>>  VecNormBegin(X,&request);
>>  VecNormEnd(X,request,&nrm);
>> 
>> The request above could be a pointer or an integer.
>
>    Jed, how would you handle the chaining of several reductions into a single MPI communication? I don't think that would work; you'd need a wider API, for example
>
>     VecNormBegin(X,&request);
>     VecNormBeginWithRequest(Y,request);
>     VecNormEnd(X,request,&nrm);
>     VecNormEnd(Y,request,&nrm2);
>
>    Ugly.
>
>     Less ugly you could have something like
>
>     PetscSplitReductionGetRequest(MPI_Comm,&request);
>     VecNormBegin(X,request);
>     VecNormBegin(Y,request);
>     VecNormEnd(X,request,&nrm);
>     VecNormEnd(Y,request,&nrm2);
>     PetscSplitReductionRestoreRequest(MPI_Comm,&request);

Meh,

  VecNormBegin(X,&request1x);
  VecNormBegin(Y,&request1y);
  VecNormEnd(X,request1x,&norm);
  VecAXPY(Y,-1,X);
  VecNormBegin(Y,&request2y);
  VecNormEnd(Y,request2y,&norm2y);
  VecNormEnd(Y,request1y,&norm1y);
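
A minimal single-process sketch of the sequence above, in plain C so it runs without PETSc or MPI. Everything here is an assumption for illustration: NormRequest, norm_begin, norm_end, axpy and demo are hypothetical names, not PETSc API. The sketch returns the squared 2-norm (to avoid needing libm), and where a real implementation would start and later wait on a nonblocking reduction, it just stores and returns the local value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request handle: one per outstanding reduction. */
typedef struct {
  double local;  /* local partial result captured at Begin time */
  int    active; /* 1 between Begin and End */
} NormRequest;

/* Begin: compute the local sum of squares and record it in the request.
   A real implementation would start a nonblocking reduction here. */
void norm_begin(const double *x, size_t n, NormRequest *req)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += x[i] * x[i];
  req->local  = s;
  req->active = 1;
}

/* End: complete the request and return the squared norm.
   A real implementation would wait on the reduction here. */
double norm_end(NormRequest *req)
{
  assert(req->active); /* each request is completed exactly once */
  req->active = 0;
  return req->local;
}

/* y <- y + a*x */
void axpy(double *y, double a, const double *x, size_t n)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Same order of calls as in the message: a second norm of Y is started
   and completed while the first request on Y is still outstanding. */
void demo(double *norm1x, double *norm2y, double *norm1y)
{
  double x[2] = {3.0, 4.0};
  double y[2] = {6.0, 8.0};
  NormRequest request1x, request1y, request2y;

  norm_begin(x, 2, &request1x);
  norm_begin(y, 2, &request1y);
  *norm1x = norm_end(&request1x);
  axpy(y, -1.0, x, 2);            /* y is now (3,4) */
  norm_begin(y, 2, &request2y);
  *norm2y = norm_end(&request2y);
  *norm1y = norm_end(&request1y); /* value of Y before the AXPY */
}
```

In this sketch request1y keeps the value Y had when its Begin was called, so completing it after the AXPY and after request2y still gives the earlier norm. Each request is independent, which is what allows the Ends to be issued in any order.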

>   and to interleave multiple reductions 
>
>     PetscSplitReductionGetRequest(MPI_Comm,&request);
>     VecNormBegin(X,request);
>     VecNormBegin(Y,request);
>
>     PetscSplitReductionGetRequest(MPI_Comm,&request2);
>     VecNormBegin(Z,request2);
>
>     VecNormEnd(X,request,&nrm);
>     VecNormEnd(Y,request,&nrm2);
>     PetscSplitReductionRestoreRequest(MPI_Comm,&request);
>     ....
>     VecNormEnd(Z,request2,&nrm3);
>     PetscSplitReductionRestoreRequest(MPI_Comm,&request2);
>
>    This is like my "integer" model except, as I said initially, we "hoist" the PetscSplitReduction object (the request) into visible space. Same functionality (using integer or hoisted object), just a different style. You could argue the hoisted style is more PETSc-like and we should use it.
>
>    Note that with the hoisted model one would not need to attach the
>    split information to the MPI_Comm as is currently done, but if the
>    begin and end are in different routines one must carry the hoisted
>    request variable around to get it to the correct final location. 

I think passing the request around is necessary if you need the result
of a norm that was started in a different function (and probably good to
pass something anyway just for documentation value).
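
For comparison, the shared-request ("hoisted") model quoted above, where several reductions are chained into one communication, could be sketched roughly as follows. This is again a single-process stand-in in plain C, not PETSc code: SplitRequest, split_init, dot_begin, dot_end and batch_demo are hypothetical names, the copy loop stands in for one combined reduction over all queued values, and returning a slot index from Begin is a simplification of my own (the quoted proposal matches Begin and End by their vector arguments instead).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical hoisted request: a batch of reductions sharing one communication. */
typedef struct {
  double local[MAX_SLOTS];  /* local partial values queued by Begin */
  double global[MAX_SLOTS]; /* results, valid after the combined reduction */
  int    count;             /* number of queued reductions */
  int    reduced;           /* 1 once the combined reduction has run */
  int    reductions;        /* number of combined reductions performed */
} SplitRequest;

void split_init(SplitRequest *r)
{
  r->count = 0;
  r->reduced = 0;
  r->reductions = 0;
}

/* Begin: queue the local dot product in the batch and return its slot. */
int dot_begin(SplitRequest *r, const double *x, const double *y, size_t n)
{
  double s = 0.0;
  assert(!r->reduced && r->count < MAX_SLOTS);
  for (size_t i = 0; i < n; i++) s += x[i] * y[i];
  r->local[r->count] = s;
  return r->count++;
}

/* End: the first End on a batch triggers one reduction for all queued values;
   later Ends on the same batch only read their slot. */
double dot_end(SplitRequest *r, int slot)
{
  assert(slot >= 0 && slot < r->count);
  if (!r->reduced) {
    /* stand-in for a single allreduce over r->count values */
    for (int i = 0; i < r->count; i++) r->global[i] = r->local[i];
    r->reduced = 1;
    r->reductions++;
  }
  return r->global[slot];
}

/* Two interleaved batches; returns the total number of combined reductions. */
int batch_demo(double *xy, double *xx, double *zz)
{
  double x[2] = {1.0, 2.0}, y[2] = {3.0, 4.0}, z[2] = {2.0, 2.0};
  SplitRequest request, request2;
  int sxy, sxx, szz;

  split_init(&request);
  sxy = dot_begin(&request, x, y, 2);
  sxx = dot_begin(&request, x, x, 2);

  split_init(&request2);
  szz = dot_begin(&request2, z, z, 2);

  *xy = dot_end(&request, sxy);
  *xx = dot_end(&request, sxx);
  *zz = dot_end(&request2, szz);
  return request.reductions + request2.reductions;
}
```

Here three dot products cost two combined reductions, one per request, and the two requests can be outstanding at the same time.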

An alternative is to associate the future norm with the Vec's "state",
but this gets complicated for dot products (two vectors).

>    Of course tracking the integer around is a bit too much like
>    hardwiring integer values for MPI tags (very dangerous) so hoisting
>    is the way to go?

Hardwired integers are gross.

>    Barry
>
>> 
>>>   Jed will likely have better ideas on this, but the simplest extension I can see is to add an additional integer argument to each call that indicates the sub-collection of reductions. Thus something like
>>> 
>>> ierr = VecDotBegin(R,U,&gamma,0); CHKERRQ(ierr);
>>> 
>>> ierr = KSP_MatMult(ksp,Amat, ..., ... ); CHKERRQ(ierr);
>>> 
>>> ierr = VecDotBegin(W,V,&delta,1); CHKERRQ(ierr);
>>> 
>>> ierr = KSP_MatMult(ksp,Amat,M,N); CHKERRQ(ierr);
>>> 
>>> ierr = VecDotEnd(R,U,&gamma,0); CHKERRQ(ierr);
>>> ierr = VecDotBegin(X,Y,&psi,2); CHKERRQ(ierr);
>>> .... 
>>> 
>>> ierr = VecDotEnd(W,V,&delta,1); CHKERRQ(ierr);
>>> ierr = VecDotEnd(X,Y,&psi,2); CHKERRQ(ierr);
>>> 
>>> The integer would be used internally by the routines to access different PetscSplitReduction objects associated with the MPI_Comm. In user code once you have completely Ended an operation with a particular integer you can recycle the integer and use it again for a new set.
>>> 
>>> An alternative to using integers is to hoist the PetscSplitReduction up to be visible to the calling code thus allowing multiple ones associated with different sets of reductions. This approach would result in a larger change to the public API so I would only do it if there is a fatal flaw in the integer approach.
>>> 
>>>  Jed, how do you suggest solving this ?
>>> 
>>>  Barry
>>> 
>>> 
>>> 
>>> 
>>>> On Mar 23, 2017, at 9:41 AM, Wim Vanroose <wim at vanroo.se> wrote:
>>>> 
>>>> Dear  Petsc-Dev, 
>>>> 
>>>> Over the last few years we have contributed several pipelined Krylov solvers, such as KSPPIPECG and, most recently, pipelined BiCGStab (pipebcgs).
>>>> These make use of asynchronous global reductions, using VecDotBegin and VecDotEnd to overlap the calculation of a dot product with the matrix-vector product.
>>>> Experiments by various authors show that these methods can offer better scaling in the extremely large system limit.
>>>> 
>>>> We are now trying to introduce Krylov methods with longer pipelines, such that the dot product can take multiple matrix-vector products to complete.
>>>> 
>>>> Below is a sketch. After the first SpMV we would like to start a VecDotBegin that would only complete two or more SpMVs later.
>>>> After each SpMV we would start such a global reduction.
>>>> <out (1).png>
>>>> While trying to implement a length-l version of pipelined CG in PETSc, we ran into trouble with constructions of the following type,
>>>> which are representative of the problem above. Let R, U, V, W, X and Y be KSP work vectors, and let gamma, delta and psi be PetscScalars:
>>>> 
>>>> ierr = VecDotBegin(R,U,&gamma); CHKERRQ(ierr);
>>>> 
>>>> ierr = KSP_MatMult(ksp,Amat, ..., ... ); CHKERRQ(ierr);
>>>> 
>>>> ierr = VecDotBegin(W,V,&delta); CHKERRQ(ierr);
>>>> 
>>>> ierr = KSP_MatMult(ksp,Amat,M,N); CHKERRQ(ierr);
>>>> 
>>>> ierr = VecDotEnd(R,U,&gamma); CHKERRQ(ierr);
>>>> ierr = VecDotBegin(X,Y,&psi); CHKERRQ(ierr);
>>>> .... 
>>>> 
>>>> ierr = VecDotEnd(W,V,&delta); CHKERRQ(ierr);
>>>> ierr = VecDotEnd(X,Y,&psi); CHKERRQ(ierr);
>>>> 
>>>> Maybe this is a trivial remark, but it appears that it is not possible to put a new VecDotBegin (line 7) in between two VecDotEnd's (lines 6 and 8). Do you have any ideas on why this can't be done (is it intrinsic to VecDotBegin?), and whether a work-around for this issue is available?
>>>> 
>>>> Are there other methods in PETSc that we should use? Or are VecDotBegin and VecDotEnd not designed to be used in this way?
>>>> 
>>>> Thanks a lot for the input,