[petsc-users] What is the right way to use split mode asynchronous reduction?

Matthew Knepley knepley at gmail.com
Mon Sep 14 07:39:45 CDT 2020


On Mon, Sep 14, 2020 at 8:27 AM <teivml at gmail.com> wrote:

> Dear PETSc users,
>
> I would like to confirm, with the following code, that computing the
> vector norm asynchronously is faster than computing it synchronously.
>
>
> PetscLogDouble tt1,tt2;
> ierr = VecSet(c,one); CHKERRQ(ierr);
> ierr = VecSet(u,one); CHKERRQ(ierr);
> ierr = VecSet(b,one); CHKERRQ(ierr);
>
> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp); CHKERRQ(ierr);
> ierr = KSP_MatMult(ksp,A,x,Ax); CHKERRQ(ierr);
>
>
> ierr = PetscTime(&tt1);CHKERRQ(ierr);
>
> ierr = VecNormBegin(u,NORM_2,&norm1); CHKERRQ(ierr);
> ierr = PetscCommSplitReductionBegin(PetscObjectComm((PetscObject)Ax)); CHKERRQ(ierr);
> ierr = KSP_MatMult(ksp,A,c,Ac); CHKERRQ(ierr);
> ierr = VecNormEnd(u,NORM_2,&norm1); CHKERRQ(ierr);
>
>
> ierr = PetscTime(&tt2);CHKERRQ(ierr);
> ierr = PetscPrintf(PETSC_COMM_WORLD,"The time used for the asynchronous calculation: %f\n",tt2-tt1); CHKERRQ(ierr);
> ierr = PetscPrintf(PETSC_COMM_WORLD,"+ |u| =  %g\n",(double) norm1); CHKERRQ(ierr);
>
>
> ierr = PetscTime(&tt1);CHKERRQ(ierr);
> ierr = VecNorm(b,NORM_2,&norm2); CHKERRQ(ierr);
> ierr = KSP_MatMult(ksp,A,c,Ac); CHKERRQ(ierr);
>
>
> ierr = PetscTime(&tt2);CHKERRQ(ierr);
> ierr = PetscPrintf(PETSC_COMM_WORLD,"The time used for the synchronous calculation: %f\n",tt2-tt1); CHKERRQ(ierr);
> ierr = PetscPrintf(PETSC_COMM_WORLD,"+ |b| =  %g\n",(double) norm2); CHKERRQ(ierr);
>
>
> This code computes a matrix-vector product and a vector norm, first
> asynchronously and then synchronously.
>
> The calculation is carried out on a single-node PC with a Xeon CPU, using
> MPICH 3.3 with n = 20 MPI processes. The results of the code above show
> that the synchronous calculation is faster than the asynchronous
> calculation:
>
> The time used for the asynchronous calculation: 0.001622
> + |u| =  100.
> The time used for the synchronous calculation: 0.000062
> + |b| =  100.
>
>
> Is there anything I should consider in order to properly take advantage of
> PETSc's asynchronous progress?
>

First, there is overhead in the asynchronous calculation. To see an
improvement, you would have to run an example in which the communication
time is larger (hopefully significantly larger) than this overhead. Second,
if the computation is perfectly load balanced, it is harder to see an
improvement from reducing synchronizations. A single node is unlikely to
benefit from any of this stuff.

  Thanks,

     Matt


> Thank you for any help you can provide.
> Sincerely,
> Teiv.
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>