[petsc-users] MemCpy (HtoD and DtoH) in Krylov solver
Karl Rupp
rupp at iue.tuwien.ac.at
Thu Jul 18 12:14:01 CDT 2019
Hi,
as you can see from the screenshot, the communication consists only of
scalars from the dot products and/or norms. These are needed on the host
for control flow and convergence checks. This is true for any
iterative solver.
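
To illustrate the point, here is a minimal sketch in plain Python. It is
not PETSc code, and it uses conjugate gradients rather than GMRES only
because CG is shorter to write down. The names dot, matvec, cg and the
scalar_transfers counter are made up for this example. Python lists
stand in for device vectors, and each call to dot() marks a place where a
GPU implementation would typically copy one scalar back to the host.

```python
# Illustrative sketch only: plain Python stands in for GPU kernels.
# Vectors are lists; dot() marks the point where a scalar result
# would have to be copied from the device back to the host.

scalar_transfers = 0  # counts simulated device-to-host scalar copies


def dot(u, v):
    """Reduction whose scalar result is read by the host."""
    global scalar_transfers
    scalar_transfers += 1
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product (would stay on the device)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def cg(A, b, tol=1e-10, max_it=100):
    """Unpreconditioned conjugate gradients for a small SPD system."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    for it in range(max_it):
        if rr ** 0.5 < tol:  # host-side convergence check
            return x, it
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)  # scalar computed from a reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_it


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = cg(A, b)
print(iterations, scalar_transfers)
```

In this sketch the vector updates never need to leave the "device"; only
the results of the reductions do, two per iteration plus one for the
initial residual norm.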
Best regards,
Karli
On 7/18/19 3:11 PM, Xiangdong via petsc-users wrote:
>
>
> On Thu, Jul 18, 2019 at 5:11 AM Smith, Barry F. <bsmith at mcs.anl.gov
> <mailto:bsmith at mcs.anl.gov>> wrote:
>
>
> 1) What preconditioner are you using? If any.
>
> Currently I am using none as I want to understand how gmres works on GPU.
>
>
> 2) Where/how are you getting this information about the
> MemCpy(HtoD) and one call MemCpy(DtoH)? We might like to utilize
> this same sort of information to plan future optimizations.
>
> I am using nvprof and nvvp from the CUDA toolkit. It looks like there are
> one MemCpy(HtoD) and three MemCpy(DtoH) calls per iteration for the np=1
> case. See the attached snapshots.
>
> 3) Are you using more than 1 MPI rank?
>
>
> I tried both np=1 and np=2. Attached please find snapshots from nvvp for
> both the np=1 and np=2 cases. The figures show the GPU calls for two pure
> GMRES iterations.
>
> Thanks.
> Xiangdong
>
>
> If you use the master branch (which we highly recommend for
> anyone using GPUs and PETSc) the -log_view option will log
> communication between CPU and GPU and display it in the summary
> table. This is useful for seeing exactly what operations are doing
> vector communication between the CPU/GPU.
>
> We welcome all feedback on the GPU support since it has previously
> been only lightly used.
>
> Barry
>
>
> > On Jul 16, 2019, at 9:05 PM, Xiangdong via petsc-users
> <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
> >
> > Hello everyone,
> >
> > I am new to petsc gpu and have a simple question.
> >
> > When I tried to solve Ax=b where A is MATAIJCUSPARSE and b and x
> are VECSEQCUDA with GMRES (or GCR) and pcnone, I found that during
> each Krylov iteration, there is one MemCpy(HtoD) call and one
> MemCpy(DtoH) call. Does that mean the Krylov solve is not 100% on the GPU
> and the solve still needs some work from the CPU? What are these MemCpys
> for during each iteration?
> >
> > Thank you.
> >
> > Best,
> > Xiangdong
>