[petsc-users] MemCpy (HtoD and DtoH) in Krylov solver

Karl Rupp rupp at iue.tuwien.ac.at
Thu Jul 18 12:14:01 CDT 2019


Hi,

as you can see from the screenshot, the communication is merely for 
scalars from the dot-products and/or norms. These are needed on the host 
for the control flow and convergence checks, and this is true for any 
iterative solver.
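
A minimal sketch of why those scalars have to come back to the host, 
written in plain Python rather than PETSc. It uses conjugate gradients 
instead of GMRES only because CG is shorter; the point is the same. The 
names device_dot, device_matvec and dtoh_copies are hypothetical 
stand-ins for illustration and are not PETSc or CUDA calls. Every dot 
product returns one scalar that the host needs, either for a step length 
or for the convergence test in the loop condition.

```python
import math

dtoh_copies = 0  # counts simulated device-to-host scalar transfers


def device_dot(x, y):
    """Stand-in for a GPU dot product; the scalar result goes to the host."""
    global dtoh_copies
    dtoh_copies += 1
    return sum(a * b for a, b in zip(x, y))


def device_matvec(A, x):
    """Stand-in for a GPU matrix-vector product; the result stays on the device."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def cg(A, b, rtol=1e-10, max_it=50):
    """Unpreconditioned CG for a symmetric positive definite A (dense lists)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    p = list(r)
    rr = device_dot(r, r)            # scalar needed on the host
    norm0 = math.sqrt(rr)
    its = 0
    # The branch below runs on the host, so it needs the residual norm there.
    while its < max_it and math.sqrt(rr) > rtol * norm0:
        Ap = device_matvec(A, p)
        alpha = rr / device_dot(p, Ap)   # scalar needed for the step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = device_dot(r, r)        # scalar needed for the convergence check
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        its += 1
    return x, its


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, its = cg(A, b)
print("iterations:", its, "scalar readbacks:", dtoh_copies)
```

In this sketch the vectors never need to leave the "device"; only one 
scalar per dot product does, so the counter grows by two per iteration. 
The number of transfers in a real GMRES run depends on the 
orthogonalization variant and the implementation.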

Best regards,
Karli



On 7/18/19 3:11 PM, Xiangdong via petsc-users wrote:
> 
> 
> On Thu, Jul 18, 2019 at 5:11 AM Smith, Barry F. <bsmith at mcs.anl.gov 
> <mailto:bsmith at mcs.anl.gov>> wrote:
> 
> 
>         1) What preconditioner are you using? If any.
> 
> Currently I am using none as I want to understand how gmres works on GPU.
> 
> 
>         2) Where/how are you getting this information about the one
>     MemCpy(HtoD) call and one MemCpy(DtoH) call? We might like to utilize
>     this same sort of information to plan future optimizations.
> 
> I am using nvprof and nvvp from the CUDA toolkit. It looks like there is 
> one MemCpy(HtoD) call and three MemCpy(DtoH) calls per iteration for the 
> np=1 case. See the attached snapshots.
> 
>         3) Are you using more than 1 MPI rank?
> 
> 
> I tried both np=1 and np=2. Attached please find snapshots from nvvp for 
> both the np=1 and np=2 cases. The figures show the GPU calls for two pure 
> GMRES iterations.
> 
> Thanks.
> Xiangdong
> 
> 
>        If you use the master branch (which we highly recommend for
>     anyone using GPUs and PETSc) the -log_view option will log
>     communication between CPU and GPU and display it in the summary
>     table. This is useful for seeing exactly what operations are doing
>     vector communication between the CPU/GPU.
> 
>        We welcome all feedback on the GPU support since it has previously
>     been only lightly used.
> 
>         Barry
> 
> 
>      > On Jul 16, 2019, at 9:05 PM, Xiangdong via petsc-users
>     <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>      >
>      > Hello everyone,
>      >
>      > I am new to petsc gpu and have a simple question.
>      >
>      > When I tried to solve Ax=b where A is MATAIJCUSPARSE and b and x
>     are VECSEQCUDA with GMRES (or GCR) and pcnone, I found that during
>     each Krylov iteration, there is one MemCpy(HtoD) call and one
>     MemCpy(DtoH) call. Does that mean the Krylov solve is not 100% on the
>     GPU and the solve still needs some work from the CPU? What are these
>     MemCpys for during each iteration?
>      >
>      > Thank you.
>      >
>      > Best,
>      > Xiangdong
> 