[petsc-users] GPU questions

Wed Nov 17 11:41:11 CST 2010

On Nov 17, 2010, at 9:48 AM, SUN Chun wrote:

> Hi PETSc developers,
> 
> I have some questions regarding GPGPU support in PETSc. Sorry if these questions are redundant, I didn't browse the dev code too carefully...
> 
> 1.  The only example I can find is tutorial/ex47. Is there a plan to provide more examples involving KSP with, say, simple Jacobi PC? I thought KSP is supported but PC is not.

   Any example that uses DAGetMatrix(), DAGetGlobalVector() or DMMG can be run with -da_vec_type cuda -da_mat_type aijcuda to use the GPUs.

    The whole idea behind PETSc is to have the SAME code for different solvers and different systems so we will not have a bunch of examples just for GPUs.
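
    To illustrate the "same code, type chosen at run time" idea in isolation, here is a minimal sketch. It is not PETSc code: VecBackend, SeqBackend, FakeCudaBackend and make_backend are hypothetical names, the "seq" default is an assumption of the sketch, and the "cuda" backend is a stand-in that just reuses the CPU loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Abstract backend: application code only ever talks to this interface.
struct VecBackend {
  virtual ~VecBackend() = default;
  virtual std::string name() const = 0;
  virtual double dot(const std::vector<double> &x,
                     const std::vector<double> &y) const = 0;
};

// Plain CPU implementation.
struct SeqBackend : VecBackend {
  std::string name() const override { return "seq"; }
  double dot(const std::vector<double> &x,
             const std::vector<double> &y) const override {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
  }
};

// Stand-in for a GPU implementation (hypothetical); a real one would
// run device kernels instead of inheriting the CPU loop.
struct FakeCudaBackend : SeqBackend {
  std::string name() const override { return "cuda"; }
};

// Pick the backend from an option pair such as "-da_vec_type cuda".
// Falls back to "seq" when the option is absent (assumption of this sketch).
std::unique_ptr<VecBackend> make_backend(const std::vector<std::string> &args) {
  static const std::map<std::string,
                        std::function<std::unique_ptr<VecBackend>()>> registry = {
    {"seq",  [] { return std::unique_ptr<VecBackend>(new SeqBackend); }},
    {"cuda", [] { return std::unique_ptr<VecBackend>(new FakeCudaBackend); }},
  };
  std::string type = "seq";
  for (std::size_t i = 0; i + 1 < args.size(); ++i)
    if (args[i] == "-da_vec_type") type = args[i + 1];
  auto it = registry.find(type);
  assert(it != registry.end());  // unknown type names are a usage error here
  return it->second();
}
```

    The caller never names a concrete type: make_backend({"-da_vec_type", "cuda"}) and make_backend({}) return different backends, and the code that calls dot() on the result is identical in both cases.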

> 
> 2. Would you please comment on the difficulty in supporting PCs? Like ILU, SSOR.... Would you please also comment on the difficulty in supporting external libraries such as ML?

   We don't have code for triangular solves on the GPU; without those, ILU and SSOR cannot run on GPUs. Once someone provides triangular solves for GPUs we can add their use and put ILU and SSOR onto the GPUs with PETSc. Regarding ML, that is totally up to its developers. Note that Nvidia has a smoothed aggregation algorithm for symmetric problems in CUSP that you can access via PETSc as the PCSACUDA PC (not yet stable, so possibly bugs).
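
   For context on why the triangular solve is the sticking point: applying an ILU or SSOR preconditioner means solving with triangular factors, and in forward substitution each unknown depends on the ones before it. The sketch below is a plain CPU version, not PETSc code; CsrLower and forward_solve are hypothetical names, and it assumes every row is non-empty with its diagonal entry stored last.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lower-triangular matrix in CSR form. Assumption of this sketch:
// each row is non-empty and stores its diagonal entry last.
struct CsrLower {
  std::vector<std::size_t> row_ptr;  // size n + 1
  std::vector<std::size_t> col;      // column index of each nonzero
  std::vector<double> val;           // value of each nonzero
};

// Solve L x = b by forward substitution.
std::vector<double> forward_solve(const CsrLower &L,
                                  const std::vector<double> &b) {
  const std::size_t n = b.size();
  assert(L.row_ptr.size() == n + 1);
  std::vector<double> x(n);
  for (std::size_t i = 0; i < n; ++i) {
    double sum = b[i];
    const std::size_t diag = L.row_ptr[i + 1] - 1;
    for (std::size_t k = L.row_ptr[i]; k < diag; ++k) {
      // Uses x[j] with j < i, so row i cannot start until those are done.
      sum -= L.val[k] * x[L.col[k]];
    }
    assert(L.col[diag] == i && L.val[diag] != 0.0);
    x[i] = sum / L.val[diag];
  }
  return x;
}
```

   The outer loop over i is inherently ordered, unlike a matrix-vector product where every row can be computed independently. A GPU version has to find rows that do not depend on each other, which is the piece that is missing.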

> 
> 3. I noticed (might be wrong), MatMult in CUDA is implemented in such a way that we copy the lhs and rhs to GPU each time before we do MatVec. I understand that you may have to do this to ensure MatMult being robust, but I'm worried about performance. Is it possible, say, like in KSP, we keep the lhs and intermediate results on the GPU side? 

    Look at the code more closely. VecCUDACopyToGPU() (also the MatCUDACopy...) ONLY copy down if the values are NOT already on the GPU.  This means once the vectors are on the GPU they remain there and are NOT copied back and forth for each multiply.

  ierr = MatCUDACopyToGPU(A);CHKERRQ(ierr);
  ierr = VecCUDACopyToGPU(xx);CHKERRQ(ierr);
  ierr = VecCUDAAllocateCheck(yy);CHKERRQ(ierr);
  if (usecprow){ /* use compressed row format */
    try {
      cusp::multiply(*cudastruct->mat,*((Vec_CUDA *)xx->spptr)->GPUarray,*cudastruct->tempvec);
      ierr = VecSet_SeqCUDA(yy,0.0);CHKERRQ(ierr);
      thrust::copy(cudastruct->tempvec->begin(),cudastruct->tempvec->end(),thrust::make_permutation_iterator(((Vec_CUDA *)yy->spptr)->GPUarray->begin(),cudastruct->indices->begin()));
    } catch (char* ex) {
      SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"CUDA error: %s", ex);
    }
  } else { /* do not use compressed row format */
    try {
      cusp::multiply(*cudastruct->mat,*((Vec_CUDA *)xx->spptr)->GPUarray,*((Vec_CUDA *)yy->spptr)->GPUarray);
    } catch(char* ex) {
      SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"CUDA error: %s", ex);
    } 
  }
  yy->valid_GPU_array = PETSC_CUDA_GPU;
  ierr = WaitForGPU();CHKERRCUDA(ierr);
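
    The copy-only-if-stale behaviour can be sketched on its own as follows. This is an illustration, not the PETSc implementation: LazyVec, Valid, copy_to_gpu and scale_on_gpu are hypothetical names loosely modeled on the valid_GPU_array flag above, and a std::vector stands in for GPU memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where the up-to-date copy of the values lives.
enum class Valid { Cpu, Gpu, Both };

struct LazyVec {
  std::vector<double> host;    // CPU-side values
  std::vector<double> device;  // stand-in for GPU memory in this sketch
  Valid valid = Valid::Cpu;    // a fresh vector is valid on the CPU only
  int copies_to_gpu = 0;       // counts actual host-to-device transfers
  explicit LazyVec(std::size_t n, double v = 0.0) : host(n, v) {}
};

// Copy down only when the device copy is stale.
void copy_to_gpu(LazyVec &v) {
  if (v.valid == Valid::Cpu) {
    v.device = v.host;
    ++v.copies_to_gpu;
    v.valid = Valid::Both;
  }
}

// y = a * x computed "on the device". The result is marked valid on the
// device only, so a later device operation can use it without any transfer.
void scale_on_gpu(double a, LazyVec &x, LazyVec &y) {
  assert(x.host.size() == y.host.size());
  copy_to_gpu(x);
  y.device.resize(x.device.size());
  for (std::size_t i = 0; i < x.device.size(); ++i)
    y.device[i] = a * x.device[i];
  y.valid = Valid::Gpu;
}
```

    Calling scale_on_gpu(2.0, x, y) ten times in a row transfers x once, on the first call. The output y is never transferred at all, and it can be fed straight into another device operation because its flag already says the device copy is current.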

> 
> 4. Any performance data on Tesla/Fermi card? I saw from the webpage  you only have Tesla card?

    We don't have good hardware for running benchmarks. The performance is better than just running on the CPU, that is about all I can say.

> 
> 5. Is there a roadmap, a plan, a timeline..., regarding PETSc and nVidia's collaboration for a final fully GPGPU compatible PETSc?

   What do you mean by final fully GPGPU compatible PETSc? Right now some things (vector operations, matrix-vector products, Krylov solvers) are done fully on the GPUs. Others are automatically done on the CPU. I imagine it will always be this way; I doubt that EVERYTHING will ever be done on the GPU, but that is OK so long as most things are done on the GPU.  If you run with -log_summary it will tell you how many copies to the GPU and copies from the GPU are done in the run and how much time they take. Obviously one wants those numbers as low as possible.

  Barry

> 
> 
> Thanks a lot for your time!
> Chun