[petsc-dev] MatPinToCPU

Sat Jul 27 11:53:29 CDT 2019

On Sat, Jul 27, 2019 at 11:39 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Good catch. Thanks. Maybe the SeqCUDA has the same problem?
>

THis is done  (I may have done it).

Now it seems to me that when you call VecPinToCPU you are setting up and
don't have data, so this copy does not seem necessary. Maybe remove the
copy here:

PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  V->pinnedtocpu = pin;
  if (pin) {

*    ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); ????*
or

Not allocate the GPU if it is pinned by added in *a check *here:

PetscErrorCode VecCUDAAllocateCheck(Vec v)
{
  PetscErrorCode ierr;
  cudaError_t    err;
  cudaStream_t   stream;
  Vec_CUDA       *veccuda;

  PetscFunctionBegin;
  if (!v->spptr) {
    ierr = PetscMalloc(sizeof(Vec_CUDA),&v->spptr);CHKERRQ(ierr);
    veccuda = (Vec_CUDA*)v->spptr;
*if (v->valid_GPU_array != PETSC_OFFLOAD_CPU) {*
    err =
cudaMalloc((void**)&veccuda->GPUarray_allocated,sizeof(PetscScalar)*((PetscBLASInt)v->map->n));CHKERRCUDA(err);
    veccuda->GPUarray = veccuda->GPUarray_allocated;
    err = cudaStreamCreate(&stream);CHKERRCUDA(err);
    veccuda->stream = stream;
    veccuda->hostDataRegisteredAsPageLocked = PETSC_FALSE;
    if (v->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
      if (v->data && ((Vec_Seq*)v->data)->array) {
        v->valid_GPU_array = PETSC_OFFLOAD_CPU;
      } else {
        v->valid_GPU_array = PETSC_OFFLOAD_GPU;
      }
    }
*}*
  }
  PetscFunctionReturn(0);
}

>
> > On Jul 27, 2019, at 10:40 AM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > Yea, I just figured out the problem. VecDuplicate_MPICUDA did not call
> PinToCPU or even copy pinnedtocpu. It just copied ops, so I added and am
> testing:
> >
> >   ierr =
> VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr);
> >   vw   = (Vec_MPI*)(*v)->data;
> >   ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct
> _VecOps));CHKERRQ(ierr);
> >   ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr);
> >
> > Thanks,
> >
> > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >   I don't understand the context. Once a vector is pinned to the CPU the
> flag should be PETSC_OFFLOAD_CPU permanently until the pin to cpu is turned
> off.  Do you have a pinned vector that has the value PETSC_OFFLOAD_GPU?
> For example here it is set to PETSC_OFFLOAD_CPU
> >
> > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
> > {
> > ....
> >   if (pin) {
> >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr);
> >     V->valid_GPU_array = PETSC_OFFLOAD_CPU; /* since the CPU code will
> likely change values in the vector */
> >
> >
> >   Is there any way to reproduce the problem?
> >
> >   Barry
> >
> >
> >
> >
> > > On Jul 27, 2019, at 10:28 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > I'm not sure what to do here. The problem is that pinned-to-cpu
> vectors are calling VecCUDACopyFromGPU here.
> > >
> > > Should I set x->valid_GPU_array to something else, like
> PETSC_OFFLOAD_CPU, in PinToCPU so this block of code i s not executed?
> > >
> > > PetscErrorCode VecGetArray(Vec x,PetscScalar **a)
> > > {
> > >   PetscErrorCode ierr;
> > > #if defined(PETSC_HAVE_VIENNACL)
> > >   PetscBool      is_viennacltype = PETSC_FALSE;
> > > #endif
> > >
> > >   PetscFunctionBegin;
> > >   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
> > >   ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr);
> > >   if (x->petscnative) {
> > > #if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA)
> > >     if (x->valid_GPU_array == PETSC_OFFLOAD_GPU) {
> > > #if defined(PETSC_HAVE_VIENNACL)
> > >       ierr =
> PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
> > >       if (is_viennacltype) {
> > >         ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr);
> > >       } else
> > > #endif
> > >       {
> > > #if defined(PETSC_HAVE_CUDA)
> > >         ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr);
> > > #endif
> > >      }
> > >     } else if (x->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
> > > #if defined(PETSC_HAVE_VIENNACL)
> > >       ierr =
> PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
> > >       if (is_viennacltype) {
> > >         ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr);
> > >       } else
> > > #endif
> > >       {
> > > #if defined(PETSC_HAVE_CUDA)
> > >         ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr);
> > > #endif
> > >       }
> > >     }
> > > #endif
> > >     *a = *((PetscScalar**)x->data);
> > >   } else {
> > >
> > >
> > > On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > >
> > >  Yes, it needs to be able to switch back and forth between the CPU and
> GPU methods so you need to move into it the setting of the methods that is
> currently directly in the create method. See how
> MatConvert_SeqAIJ_SeqAIJViennaCL() calls ierr =
> MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr); to set the methods
> for the GPU initially.
> > >
> > >   Barry
> > >
> > >
> > > > On Jul 23, 2019, at 7:32 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > >
> > > >
> > > >   What are the symptoms of it not working? Does it appear to be
> still copying the matrices to the GPU? then running the functions on the
> GPU?
> > > >
> > > >
> > > > The object is dispatching the CUDA mat-vec etc.
> > > >
> > > >   I suspect the pinning is incompletely done for CUDA (and
> MPIOpenCL) matrices.
> > > >
> > > >
> > > > Yes, git grep MatPinToCPU shows stuff for ViennaCL but not CUDA.
> > > >
> > > > I guess I can add something like this below. Do we need to set the
> device methods? They are already set when this method is set, right?
> > > >
> > > > We need the equivalent of
> > > >
> > > > static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat A,PetscBool flg)
> > > > {
> > > >   PetscFunctionBegin;
> > > >   A->pinnedtocpu = flg;
> > > >   if (flg) {
> > > >     A->ops->mult           = MatMult_SeqAIJ;
> > > >     A->ops->multadd        = MatMultAdd_SeqAIJ;
> > > >     A->ops->assemblyend    = MatAssemblyEnd_SeqAIJ;
> > > >     A->ops->duplicate      = MatDuplicate_SeqAIJ;
> > > >   } else {
> > > >     A->ops->mult           = MatMult_SeqAIJViennaCL;
> > > >     A->ops->multadd        = MatMultAdd_SeqAIJViennaCL;
> > > >     A->ops->assemblyend    = MatAssemblyEnd_SeqAIJViennaCL;
> > > >     A->ops->destroy        = MatDestroy_SeqAIJViennaCL;
> > > >     A->ops->duplicate      = MatDuplicate_SeqAIJViennaCL;
> > > >   }
> > > >   PetscFunctionReturn(0);
> > > > }
> > > >
> > > > for MPIViennaCL and MPISeqAIJ Cusparse but it doesn't look like it
> has been written yet.
> > > >
> > > >
> > > > >
> > > > > It does not seem to work. It does not look like CUDA has an
> MatCreateVecs. Should I add one and copy this flag over?
> > > >
> > > >    We do need this function. But I don't see how it relates to
> pinning. When the matrix is pinned to the CPU we want it to create CPU
> vectors which I assume it does.
> > > >
> > > >
> > > > >
> > > > > Mark
> > > >
> > >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190727/7479d6ae/attachment.html>