[petsc-dev] MatPinToCPU

Mark Adams mfadams at lbl.gov
Sat Jul 27 18:37:37 CDT 2019


Barry, I fixed the CUDA code to pin to the CPU correctly, for GAMG at least.
There are some hacks here that we can work on.

I will start testing it tomorrow, but I am pretty sure that I have not
regressed. I am hoping that this will fix the numerical problems, which
seem to be associated with empty processors.

I did need to touch code outside of GAMG and CUDA. It might be nice to test
this in next.

GAMG now puts all reduced-processor grids on the CPU. This could be looked
at in the future.
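
As a rough illustration of what pinning a reduced-process coarse level amounts
to (this helper and its argument names are made up for the example and are not
the actual GAMG code; MatPinToCPU()/VecPinToCPU() are the real calls):

/* Illustrative sketch only: pin a reduced-process coarse-level operator and
   its work vectors to the CPU so that level runs with the CPU methods. */
static PetscErrorCode PinCoarseLevelToCPU(Mat Ac,Vec xc,Vec bc)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatPinToCPU(Ac,PETSC_TRUE);CHKERRQ(ierr); /* swap ops to CPU MatMult etc. */
  ierr = VecPinToCPU(xc,PETSC_TRUE);CHKERRQ(ierr); /* keep vector data on the host */
  ierr = VecPinToCPU(bc,PETSC_TRUE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}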


On Sat, Jul 27, 2019 at 1:00 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>
> > On Jul 27, 2019, at 11:53 AM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> >
> > On Sat, Jul 27, 2019 at 11:39 AM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >   Good catch. Thanks. Maybe the SeqCUDA has the same problem?
> >
> > This is done (I may have done it).
> >
> > Now it seems to me that when you call VecPinToCPU you are still setting up
> and don't have data yet, so this copy does not seem necessary. Maybe remove
> the copy here:
> >
> > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
> > {
> >   PetscErrorCode ierr;
> >
> >   PetscFunctionBegin;
> >   V->pinnedtocpu = pin;
> >   if (pin) {
> >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); ????
>
>    The copy from GPU should actually only do anything if the GPU already
> has data and the flag is PETSC_OFFLOAD_GPU. If the GPU does not have data
> the copy doesn't do anything. When one calls VecPinToCPU() one doesn't
> know where the data is, so the call must be made, but it may do nothing.
>
>   Note that VecCUDACopyFromGPU() calls VecCUDAAllocateCheckHost() not
> VecCUDAAllocateCheck() so the GPU will not allocate space,
> VecCUDAAllocateCheck() is called from VecCUDACopyToGPU().
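
A rough sketch of that guard (simplified, not the actual source; error
checking, the CUDA stream handling, and the exact pointer arithmetic are
glossed over):

PetscErrorCode VecCUDACopyFromGPU_sketch(Vec v)
{
  PetscErrorCode ierr;
  cudaError_t    err;

  PetscFunctionBegin;
  ierr = VecCUDAAllocateCheckHost(v);CHKERRQ(ierr);  /* host array only, no GPU allocation */
  if (v->valid_GPU_array == PETSC_OFFLOAD_GPU) {     /* only copy when the data lives on the GPU */
    err = cudaMemcpy(*((PetscScalar**)v->data),((Vec_CUDA*)v->spptr)->GPUarray,
                     v->map->n*sizeof(PetscScalar),cudaMemcpyDeviceToHost);CHKERRCUDA(err);
    v->valid_GPU_array = PETSC_OFFLOAD_BOTH;         /* host and device now agree */
  }
  PetscFunctionReturn(0);
}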
>
>    Yes, perhaps the naming could be more consistent:
>
> 1) in one place it is Host, in another place it is nothing
> 2) some places it is Host/Device, some places GPU/CPU
>
>    Perhaps Karl can make these all consistent and simpler in his
> refactoring.
>
>
>   Barry
>
>
> >
> > or
> >
> > Do not allocate on the GPU if it is pinned, by adding a check here:
> >
> > PetscErrorCode VecCUDAAllocateCheck(Vec v)
> > {
> >   PetscErrorCode ierr;
> >   cudaError_t    err;
> >   cudaStream_t   stream;
> >   Vec_CUDA       *veccuda;
> >
> >   PetscFunctionBegin;
> >   if (!v->spptr) {
> >     ierr = PetscMalloc(sizeof(Vec_CUDA),&v->spptr);CHKERRQ(ierr);
> >     veccuda = (Vec_CUDA*)v->spptr;
> > if (v->valid_GPU_array != PETSC_OFFLOAD_CPU) {
> >     err =
> cudaMalloc((void**)&veccuda->GPUarray_allocated,sizeof(PetscScalar)*((PetscBLASInt)v->map->n));CHKERRCUDA(err);
> >     veccuda->GPUarray = veccuda->GPUarray_allocated;
> >     err = cudaStreamCreate(&stream);CHKERRCUDA(err);
> >     veccuda->stream = stream;
> >     veccuda->hostDataRegisteredAsPageLocked = PETSC_FALSE;
> >     if (v->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
> >       if (v->data && ((Vec_Seq*)v->data)->array) {
> >         v->valid_GPU_array = PETSC_OFFLOAD_CPU;
> >       } else {
> >         v->valid_GPU_array = PETSC_OFFLOAD_GPU;
> >       }
> >     }
> > }
> >   }
> >   PetscFunctionReturn(0);
> > }
> >
> >
> >
> >
> >
> > > On Jul 27, 2019, at 10:40 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > Yeah, I just figured out the problem. VecDuplicate_MPICUDA did not call
> PinToCPU or even copy pinnedtocpu. It just copied ops, so I added this and
> am testing:
> > >
> > >   ierr =
> VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr);
> > >   vw   = (Vec_MPI*)(*v)->data;
> > >   ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct
> _VecOps));CHKERRQ(ierr);
> > >   ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr);
> > >
> > > Thanks,
> > >
> > > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > >
> > >   I don't understand the context. Once a vector is pinned to the CPU
> the flag should be PETSC_OFFLOAD_CPU permanently until the pin to CPU is
> turned off.  Do you have a pinned vector that has the value
> PETSC_OFFLOAD_GPU?  For example, here it is set to PETSC_OFFLOAD_CPU:
> > >
> > > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
> > > {
> > > ....
> > >   if (pin) {
> > >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr);
> > >     V->valid_GPU_array = PETSC_OFFLOAD_CPU; /* since the CPU code will
> likely change values in the vector */
> > >
> > >
> > >   Is there any way to reproduce the problem?
> > >
> > >   Barry
> > >
> > >
> > >
> > >
> > > > On Jul 27, 2019, at 10:28 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > >
> > > > I'm not sure what to do here. The problem is that pinned-to-CPU
> vectors are calling VecCUDACopyFromGPU here.
> > > >
> > > > Should I set x->valid_GPU_array to something else, like
> PETSC_OFFLOAD_CPU, in PinToCPU so this block of code is not executed?
> > > >
> > > > PetscErrorCode VecGetArray(Vec x,PetscScalar **a)
> > > > {
> > > >   PetscErrorCode ierr;
> > > > #if defined(PETSC_HAVE_VIENNACL)
> > > >   PetscBool      is_viennacltype = PETSC_FALSE;
> > > > #endif
> > > >
> > > >   PetscFunctionBegin;
> > > >   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
> > > >   ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr);
> > > >   if (x->petscnative) {
> > > > #if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA)
> > > >     if (x->valid_GPU_array == PETSC_OFFLOAD_GPU) {
> > > > #if defined(PETSC_HAVE_VIENNACL)
> > > >       ierr =
> PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
> > > >       if (is_viennacltype) {
> > > >         ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr);
> > > >       } else
> > > > #endif
> > > >       {
> > > > #if defined(PETSC_HAVE_CUDA)
> > > >         ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr);
> > > > #endif
> > > >      }
> > > >     } else if (x->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
> > > > #if defined(PETSC_HAVE_VIENNACL)
> > > >       ierr =
> PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
> > > >       if (is_viennacltype) {
> > > >         ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr);
> > > >       } else
> > > > #endif
> > > >       {
> > > > #if defined(PETSC_HAVE_CUDA)
> > > >         ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr);
> > > > #endif
> > > >       }
> > > >     }
> > > > #endif
> > > >     *a = *((PetscScalar**)x->data);
> > > >   } else {
> > > >
> > > >
> > > > On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > > >
> > > >  Yes, it needs to be able to switch back and forth between the CPU
> and GPU methods, so you need to move the setting of the methods, which is
> currently done directly in the create method, into the pin routine. See how
> MatConvert_SeqAIJ_SeqAIJViennaCL() calls ierr =
> MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr); to set the methods
> for the GPU initially.
> > > >
> > > >   Barry
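
In other words, the create/convert path would install the GPU methods by
calling the pin routine once with PETSC_FALSE instead of assigning the
function pointers there. A sketch of that pattern, simplified from and not
identical to the actual MatConvert source (B is the matrix being converted):

  ierr = PetscObjectChangeTypeName((PetscObject)B,MATSEQAIJVIENNACL);CHKERRQ(ierr);
  /* let the pin routine install the ViennaCL ops so it can swap them back later */
  ierr = MatPinToCPU_SeqAIJViennaCL(B,PETSC_FALSE);CHKERRQ(ierr);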
> > > >
> > > >
> > > > > On Jul 23, 2019, at 7:32 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > >
> > > > >
> > > > >   What are the symptoms of it not working? Does it appear to be
> still copying the matrices to the GPU and then running the functions on the
> GPU?
> > > > >
> > > > >
> > > > > The object is dispatching the CUDA mat-vec etc.
> > > > >
> > > > >   I suspect the pinning is incompletely done for CUDA (and
> MPIOpenCL) matrices.
> > > > >
> > > > >
> > > > > Yes, git grep MatPinToCPU shows stuff for ViennaCL but not CUDA.
> > > > >
> > > > > I guess I can add something like this below. Do we need to set the
> device methods? They are already set when this method is set, right?
> > > > >
> > > > > We need the equivalent of
> > > > >
> > > > > static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat A,PetscBool
> flg)
> > > > > {
> > > > >   PetscFunctionBegin;
> > > > >   A->pinnedtocpu = flg;
> > > > >   if (flg) {
> > > > >     A->ops->mult           = MatMult_SeqAIJ;
> > > > >     A->ops->multadd        = MatMultAdd_SeqAIJ;
> > > > >     A->ops->assemblyend    = MatAssemblyEnd_SeqAIJ;
> > > > >     A->ops->duplicate      = MatDuplicate_SeqAIJ;
> > > > >   } else {
> > > > >     A->ops->mult           = MatMult_SeqAIJViennaCL;
> > > > >     A->ops->multadd        = MatMultAdd_SeqAIJViennaCL;
> > > > >     A->ops->assemblyend    = MatAssemblyEnd_SeqAIJViennaCL;
> > > > >     A->ops->destroy        = MatDestroy_SeqAIJViennaCL;
> > > > >     A->ops->duplicate      = MatDuplicate_SeqAIJViennaCL;
> > > > >   }
> > > > >   PetscFunctionReturn(0);
> > > > > }
> > > > >
> > > > > for MPIViennaCL and MPI/SeqAIJ CUSPARSE, but it doesn't look like it
> has been written yet.
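
Something along these lines could be the CUDA analogue (a hypothetical sketch
mirroring the ViennaCL routine above; it assumes the corresponding
SeqAIJCUSPARSE implementations such as MatMult_SeqAIJCUSPARSE are available to
point at, which should be checked against the actual sources):

static PetscErrorCode MatPinToCPU_SeqAIJCUSPARSE(Mat A,PetscBool flg)
{
  PetscFunctionBegin;
  A->pinnedtocpu = flg;
  if (flg) { /* run this matrix with the plain CPU AIJ methods */
    A->ops->mult           = MatMult_SeqAIJ;
    A->ops->multadd        = MatMultAdd_SeqAIJ;
    A->ops->assemblyend    = MatAssemblyEnd_SeqAIJ;
    A->ops->duplicate      = MatDuplicate_SeqAIJ;
  } else {   /* restore the GPU (cuSPARSE) methods */
    A->ops->mult           = MatMult_SeqAIJCUSPARSE;
    A->ops->multadd        = MatMultAdd_SeqAIJCUSPARSE;
    A->ops->assemblyend    = MatAssemblyEnd_SeqAIJCUSPARSE;
    A->ops->destroy        = MatDestroy_SeqAIJCUSPARSE;
    A->ops->duplicate      = MatDuplicate_SeqAIJCUSPARSE;
  }
  PetscFunctionReturn(0);
}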
> > > > >
> > > > >
> > > > > >
> > > > > > It does not seem to work. It does not look like CUDA has a
> MatCreateVecs. Should I add one and copy this flag over?
> > > > >
> > > > >    We do need this function. But I don't see how it relates to
> pinning. When the matrix is pinned to the CPU we want it to create CPU
> vectors, which I assume it does.
> > > > >
> > > > >
> > > > > >
> > > > > > Mark
> > > > >
> > > >
> > >
> >
>
>

