[petsc-dev] MatPinToCPU
Mark Adams
mfadams at lbl.gov
Sun Jul 28 11:27:53 CDT 2019
This is looking good. I'm not seeing the numerical problems, but I've just
hidden them by avoiding the GPU on coarse grids.
Should I submit a pull request now, test more, or wait for Karl?
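
For context, keeping the coarse grids off the GPU basically amounts to pinning
those levels' operators and work vectors to the CPU. A minimal sketch of the
idea (the loop and the Acoarse/xcoarse/bcoarse/ncoarse names are illustrative,
not the actual GAMG code):

  PetscErrorCode ierr;
  PetscInt       lev;

  /* Illustrative only: Acoarse/xcoarse/bcoarse and ncoarse stand in for the
     coarse-level operators and work vectors; pin them so those levels never
     dispatch CUDA kernels. */
  for (lev = 0; lev < ncoarse; lev++) {
    ierr = MatPinToCPU(Acoarse[lev],PETSC_TRUE);CHKERRQ(ierr);
    ierr = VecPinToCPU(xcoarse[lev],PETSC_TRUE);CHKERRQ(ierr);
    ierr = VecPinToCPU(bcoarse[lev],PETSC_TRUE);CHKERRQ(ierr);
  }
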
On Sat, Jul 27, 2019 at 7:37 PM Mark Adams <mfadams at lbl.gov> wrote:
> Barry, I fixed CUDA to pin to the CPU correctly, for GAMG at least. There
> are some hacks here that we can work on.
>
> I will start testing it tomorrow, but I am pretty sure that I have not
> regressed. I am hoping that this will fix the numerical problems, which
> seem to be associated with empty processors.
>
> I did need to touch code outside of GAMG and CUDA. It might be nice to
> test this in 'next'.
>
> GAMG now puts all reduced-processor grids on the CPU. This could be
> looked at in the future.
>
>
> On Sat, Jul 27, 2019 at 1:00 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
>
>>
>>
>> > On Jul 27, 2019, at 11:53 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> >
>> >
>> > On Sat, Jul 27, 2019 at 11:39 AM Smith, Barry F. <bsmith at mcs.anl.gov>
>> wrote:
>> >
>> > Good catch. Thanks. Maybe the SeqCUDA has the same problem?
>> >
>> > This is done (I may have done it).
>> >
>> > Now it seems to me that when you call VecPinToCPU you are setting up and
>> > don't have data yet, so this copy does not seem necessary. Maybe remove
>> > the copy here:
>> >
>> > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
>> > {
>> >   PetscErrorCode ierr;
>> >
>> >   PetscFunctionBegin;
>> >   V->pinnedtocpu = pin;
>> >   if (pin) {
>> >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr);   ????
>>
>> The copy from GPU should actually only do anything if the GPU already
>> has the data and the state is PETSC_OFFLOAD_GPU; if the GPU does not have
>> the data, the copy does nothing. When one calls VecPinToCPU() one doesn't
>> know where the data is, so the call must be made, but it may do nothing.
>>
>> Note that VecCUDACopyFromGPU() calls VecCUDAAllocateCheckHost(), not
>> VecCUDAAllocateCheck(), so the GPU will not allocate space;
>> VecCUDAAllocateCheck() is called from VecCUDACopyToGPU().
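>>
>> Roughly, the logic in VecCUDACopyFromGPU() is the following (paraphrased
>> from memory, not the exact source):
>>
>>   PetscErrorCode VecCUDACopyFromGPU(Vec v)
>>   {
>>     PetscErrorCode ierr;
>>     cudaError_t    err;
>>     Vec_CUDA       *veccuda;
>>
>>     PetscFunctionBegin;
>>     ierr = VecCUDAAllocateCheckHost(v);CHKERRQ(ierr);  /* host side only, never allocates on the GPU */
>>     if (v->valid_GPU_array == PETSC_OFFLOAD_GPU) {     /* copy only if the GPU holds the current data */
>>       veccuda = (Vec_CUDA*)v->spptr;
>>       err  = cudaMemcpy(((Vec_Seq*)v->data)->array,veccuda->GPUarray,sizeof(PetscScalar)*((PetscBLASInt)v->map->n),cudaMemcpyDeviceToHost);CHKERRCUDA(err);
>>       v->valid_GPU_array = PETSC_OFFLOAD_BOTH;
>>     }
>>     PetscFunctionReturn(0);
>>   }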
>>
>> Yes, perhaps the naming could be more consistent:
>>
>> 1) in one place it is Host, in another place it is nothing
>> 2) in some places it is Host/Device, in other places GPU/CPU
>>
>> Perhaps Karl can make these all consistent and simpler in his
>> refactoring.
>>
>>
>> Barry
>>
>>
>> >
>> > or
>> >
>> > Not allocate on the GPU if it is pinned, by adding a check here:
>> >
>> > PetscErrorCode VecCUDAAllocateCheck(Vec v)
>> > {
>> >   PetscErrorCode ierr;
>> >   cudaError_t    err;
>> >   cudaStream_t   stream;
>> >   Vec_CUDA       *veccuda;
>> >
>> >   PetscFunctionBegin;
>> >   if (!v->spptr) {
>> >     ierr = PetscMalloc(sizeof(Vec_CUDA),&v->spptr);CHKERRQ(ierr);
>> >     veccuda = (Vec_CUDA*)v->spptr;
>> >     if (v->valid_GPU_array != PETSC_OFFLOAD_CPU) {
>> >       err = cudaMalloc((void**)&veccuda->GPUarray_allocated,sizeof(PetscScalar)*((PetscBLASInt)v->map->n));CHKERRCUDA(err);
>> >       veccuda->GPUarray = veccuda->GPUarray_allocated;
>> >       err = cudaStreamCreate(&stream);CHKERRCUDA(err);
>> >       veccuda->stream = stream;
>> >       veccuda->hostDataRegisteredAsPageLocked = PETSC_FALSE;
>> >       if (v->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
>> >         if (v->data && ((Vec_Seq*)v->data)->array) {
>> >           v->valid_GPU_array = PETSC_OFFLOAD_CPU;
>> >         } else {
>> >           v->valid_GPU_array = PETSC_OFFLOAD_GPU;
>> >         }
>> >       }
>> >     }
>> >   }
>> >   PetscFunctionReturn(0);
>> > }
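>> >
>> > i.e., an early return at the top, something like this (untested sketch):
>> >
>> >   PetscErrorCode VecCUDAAllocateCheck(Vec v)
>> >   {
>> >     PetscFunctionBegin;
>> >     if (v->pinnedtocpu) PetscFunctionReturn(0);  /* pinned to the CPU: never allocate GPU storage */
>> >     /* ... the existing allocation code above, unchanged ... */
>> >     PetscFunctionReturn(0);
>> >   }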
>> >
>> >
>> >
>> >
>> >
>> > > On Jul 27, 2019, at 10:40 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> > >
>> > > Yeah, I just figured out the problem. VecDuplicate_MPICUDA did not
>> > > call PinToCPU or even copy pinnedtocpu; it just copied ops, so I added
>> > > this and am testing:
>> > >
>> > >   ierr = VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr);
>> > >   vw   = (Vec_MPI*)(*v)->data;
>> > >   ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct _VecOps));CHKERRQ(ierr);
>> > >   ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr);
>> > >
>> > > Thanks,
>> > >
>> > > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F. <bsmith at mcs.anl.gov>
>> wrote:
>> > >
>> > > I don't understand the context. Once a vector is pinned to the CPU, the
>> > > flag should be PETSC_OFFLOAD_CPU permanently until the pin to CPU is
>> > > turned off. Do you have a pinned vector that has the value
>> > > PETSC_OFFLOAD_GPU? For example, here it is set to PETSC_OFFLOAD_CPU:
>> > >
>> > > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
>> > > {
>> > >   ....
>> > >   if (pin) {
>> > >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr);
>> > >     V->valid_GPU_array = PETSC_OFFLOAD_CPU; /* since the CPU code will likely change values in the vector */
>> > >
>> > >
>> > > Is there any way to reproduce the problem?
>> > >
>> > > Barry
>> > >
>> > >
>> > >
>> > >
>> > > > On Jul 27, 2019, at 10:28 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> > > >
>> > > > I'm not sure what to do here. The problem is that pinned-to-CPU
>> > > > vectors are calling VecCUDACopyFromGPU here.
>> > > >
>> > > > Should I set x->valid_GPU_array to something else, like
>> > > > PETSC_OFFLOAD_CPU, in PinToCPU so this block of code is not executed?
>> > > >
>> > > > PetscErrorCode VecGetArray(Vec x,PetscScalar **a)
>> > > > {
>> > > >   PetscErrorCode ierr;
>> > > > #if defined(PETSC_HAVE_VIENNACL)
>> > > >   PetscBool      is_viennacltype = PETSC_FALSE;
>> > > > #endif
>> > > >
>> > > >   PetscFunctionBegin;
>> > > >   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
>> > > >   ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr);
>> > > >   if (x->petscnative) {
>> > > > #if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA)
>> > > >     if (x->valid_GPU_array == PETSC_OFFLOAD_GPU) {
>> > > > #if defined(PETSC_HAVE_VIENNACL)
>> > > >       ierr = PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
>> > > >       if (is_viennacltype) {
>> > > >         ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr);
>> > > >       } else
>> > > > #endif
>> > > >       {
>> > > > #if defined(PETSC_HAVE_CUDA)
>> > > >         ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr);
>> > > > #endif
>> > > >       }
>> > > >     } else if (x->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
>> > > > #if defined(PETSC_HAVE_VIENNACL)
>> > > >       ierr = PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
>> > > >       if (is_viennacltype) {
>> > > >         ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr);
>> > > >       } else
>> > > > #endif
>> > > >       {
>> > > > #if defined(PETSC_HAVE_CUDA)
>> > > >         ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr);
>> > > > #endif
>> > > >       }
>> > > >     }
>> > > > #endif
>> > > >     *a = *((PetscScalar**)x->data);
>> > > >   } else {
>> > > >
>> > > >
>> > > > On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>> wrote:
>> > > >
>> > > > Yes, it needs to be able to switch back and forth between the CPU and
>> > > > GPU methods, so you need to move into it the setting of the methods
>> > > > that is currently done directly in the create method. See how
>> > > > MatConvert_SeqAIJ_SeqAIJViennaCL() calls
>> > > > ierr = MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr);
>> > > > to set the methods for the GPU initially.
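>> > > >
>> > > > i.e., roughly this pattern at the end of the convert routine
>> > > > (paraphrased, not the exact source):
>> > > >
>> > > >   /* inside MatConvert_SeqAIJ_SeqAIJViennaCL(): after building the
>> > > >      ViennaCL data, "unpin" to install the GPU methods; pinning later
>> > > >      swaps the plain SeqAIJ methods back in */
>> > > >   ierr = MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr);
>> > > >   ierr = PetscObjectChangeTypeName((PetscObject)A,MATSEQAIJVIENNACL);CHKERRQ(ierr);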
>> > > >
>> > > > Barry
>> > > >
>> > > >
>> > > > > On Jul 23, 2019, at 7:32 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> > > > >
>> > > > >
>> > > > > What are the symptoms of it not working? Does it appear to be still
>> > > > > copying the matrices to the GPU? Then running the functions on the
>> > > > > GPU?
>> > > > >
>> > > > >
>> > > > > The object is dispatching the CUDA mat-vec etc.
>> > > > >
>> > > > > I suspect the pinning is incompletely done for CUDA (and
>> MPIOpenCL) matrices.
>> > > > >
>> > > > >
>> > > > > Yes, git grep MatPinToCPU shows stuff for ViennaCL but not CUDA.
>> > > > >
>> > > > > I guess I can add something like this below. Do we need to set
>> the device methods? They are already set when this method is set, right?
>> > > > >
>> > > > > We need the equivalent of
>> > > > >
>> > > > > static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat A,PetscBool flg)
>> > > > > {
>> > > > >   PetscFunctionBegin;
>> > > > >   A->pinnedtocpu = flg;
>> > > > >   if (flg) {
>> > > > >     A->ops->mult        = MatMult_SeqAIJ;
>> > > > >     A->ops->multadd     = MatMultAdd_SeqAIJ;
>> > > > >     A->ops->assemblyend = MatAssemblyEnd_SeqAIJ;
>> > > > >     A->ops->duplicate   = MatDuplicate_SeqAIJ;
>> > > > >   } else {
>> > > > >     A->ops->mult        = MatMult_SeqAIJViennaCL;
>> > > > >     A->ops->multadd     = MatMultAdd_SeqAIJViennaCL;
>> > > > >     A->ops->assemblyend = MatAssemblyEnd_SeqAIJViennaCL;
>> > > > >     A->ops->destroy     = MatDestroy_SeqAIJViennaCL;
>> > > > >     A->ops->duplicate   = MatDuplicate_SeqAIJViennaCL;
>> > > > >   }
>> > > > >   PetscFunctionReturn(0);
>> > > > > }
>> > > > >
>> > > > > for MPIViennaCL and the Seq/MPI AIJ CUSPARSE matrices, but it
>> > > > > doesn't look like it has been written yet.
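>> > > > >
>> > > > > A sketch of what the CUSPARSE version might look like, assuming the
>> > > > > SeqAIJCUSPARSE implementations of these ops exist under these names
>> > > > > (untested):
>> > > > >
>> > > > > static PetscErrorCode MatPinToCPU_SeqAIJCUSPARSE(Mat A,PetscBool flg)
>> > > > > {
>> > > > >   PetscFunctionBegin;
>> > > > >   A->pinnedtocpu = flg;
>> > > > >   if (flg) {                      /* fall back to the plain SeqAIJ (CPU) methods */
>> > > > >     A->ops->mult        = MatMult_SeqAIJ;
>> > > > >     A->ops->multadd     = MatMultAdd_SeqAIJ;
>> > > > >     A->ops->assemblyend = MatAssemblyEnd_SeqAIJ;
>> > > > >     A->ops->duplicate   = MatDuplicate_SeqAIJ;
>> > > > >   } else {                        /* restore the CUSPARSE (GPU) methods */
>> > > > >     A->ops->mult        = MatMult_SeqAIJCUSPARSE;
>> > > > >     A->ops->multadd     = MatMultAdd_SeqAIJCUSPARSE;
>> > > > >     A->ops->assemblyend = MatAssemblyEnd_SeqAIJCUSPARSE;
>> > > > >     A->ops->destroy     = MatDestroy_SeqAIJCUSPARSE;
>> > > > >     A->ops->duplicate   = MatDuplicate_SeqAIJCUSPARSE;
>> > > > >   }
>> > > > >   PetscFunctionReturn(0);
>> > > > > }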
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > It does not seem to work. It does not look like CUDA has a
>> > > > > > MatCreateVecs. Should I add one and copy this flag over?
>> > > > >
>> > > > > We do need this function. But I don't see how it relates to
>> > > > > pinning. When the matrix is pinned to the CPU we want it to create
>> > > > > CPU vectors, which I assume it does.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > Mark
>> > > > >
>> > > >
>> > >
>> >
>>
>>