<div dir="ltr">I'm not sure what to do here. The problem is that pinned-to-CPU vectors are still calling <b>VecCUDACopyFromGPU</b> in the VecGetArray code below.<br><div><br></div><div><div>Should I set <b>x->valid_GPU_array</b> to something else, like PETSC_OFFLOAD_CPU, in PinToCPU so that this block of code is not executed?</div></div><div><br></div><div>PetscErrorCode VecGetArray(Vec x,PetscScalar **a)<br>{<br> PetscErrorCode ierr;<br>#if defined(PETSC_HAVE_VIENNACL)<br> PetscBool is_viennacltype = PETSC_FALSE;<br>#endif<br><br> PetscFunctionBegin;<br> PetscValidHeaderSpecific(x,VEC_CLASSID,1);<br> ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr);<br> if (x->petscnative) {<br>#if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA)<br> if (<b>x->valid_GPU_array</b> == PETSC_OFFLOAD_GPU) {<br>#if defined(PETSC_HAVE_VIENNACL)<br> ierr = PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);<br> if (is_viennacltype) {<br> ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr);<br> } else<br>#endif<br> {<br>#if defined(PETSC_HAVE_CUDA)<br><b> ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr);<br></b>#endif<br></div><div> }<br> } else if (x->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {<br>#if defined(PETSC_HAVE_VIENNACL)<br> ierr = PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);<br> if (is_viennacltype) {<br> ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr);<br> } else<br>#endif<br> {<br>#if defined(PETSC_HAVE_CUDA)<br> ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr);<br>#endif<br> }<br> }<br>#endif<br> *a = *((PetscScalar**)x->data);<br> } else {<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F. 
<<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br>
Yes, it needs to be able to switch back and forth between the CPU and GPU methods, so you need to move the setting of the methods, which is currently done directly in the create method, into it. See how MatConvert_SeqAIJ_SeqAIJViennaCL() calls ierr = MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr); to set the methods for the GPU initially.<br>
<br>
Barry<br>
<br>
<br>
> On Jul 23, 2019, at 7:32 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>
> <br>
> <br>
> What are the symptoms of it not working? Does it still appear to be copying the matrices to the GPU and then running the functions on the GPU?<br>
> <br>
> <br>
> The object is dispatching the CUDA mat-vec etc.<br>
> <br>
> I suspect the pinning is incompletely done for CUDA (and MPIOpenCL) matrices. <br>
> <br>
> <br>
> Yes, git grep MatPinToCPU shows stuff for ViennaCL but not CUDA.<br>
> <br>
> I guess I can add something like this below. Do we need to set the device methods? They are already set when this method is set, right?<br>
> <br>
> We need the equivalent of <br>
> <br>
> static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat A,PetscBool flg)<br>
> {<br>
> PetscFunctionBegin;<br>
> A->pinnedtocpu = flg;<br>
> if (flg) {<br>
> A->ops->mult = MatMult_SeqAIJ;<br>
> A->ops->multadd = MatMultAdd_SeqAIJ;<br>
> A->ops->assemblyend = MatAssemblyEnd_SeqAIJ;<br>
> A->ops->duplicate = MatDuplicate_SeqAIJ;<br>
> } else {<br>
> A->ops->mult = MatMult_SeqAIJViennaCL;<br>
> A->ops->multadd = MatMultAdd_SeqAIJViennaCL;<br>
> A->ops->assemblyend = MatAssemblyEnd_SeqAIJViennaCL;<br>
> A->ops->destroy = MatDestroy_SeqAIJViennaCL;<br>
> A->ops->duplicate = MatDuplicate_SeqAIJViennaCL;<br>
> }<br>
> PetscFunctionReturn(0);<br>
> }<br>
> <br>
> for MPIViennaCL and MPISeqAIJ Cusparse but it doesn't look like it has been written yet. <br>
> <br>
> <br>
> > <br>
> > It does not seem to work. It does not look like CUDA has a MatCreateVecs. Should I add one and copy this flag over?<br>
> <br>
> We do need this function. But I don't see how it relates to pinning. When the matrix is pinned to the CPU we want it to create CPU vectors, which I assume it does.<br>
> <br>
> <br>
> > <br>
> > Mark<br>
> <br>
<br>
</blockquote></div>
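<div>To make the question above concrete, here is a minimal toy sketch of the pattern being discussed: one pin routine that swaps the method pointers in both directions and, when pinning, brings the data back to the host once and marks the host copy as the valid one, so that the get-array path no longer triggers a device copy. Everything in it (ToyVec, OffloadFlag, toy_pin_to_cpu, toy_get_array, the copy counter) is a hypothetical stand-in written for illustration and is not PETSc code; whether setting the flag this way is the right fix inside PETSc is exactly the open question of this thread.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are NOT the PETSc types. */
typedef enum { OFFLOAD_UNALLOCATED, OFFLOAD_GPU, OFFLOAD_CPU, OFFLOAD_BOTH } OffloadFlag;

#define TOY_N 4

typedef struct ToyVec ToyVec;
struct ToyVec {
  double      host[TOY_N];     /* host copy of the data */
  double      device[TOY_N];   /* pretend device buffer (plain memory here) */
  OffloadFlag valid;           /* which copy holds the current values */
  int         pinned;          /* nonzero when pinned to the CPU */
  int         copies_from_gpu; /* counts simulated device-to-host transfers */
  void      (*scale)(ToyVec *, double); /* dispatched method */
};

/* Simulated device-to-host transfer. */
static void copy_from_gpu(ToyVec *v)
{
  memcpy(v->host, v->device, sizeof v->host);
  v->valid = OFFLOAD_BOTH;
  v->copies_from_gpu++;
}

/* Simulated host-to-device transfer. */
static void copy_to_gpu(ToyVec *v)
{
  memcpy(v->device, v->host, sizeof v->device);
  v->valid = OFFLOAD_BOTH;
}

/* CPU method: works on the host copy and marks it as the valid one. */
static void scale_cpu(ToyVec *v, double a)
{
  size_t i;
  if (v->valid == OFFLOAD_GPU) copy_from_gpu(v);
  for (i = 0; i < TOY_N; i++) v->host[i] *= a;
  v->valid = OFFLOAD_CPU;
}

/* "GPU" method: works on the device copy and marks it as the valid one. */
static void scale_gpu(ToyVec *v, double a)
{
  size_t i;
  if (v->valid == OFFLOAD_CPU) copy_to_gpu(v);
  for (i = 0; i < TOY_N; i++) v->device[i] *= a;
  v->valid = OFFLOAD_GPU;
}

/* Single place that sets the methods for both directions. */
static void toy_pin_to_cpu(ToyVec *v, int flg)
{
  v->pinned = flg;
  if (flg) {
    /* Fetch the latest values once, then declare the host copy valid. */
    if (v->valid == OFFLOAD_GPU) copy_from_gpu(v);
    v->valid = OFFLOAD_CPU;
    v->scale = scale_cpu;
  } else {
    v->scale = scale_gpu;
  }
}

/* Creation sets the GPU methods by calling the pin routine with flg = 0. */
static void toy_init(ToyVec *v)
{
  size_t i;
  for (i = 0; i < TOY_N; i++) { v->host[i] = (double)(i + 1); v->device[i] = 0.0; }
  v->valid = OFFLOAD_CPU;
  v->copies_from_gpu = 0;
  toy_pin_to_cpu(v, 0);
}

/* Same guard shape as the block quoted above: copy only if the device copy is newer. */
static double *toy_get_array(ToyVec *v)
{
  assert(v != NULL);
  if (v->valid == OFFLOAD_GPU) copy_from_gpu(v);
  return v->host;
}
```

<div>In this toy, calling toy_init, then v.scale(&v, 2.0), then toy_pin_to_cpu(&v, 1) performs one device-to-host copy at pin time; later calls to v.scale and toy_get_array stay on the host because the flag is no longer OFFLOAD_GPU. Unpinning with toy_pin_to_cpu(&v, 0) restores the device method, which pushes the host data back to the device on its next use.</div>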