<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body>

<div dir="ltr">

<div>I had an MR (already merged into master) that renamed v->valid_GPU_array to v->offloadmask.</div>

<div>But the behavior has not changed: VecCreate_SeqCUDA still allocates memory on both the CPU and the GPU. I believe VecCUDA should allocate CPU memory only on demand.</div>
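<div>A minimal sketch of the on-demand idea, in plain C rather than PETSc: the "device" buffer is mocked with calloc so it runs without CUDA, and every name here (MockVec, OFFLOAD_*, MockVecCreate, MockVecGetHostArrayRead, MockVecDestroy) is hypothetical. Creation allocates only the device copy. The host buffer is allocated and filled the first time host access is requested, so work vectors that never touch the CPU never get host memory.</div>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not PETSc code. The "device" buffer is mocked with
   calloc so the example compiles and runs without CUDA. */
typedef enum {
  OFFLOAD_UNALLOCATED = 0x0,
  OFFLOAD_GPU         = 0x1,
  OFFLOAD_CPU         = 0x2,
  OFFLOAD_BOTH        = 0x3
} OffloadMask;

typedef struct {
  size_t      n;
  double     *device; /* stands in for GPU memory */
  double     *host;   /* NULL until the host copy is first needed */
  OffloadMask mask;   /* which copies currently hold valid data */
} MockVec;

/* Create: allocate and zero only the (mock) device copy. */
static int MockVecCreate(MockVec *v, size_t n)
{
  v->n      = n;
  v->host   = NULL;
  v->device = calloc(n, sizeof(double));
  if (!v->device) {
    v->mask = OFFLOAD_UNALLOCATED;
    return 1;
  }
  v->mask = OFFLOAD_GPU;
  return 0;
}

/* Read access on the host: allocate the host buffer on demand and copy
   device data into it if the host copy is not already valid. */
static const double *MockVecGetHostArrayRead(MockVec *v)
{
  assert(v->mask != OFFLOAD_UNALLOCATED); /* the device copy must exist */
  if (!v->host) {
    v->host = malloc(v->n * sizeof(double));
    if (!v->host) return NULL;
  }
  if (!(v->mask & OFFLOAD_CPU)) {
    memcpy(v->host, v->device, v->n * sizeof(double)); /* mock device-to-host copy */
    v->mask = OFFLOAD_BOTH;
  }
  return v->host;
}

/* Release both buffers; free(NULL) is a no-op if the host copy was never made. */
static void MockVecDestroy(MockVec *v)
{
  free(v->device);
  free(v->host);
  v->device = NULL;
  v->host   = NULL;
  v->mask   = OFFLOAD_UNALLOCATED;
}
```

<div>A real implementation would use cudaMalloc/cudaMemcpy and would also need a host write-access path that marks the host copy as the only valid one; this sketch models read access only.</div>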

<div><br>

</div>

<div>

<div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">

<div dir="ltr">--Junchao Zhang</div>

</div>

</div>

<br>

</div>

<br>

<div class="gmail_quote">

<div dir="ltr" class="gmail_attr">On Sun, Oct 13, 2019 at 12:27 PM Smith, Barry F. &lt;<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>&gt; wrote:<br>

</div>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

  Yikes, forget about bit flags and names. <br>

<br>

  Does this behavior make sense? EVERY CUDA vector allocates memory on both the GPU and the CPU? Or do I misunderstand the code?<br>

<br>

   This seems fundamentally wrong, and it is different from before. What about the dozens of work vectors on the GPU (for example, for Krylov methods)? There is no reason for them to have memory allocated on the CPU. In the long run, nearly all matrices and vectors will reside only on the GPU, so this seems like a step backwards. Does libaxb do this?

<br>

<br>

   Barry<br>

<br>

> On Oct 1, 2019, at 10:24 PM, Zhang, Junchao via petsc-dev &lt;<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>&gt; wrote:<br>

> <br>

> Stefano recently modified the following code:<br>

> <br>

> PetscErrorCode VecCreate_SeqCUDA(Vec V)<br>

> {<br>

>   PetscErrorCode ierr;<br>

> <br>

>   PetscFunctionBegin;<br>

>   ierr = PetscLayoutSetUp(V->map);CHKERRQ(ierr);<br>

>   ierr = VecCUDAAllocateCheck(V);CHKERRQ(ierr);<br>

>   ierr = VecCreate_SeqCUDA_Private(V,((Vec_CUDA*)V->spptr)->GPUarray_allocated);CHKERRQ(ierr);<br>

>   ierr = VecCUDAAllocateCheckHost(V);CHKERRQ(ierr);<br>

>   ierr = VecSet(V,0.0);CHKERRQ(ierr);<br>

>   ierr = VecSet_Seq(V,0.0);CHKERRQ(ierr);<br>

>   V->valid_GPU_array = PETSC_OFFLOAD_BOTH;<br>

>   PetscFunctionReturn(0);<br>

> }<br>

> <br>

> That means that if one creates a SEQCUDA vector V and then immediately tests if (V->valid_GPU_array == PETSC_OFFLOAD_GPU), the test will fail. That is counterintuitive. I think we should have<br>

> <br>

> enum {PETSC_OFFLOAD_UNALLOCATED=0x0,PETSC_OFFLOAD_GPU=0x1,PETSC_OFFLOAD_CPU=0x2,PETSC_OFFLOAD_BOTH=0x3}<br>

> <br>

> and then use if (V->valid_GPU_array & PETSC_OFFLOAD_GPU). What do you think?<br>
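<div>A minimal C sketch of the difference between the two tests, using the enumerator values proposed above; the typedef name OffloadState and the helpers GPUOnlyByEquality and GPUValidByMask are hypothetical, not PETSc API. With PETSC_OFFLOAD_BOTH = 0x3, the equality test reports "not on the GPU" for a freshly created vector even though its GPU copy is valid, while the bitwise test does not.</div>

```c
/* The enumerators are copied from the proposal above; the typedef name
   OffloadState and both helper functions are hypothetical. */
typedef enum {
  PETSC_OFFLOAD_UNALLOCATED = 0x0,
  PETSC_OFFLOAD_GPU         = 0x1,
  PETSC_OFFLOAD_CPU         = 0x2,
  PETSC_OFFLOAD_BOTH        = 0x3
} OffloadState;

/* Equality test: true only when the GPU holds the sole valid copy,
   so it is false for a vector created in the PETSC_OFFLOAD_BOTH state. */
static int GPUOnlyByEquality(OffloadState s)
{
  return s == PETSC_OFFLOAD_GPU;
}

/* Bitwise test: true whenever the GPU copy is valid, including BOTH. */
static int GPUValidByMask(OffloadState s)
{
  return (s & PETSC_OFFLOAD_GPU) != 0;
}
```

<div>Each bit means "this copy is valid", so BOTH is simply GPU | CPU, and any question about one side reduces to a single AND.</div>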

> <br>

> --Junchao Zhang<br>

<br>

</blockquote>

</div>

</body>

</html>