<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Dec 13, 2020 at 4:34 PM Barry Smith <<a href="mailto:bsmith@petsc.dev">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><br><div><br><blockquote type="cite"><div>On Dec 13, 2020, at 8:39 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:</div><br><div><div dir="ltr">So how do I declare <div><br></div><div>struct myMat {<div> PetscScalar *a;</div><div> PetscInt        *i, *j, nrows;</div><div>}</div><div><br></div><div>so that I can use, or give me an alternative idiom, </div><div><br></div><div>struct myMat h_mat, *d_mat;<br></div><div>....</div><div>cudaMemcpy(          d_mat,   &h_mat,  sizeof(myMat),  cudaMemcpyHostToDevice))<br></div></div></div></div></blockquote><div><br></div>   This will copy the pointers (not the entries in the arrays) which means in both d_mat and h_mat a and i,j refer to Kokkos (or CUDA etc) memory so </div></div></blockquote><div><br></div><div>Yes. Note, h_mat is, in my usage, a temporary container. It is a mix of host and device memory and is just there to stuff device pointers and raw (meta) data, to be copied whole onto the device.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><br></div><div><blockquote type="cite"><div dir="ltr"><div>struct myMat {<div> PetscKokkosScalar *a;</div><div> PetscKokkosInt        *i, *j;</div></div></div></blockquote>   PetscInt                    nrows;<blockquote type="cite"><div dir="ltr"><div><div>}</div></div></div></blockquote><br></div><div>   the nrows is annoying because in d_mat it refers to device memory while in h_mat it refers to host memory so we cannot label it clearly.</div></div></blockquote><div><br></div><div>Kokkos wraps all this ugliness, but in raw Cuda I do it this manual way. It's low level and you have to keep track in your head (names help) what is going on. But this mixed h_mat is just around in a local scope. A better name for it might be h_temp_mat.</div><div><br></div><div>I think your PetscKokkosScalar is a Scalar in the default execution's memory space.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><br></div><div>   Sorry, by code reader I meant someone reading or maintaining the code. </div><div><br></div><div>   To me the above struct is clearer, I know immediately that *a and *i, *j point to Kokkos space. While labeled as PetscScalar *a I have to figure out somehow by examining other code where it points (this is a waste of time I think). </div></div></blockquote><div><br></div><div>I think Junchao is doing something like this in aijkokkosimpl.hpp, as I said. He and I use variable names to make it clear where they are (eg, d_mat for a Mat on the device). Junchoa had to use Kokkos data type to talk to Kokkos, but we both use naming conventions.</div><div><br></div><div>Here is a whole method that does this idiom. I have a Mat with a "data" pointer (this is a bad name but this was a void* at first) to its device "self".  I'm not sure how your suggestion would impact this.</div><div><br></div><div>PETSC_EXTERN PetscErrorCode LandauCUDACreateMatMaps(P4estVertexMaps *maps, pointInterpolationP4est (*points)[LANDAU_MAX_Q_FACE], PetscInt Nf, PetscInt Nq)<br>{<br>  P4estVertexMaps h_maps;<br>  PetscFunctionBegin;<br>  h_maps.num_elements =maps->num_elements;<br>  h_maps.num_face = maps->num_face;<br>  h_maps.num_reduced = maps->num_reduced;<br>  h_maps.deviceType = maps->deviceType;<br>  h_maps.Nf = Nf;<br>  h_maps.Nq = Nq;<br>  CUDA_SAFE_CALL(cudaMalloc((void **)&h_maps.c_maps,               maps->num_reduced  * sizeof *points));<br>  CUDA_SAFE_CALL(cudaMemcpy(          h_maps.c_maps, maps->c_maps, maps->num_reduced  * sizeof *points, cudaMemcpyHostToDevice));<br>  CUDA_SAFE_CALL(cudaMalloc((void **)&h_maps.gIdx,                 maps->num_elements * sizeof *maps->gIdx));<br>  CUDA_SAFE_CALL(cudaMemcpy(          h_maps.gIdx, maps->gIdx,     maps->num_elements * sizeof *maps->gIdx, cudaMemcpyHostToDevice));<br>  CUDA_SAFE_CALL(cudaMalloc((void **)&maps->data, sizeof(P4estVertexMaps)));<br>  CUDA_SAFE_CALL(cudaMemcpy(          maps->data,   &h_maps, sizeof(P4estVertexMaps), cudaMemcpyHostToDevice));<br>  PetscFunctionReturn(0);<br>}<br></div><div><br></div><div>Mark</div></div></div>