<div dir="ltr"><div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>I am thinking something like MatSeqAIJGetArrayAndMemType</div></blockquote><div><br></div><div>Isn't the "MemType" of the matrix an invariant on creation? e.g. a user shouldn't care what memtype a pointer is for, just that if a device matrix was created it returns device pointers, if a host matrix was created it returns host pointers.</div><div><br></div><div>Now that I'm looking at those docs I see <a href="https://petsc.org/release/docs/manualpages/Mat/MatSeqAIJGetCSRAndMemType/">MatSeqAIJGetCSRAndMemType</a>, isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return me those raw device pointers?<br></div><div><br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry <<a href="mailto:mlohry@gmail.com" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br><div>A workaround is to let petsc build the matrix and allocate the
memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and
fill it up.</div></blockquote><div><br></div><div>Junchao, looking at the code for this it seems to only return a pointer to the value array, but not pointers to the column and row index arrays, is that right?</div></div></blockquote><div>Yes, that is correct. </div>I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype), which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if A is a device matrix, otherwise (a,i, j) on host and mtype = PETSC_MEMTYPE_HOST.<br>We currently have similar things like VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).</div><div class="gmail_quote"><br></div><div class="gmail_quote">It looks like you need (a, i, j) for assembly, but the above function only works for an assembled matrix. <div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div> <br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch <<a href="mailto:jacob.fai@gmail.com" target="_blank">jacob.fai@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote">We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both </div></div></div></blockquote><div><br></div><div>CUPM works with both enabled simultaneously, I don’t think there are any direct restrictions for it. Vec at least was fully usable with both cuda and hip (though untested) last time I checked.</div><div> </div><div dir="ltr"><span style="background-color:rgba(255,255,255,0)">Best regards,<br><br>Jacob Faibussowitsch<br>(Jacob Fai - booss - oh - vitch)</span></div><div dir="ltr"><br><blockquote type="cite">On Jan 5, 2023, at 00:09, Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div>Oh, is the device backend not known at compile time? </div></div></blockquote><div>Currently it is known at compile time.</div></div></div></blockquote><div><br></div><div>Are you sure? I don't think it is known at compile time.</div></div></div></blockquote><div>We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div>Or multiple backends can be alive at once?<br></div></div></blockquote><div><br></div><div>Some petsc developers (Jed and Barry) want to support this, but we are incapable now.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" rel="noreferrer" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...</blockquote><div><br></div><div>Wouldn't one function suffice? Assuming these are contiguous arrays in CSR format, they're just raw device pointers in all cases.<br></div></div></blockquote><div>But we need to know what device it is (to dispatch to either petsc-CUDA or petsc-HIP backend)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" rel="noreferrer" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...<div><br></div><div>The real problem I think is to deal with multiple MPI ranks. Providing the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged for users to do so.<br><div><br><div>A workaround is to let petsc build the matrix and allocate the memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.<br><br>We recently added routines to support matrix assembly on GPUs, see if<font face="Menlo, Monaco, Courier New, monospace" color="#795e26"><span style="font-size:14px;white-space:pre-wrap"> </span></font><span style="font-family:-apple-system,"system-ui","Segoe UI","Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol""><a href="https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/" rel="noreferrer" target="_blank">MatSetValuesCOO</a> helps</span><br></div><div><div style="color:rgb(0,0,0);font-family:Menlo,Monaco,"Courier New",monospace;font-size:14px;line-height:21px;white-space:pre-wrap"><div><br></div></div><div><div dir="ltr"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" rel="noreferrer" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I have a sparse matrix constructed in non-petsc code using a standard CSR representation where I compute the Jacobian to be used in an implicit TS context. In the CPU world I call</div><div><br></div><div>MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, colidxptr, valptr, Jac);</div><div><br></div><div>which as I understand it -- (1) never copies/allocates that information, and the matrix Jac is just a non-owning view into the already allocated CSR, (2) I can write directly into the original data structures and the Mat just "knows" about it, although it still needs a call to MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this works great with GAMG.<br></div><div><br></div><div>I have the same CSR representation filled in GPU data allocated with cudaMalloc and filled on-device. Is there an equivalent Mat constructor for GPU arrays, or some other way to avoid unnecessary copies?</div><div><br></div><div>Thanks,</div><div>Mark<br></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div></div>
</blockquote></div></div></div>
</blockquote></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>
</blockquote></div></div>
</div></blockquote></div></blockquote></div>
</blockquote></div></div>
</blockquote></div>