<div dir="ltr"><div dir="ltr"><br><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div>Oh, is the device backend not known at compile time? </div></div></blockquote><div>Currently it is known at compile time.</div></div></div></blockquote><div><br></div><div>Are you sure? I don't think it is known at compile time.</div></div></div></blockquote><div>We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div>Or multiple backends can be alive at once?<br></div></div></blockquote><div><br></div><div>Some petsc developers (Jed and Barry) want to support this, but we are incapable now.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" rel="noreferrer" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...</blockquote><div><br></div><div>Wouldn't one function suffice? Assuming these are contiguous arrays in CSR format, they're just raw device pointers in all cases.<br></div></div></blockquote><div>But we need to know what device it is (to dispatch to either petsc-CUDA or petsc-HIP backend)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" rel="noreferrer" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...<div><br></div><div>The real problem I think is to deal with multiple MPI ranks. Providing the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged for users to do so.<br><div><br><div>A workaround is to let petsc build the matrix and allocate the memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.<br><br>We recently added routines to support matrix assembly on GPUs, see if<font face="Menlo, Monaco, Courier New, monospace" color="#795e26"><span style="font-size:14px;white-space:pre-wrap"> </span></font><span style="font-family:-apple-system,"system-ui","Segoe UI","Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol""><a href="https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/" rel="noreferrer" target="_blank">MatSetValuesCOO</a> helps</span><br></div><div><div style="color:rgb(0,0,0);font-family:Menlo,Monaco,"Courier New",monospace;font-size:14px;line-height:21px;white-space:pre-wrap"><div><br></div></div><div><div dir="ltr"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" rel="noreferrer" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I have a sparse matrix constructed in non-petsc code using a standard CSR representation where I compute the Jacobian to be used in an implicit TS context. In the CPU world I call</div><div><br></div><div>MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, colidxptr, valptr, Jac);</div><div><br></div><div>which as I understand it -- (1) never copies/allocates that information, and the matrix Jac is just a non-owning view into the already allocated CSR, (2) I can write directly into the original data structures and the Mat just "knows" about it, although it still needs a call to MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this works great with GAMG.<br></div><div><br></div><div>I have the same CSR representation filled in GPU data allocated with cudaMalloc and filled on-device. Is there an equivalent Mat constructor for GPU arrays, or some other way to avoid unnecessary copies?</div><div><br></div><div>Thanks,</div><div>Mark<br></div></div>

</blockquote></div>

</blockquote></div>

</blockquote></div></div>

</blockquote></div></div></div>

</blockquote></div></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>

</blockquote></div></div>