<div dir="auto">If you use cudaMallocManaged with host affinity, you can drop that into PETSc malloc and it should “just work”, including migrating to the GPU when touched. Or you can give it device affinity and it will migrate the other way when the CPU touches it. </div><div dir="auto"><br></div><div dir="auto">This is way more performance portable than system-managed memory on the Summit/Lassen systems, which can do unpleasant things unless you disable NUMA balancing and use CUDA prefetch. </div><div dir="auto"><br></div><div dir="auto">Jeff</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 2, 2020 at 10:49 AM Mark Adams &lt;<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)"><div dir="ltr">OK, good to know. I will now worry even less about making this very complete.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 2, 2020 at 1:33 PM Barry Smith &lt;<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)"><br>Mark,<br><br>Currently you directly use the NVIDIA-provided allocator, cudaMalloc, for all mallocs on the GPU. See for example <a href="http://aijcusparse.cu" rel="noreferrer" target="_blank">aijcusparse.cu</a>. 
<br><br>I will be using Stefano's work to start developing a unified PETSc-based system for all memory management, but don't wait for that.<br><br>Barry<br><br>&gt; On Sep 2, 2020, at 8:58 AM, Mark Adams &lt;<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>&gt; wrote:<br>&gt; <br>&gt; PETSc mallocs seem to boil down to PetscMallocAlign. There are switches in here, but I don't see a CUDA malloc. This would seem to be convenient if I want to create an Object entirely on CUDA or any device. <br>&gt; <br>&gt; Are there any thoughts along these lines, or should I just duplicate Mat creation, for instance, by hand?<br><br></blockquote></div><br><br></blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>