<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><blockquote type="cite" class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">I don't think this is right. We want the device initialized by PETSc , we just don't want the cublas and cusolve stuff initialized. In order to see how much memory initializing the blas and solvers takes.</div></blockquote><div class=""><br class=""></div>This is how it has always been, PetscDevice adopted the same initialization strategy of the previous code. Eager initialization initializes __everything__ since initializing cublas and cusolver takes (or at least historically took) eons.<div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div class=""> So I think you need to comment things in cupminterface.hpp like cublasCreate and cusolverDnCreate.</div></div></blockquote><blockquote type="cite" class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div class=""><br class=""></div><div class=""> Urgh, I hate C++ where huge chunks of real code are in header files.</div></div></blockquote><br class=""></div><div class="">Not quite, you will need to comment out</div><div class=""><br class=""></div><div class="">ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">In src/sys/objects/device/impls/cupm/cupmcontext.hpp:202</div><div class=""><br class=""></div><div class=""><div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div>Best regards,<br class=""><br class="">Jacob Faibussowitsch<br class="">(Jacob Fai - booss - oh - vitch)<br class=""></div></div></div>
</div>
<div><br class=""><blockquote type="cite" class=""><div class="">On Jan 7, 2022, at 11:53, Barry Smith <<a href="mailto:bsmith@petsc.dev" class="">bsmith@petsc.dev</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div> I don't think this is right. We want the device initialized by PETSc , we just don't want the cublas and cusolve stuff initialized. In order to see how much memory initializing the blas and solvers takes.<div class=""><br class=""></div><div class=""> So I think you need to comment things in cupminterface.hpp like cublasCreate and cusolverDnCreate.</div><div class=""><br class=""></div><div class=""> Urgh, I hate C++ where huge chunks of real code are in header files.</div><div class=""><br class=""></div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <<a href="mailto:jacob.fai@gmail.com" class="">jacob.fai@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hit send too early…<div class=""><br class=""></div><div class="">If you don’t want to comment out, you can also run with "-device_enable lazy" option. Normally this is the default behavior but if -log_view or -log_summary is provided this defaults to “-device_enable eager”. See src/sys/objects/device/interface/device.cxx:398</div><div class=""><br class=""></div><div class=""><div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">Best regards,<br class=""><br class="">Jacob Faibussowitsch<br class="">(Jacob Fai - booss - oh - vitch)<br class=""></div></div></div>
</div>
<div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <<a href="mailto:jacob.fai@gmail.com" class="">jacob.fai@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=us-ascii" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><blockquote type="cite" class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span class="" style="color: rgb(29, 28, 29); font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; orphans: 2; widows: 2; background-color: rgb(248, 248, 248); text-decoration-thickness: initial;">You need to go into the PetscInitialize() routine find where it loads the cublas and cusolve and comment out those lines then run with -log_view</span></div></blockquote><div class=""><br class=""></div>Comment out<div class=""><div class=""><br class=""></div><div class="">#if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))</div><div class=""> ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);</div><div class="">#endif</div><div class=""><br class=""></div><div class="">At <span style="caret-color: rgb(0, 0, 0);" class="">src/sys/objects/pinit.c:956</span></div><div class=""><br class=""></div><div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">Best regards,<br class=""><br class="">Jacob Faibussowitsch<br class="">(Jacob Fai - booss - oh - vitch)<br class=""></div></div></div>
</div>
<div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 7, 2022, at 11:24, Barry Smith <<a href="mailto:bsmith@petsc.dev" class="">bsmith@petsc.dev</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=us-ascii" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div><span style="color: rgb(29, 28, 29); font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; orphans: 2; widows: 2; background-color: rgb(248, 248, 248); text-decoration-thickness: initial;" class="">Without log_view it does not load any cuBLAS/cuSolve immediately with -log_view it loads all that stuff at startup. You need to go into the PetscInitialize() routine find where it loads the cublas and cusolve and comment out those lines then run with -log_view</span><div class=""><div style="orphans: 2; widows: 2;" class=""><font color="#1d1c1d" face="Slack-Lato, appleLogo, sans-serif" class=""><span style="caret-color: rgb(29, 28, 29); font-size: 15px; background-color: rgb(248, 248, 248);" class=""><br class=""></span></font></div><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" class="">petsc-dev@mcs.anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<span class="">When PETSc is initialized, it takes about 2GB CUDA memory. This is way too much for doing nothing. A test script is attached to reproduce the issue. If I remove the first line "import torch", PETSc consumes about 0.73GB, which is still significant.
Does anyone have any idea about this behavior?</span>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Hong<br class="">
<div class=""><br class="">
</div>
<div class="">
<pre class="c-mrkdwn__pre" data-stringify-type="pre" style="box-sizing: inherit; margin-top: 4px; margin-bottom: 4px; padding: 8px; --saf-0: rgba(var(--sk_foreground_low,29,28,29),0.13); font-size: 12px; line-height: 1.50001; font-variant-ligatures: none; white-space: pre-wrap; overflow-wrap: break-word; word-break: normal; tab-size: 4; border: 1px solid var(--saf-0); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background: rgba(var(--sk_foreground_min,29,28,29),0.04); counter-reset: list-0 0 list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; color: rgb(29, 28, 29); orphans: 2; widows: 2; text-decoration-thickness: initial; font-family: Monaco, Menlo, Consolas, "Courier New", monospace !important;">hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
CUDA memory before PETSc 0.000GB
CUDA memory after PETSc 0.004GB
hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
CUDA memory before PETSc 0.000GB
CUDA memory after PETSc 1.936GB</pre>
<div class=""><br class="">
</div>
</div>
<div class="">
<pre class="c-mrkdwn__pre" data-stringify-type="pre" style="box-sizing: inherit; margin-top: 4px; margin-bottom: 4px; padding: 8px; --saf-0: rgba(var(--sk_foreground_low,29,28,29),0.13); font-size: 12px; line-height: 1.50001; font-variant-ligatures: none; white-space: pre-wrap; overflow-wrap: break-word; word-break: normal; tab-size: 4; border: 1px solid var(--saf-0); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background: rgba(var(--sk_foreground_min,29,28,29),0.04); counter-reset: list-0 0 list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; color: rgb(29, 28, 29); orphans: 2; widows: 2; text-decoration-thickness: initial; font-family: Monaco, Menlo, Consolas, "Courier New", monospace !important;">import torch
import sys
import os
import nvidia_smi
nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
sys.path.append(petsc4py_path)
import petsc4py
petsc4py.init(sys.argv)
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))</pre>
<div class=""><br class="">
</div>
</div>
</div>
</div>
</div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>