<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">Commenting out the block containing PetscDeviceContextXXX reduces the memory cost from 1.9GB to 1.5GB.</div>
<div class="">Commenting out the call to PetscDeviceInitializeTypeFromOptions_Private() reduces it to 0GB.</div>
<div class=""><br class="">
</div>
<div class="">diff --git a/src/sys/objects/device/interface/device.cxx b/src/sys/objects/device/interface/device.cxx</div>
<div class="">index a682f16b696..1b2c7210dfe 100644</div>
<div class="">--- a/src/sys/objects/device/interface/device.cxx</div>
<div class="">+++ b/src/sys/objects/device/interface/device.cxx</div>
<div class="">@@ -422,7 +422,7 @@ PetscErrorCode PetscDeviceInitializeFromOptions_Internal(MPI_Comm comm)</div>
<div class="">     const auto deviceType = static_cast&lt;PetscDeviceType&gt;(i);</div>
<div class="">     auto initType         = defaultInitType;</div>
<div class=""><br class="">
</div>
<div class="">-    ierr = PetscDeviceInitializeTypeFromOptions_Private(comm,deviceType,defaultDevice,defaultView,&initType);CHKERRQ(ierr);</div>
<div class="">+    //ierr = PetscDeviceInitializeTypeFromOptions_Private(comm,deviceType,defaultDevice,defaultView,&initType);CHKERRQ(ierr);</div>
<div class="">     if (PetscDeviceConfiguredFor_Internal(deviceType) && (initType == PETSC_DEVICE_INIT_EAGER)) {</div>
<div class="">       initializeDeviceContextEagerly = PETSC_TRUE;</div>
<div class="">       deviceContextInitDevice        = deviceType;</div>
<div class="">@@ -433,11 +433,13 @@ PetscErrorCode PetscDeviceInitializeFromOptions_Internal(MPI_Comm comm)</div>
<div class=""><br class="">
</div>
<div class="">     /* somewhat inefficient here as the device context is potentially fully set up twice (once</div>
<div class="">      * when retrieved then the second time if setfromoptions makes changes) */</div>
<div class="">+    /*</div>
<div class="">     ierr = PetscInfo1(PETSC_NULLPTR,"Eagerly initializing PetscDeviceContext with %s device\n",PetscDeviceTypes[deviceContextInitDevice]);CHKERRQ(ierr);</div>
<div class="">     ierr = PetscDeviceContextSetRootDeviceType_Internal(deviceContextInitDevice);CHKERRQ(ierr);</div>
<div class="">     ierr = PetscDeviceContextGetCurrentContext(&dctx);CHKERRQ(ierr);</div>
<div class="">     ierr = PetscDeviceContextSetFromOptions(comm,"root_",dctx);CHKERRQ(ierr);</div>
<div class="">     ierr = PetscDeviceContextSetUp(dctx);CHKERRQ(ierr);</div>
<div class="">+    */</div>
<div class="">   }</div>
<div class="">   PetscFunctionReturn(0);</div>
<div class=""> }</div>
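<div class="">To compare the variants above on equal terms, the before/after readings can be reduced to a single delta. This is only a minimal sketch: memory_delta_gb and its read_used_bytes argument are hypothetical names, not part of PETSc or the test script, and the stand-in readings are copied from the -log_view output quoted below. In a real run, read_used_bytes would wrap the NVML query from the test script.</div>
<pre class="">
```python
def memory_delta_gb(read_used_bytes, action):
    """Return (before, after, delta) in GB around one call to action().

    read_used_bytes is a caller-supplied function returning used device
    memory in bytes (hypothetical; e.g. a wrapper around the NVML query).
    """
    before = read_used_bytes()
    action()
    after = read_used_bytes()
    return before / 1e9, after / 1e9, (after - before) / 1e9


if __name__ == "__main__":
    # Stand-in readings in bytes, taken from the log in this thread;
    # no GPU is queried here.
    readings = iter([0, 1_936_000_000])
    before, after, delta = memory_delta_gb(lambda: next(readings), lambda: None)
    print("before %.3fGB after %.3fGB delta %.3fGB" % (before, after, delta))
```
</pre>
<div class="">Passing the initialization step as action keeps the measurement code identical across the patched and unpatched builds.</div>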
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Jan 7, 2022, at 10:24 AM, Barry Smith &lt;<a href="mailto:bsmith@petsc.dev" class="">bsmith@petsc.dev</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class=""><br class="">
</div>
<span style="color: rgb(29, 28, 29); font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; orphans: 2; widows: 2; background-color: rgb(248, 248, 248); text-decoration-thickness: initial;" class="">Without
 -log_view, PETSc does not load cuBLAS/cuSOLVER immediately; with -log_view, it loads all of that at startup. You need to go into the PetscInitialize() routine, find where it loads cuBLAS and cuSOLVER, comment out those lines, and then run with -log_view.</span>
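<div class="">The difference described above is one of lazy versus eager initialization. A minimal sketch of that pattern, not PETSc code: the LazyResource class and the make_handle factory are hypothetical stand-ins for an expensive library handle, and the eager flag plays the role that -log_view is described as playing here.</div>
<pre class="">
```python
class LazyResource:
    """Create an expensive resource on first use, or at startup if eager."""

    def __init__(self, factory, eager=False):
        self._factory = factory
        self._created = False
        self._value = None
        if eager:
            # Eager mode pays the full cost immediately, even if unused.
            self.get()

    def get(self):
        # Lazy mode defers the cost until the first real request.
        if not self._created:
            self._value = self._factory()
            self._created = True
        return self._value


if __name__ == "__main__":
    calls = []

    def make_handle():
        # Hypothetical stand-in for creating a GPU library handle.
        calls.append("created")
        return object()

    lazy = LazyResource(make_handle)
    print("after lazy construction:", len(calls))
    eager = LazyResource(make_handle, eager=True)
    print("after eager construction:", len(calls))
```
</pre>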
<div class="">
<div style="orphans: 2; widows: 2;" class=""><font color="#1d1c1d" face="Slack-Lato, appleLogo, sans-serif" class=""><span style="caret-color: rgb(29, 28, 29); font-size: 15px; background-color: rgb(248, 248, 248);" class=""><br class="">
</span></font></div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev &lt;<a href="mailto:petsc-dev@mcs.anl.gov" class="">petsc-dev@mcs.anl.gov</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<span class="">When PETSc is initialized with -log_view, it takes about 2GB of CUDA memory (see the output below). This is far too much for doing nothing. A test script is attached to reproduce the issue. If I remove the first line, "import torch", PETSc consumes about 0.73GB, which is still significant.
 Does anyone have any idea what causes this behavior?</span>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Hong<br class="">
<div class=""><br class="">
</div>
<div class="">
<pre class="c-mrkdwn__pre" data-stringify-type="pre" style="box-sizing: inherit; margin-top: 4px; margin-bottom: 4px; padding: 8px; --saf-0: rgba(var(--sk_foreground_low,29,28,29),0.13); font-size: 12px; line-height: 1.50001; font-variant-ligatures: none; white-space: pre-wrap; overflow-wrap: break-word; word-break: normal; tab-size: 4; border: 1px solid var(--saf-0); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background: rgba(var(--sk_foreground_min,29,28,29),0.04); counter-reset: list-0 0 list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; color: rgb(29, 28, 29); orphans: 2; widows: 2; text-decoration-thickness: initial; font-family: Monaco, Menlo, Consolas, "Courier New", monospace !important;">hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
CUDA memory before PETSc 0.000GB
CUDA memory after PETSc 0.004GB
hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
CUDA memory before PETSc 0.000GB
CUDA memory after PETSc 1.936GB</pre>
<div class=""><br class="">
</div>
</div>
<div class="">
<pre class="c-mrkdwn__pre" data-stringify-type="pre" style="box-sizing: inherit; margin-top: 4px; margin-bottom: 4px; padding: 8px; --saf-0: rgba(var(--sk_foreground_low,29,28,29),0.13); font-size: 12px; line-height: 1.50001; font-variant-ligatures: none; white-space: pre-wrap; overflow-wrap: break-word; word-break: normal; tab-size: 4; border: 1px solid var(--saf-0); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background: rgba(var(--sk_foreground_min,29,28,29),0.04); counter-reset: list-0 0 list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; color: rgb(29, 28, 29); orphans: 2; widows: 2; text-decoration-thickness: initial; font-family: Monaco, Menlo, Consolas, "Courier New", monospace !important;">import torch
import sys
import os

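import nvidia_smi
# Query GPU 0 through NVML; info.used is device-wide usage in bytes
nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))

# Make the petsc4py build under $PETSC_DIR/$PETSC_ARCH/lib importable
petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
sys.path.append(petsc4py_path)
import petsc4py
# Forward the command line (e.g. -log_view) to PETSc initialization
petsc4py.init(sys.argv)
# Query again now that PETSc has been initialized
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)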
print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))</pre>
<div class=""><br class="">
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</body>
</html>