<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Note that the overhead was triggered by the first call to a CUDA function. So it seems that the first CUDA function triggered loading petsc so (if petsc so is linked), which is slow on the summit file system.
<div class=""><br class="">
<div class="">
<div class="">
<div class="">Hong
<div class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Feb 7, 2020, at 2:54 PM, Zhang, Hong via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" class="">petsc-dev@mcs.anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">Linking any other shared library does not slow down the execution. The PETSc shared library is the only one causing trouble.</div>
<div class=""><br class="">
</div>
<div class="">Here are the ldd output for two different versions. For the first version, I removed -lpetsc and it ran very fast. The second (slow) version was linked to petsc so. </div>
<div class=""><br class="">
</div>
<div class="">bash-4.2$ ldd ex_simple</div>
<div class="">        linux-vdso64.so.1 =>  (0x0000200000050000)</div>
<div class="">        liblapack.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0 (0x0000200000070000)</div>
<div class="">        libblas.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0 (0x00002000009b0000)</div>
<div class="">        libhdf5hl_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100 (0x0000200000e80000)</div>
<div class="">        libhdf5_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100 (0x0000200000ed0000)</div>
<div class="">        libhdf5_hl.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100 (0x0000200000f50000)</div>
<div class="">        libhdf5.so.103 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103 (0x0000200000fb0000)</div>
<div class="">        libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002000015e0000)</div>
<div class="">        libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 (0x0000200001770000)</div>
<div class="">        libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 (0x0000200009b00000)</div>
<div class="">        libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x000020000d950000)</div>
<div class="">        libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x000020000d9f0000)</div>
<div class="">        libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200012f50000)</div>
<div class="">        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000020001dc40000)</div>
<div class="">        libdl.so.2 => /usr/lib64/libdl.so.2 (0x000020001ddd0000)</div>
<div class="">        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x000020001de00000)</div>
<div class="">        libmpiprofilesupport.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3 (0x000020001de40000)</div>
<div class="">        libmpi_ibm_usempi.so => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so (0x000020001de70000)</div>
<div class="">        libmpi_ibm_mpifh.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3 (0x000020001dea0000)</div>
<div class="">        libmpi_ibm.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3 (0x000020001df40000)</div>
<div class="">        libpgf90rtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so (0x000020001e0b0000)</div>
<div class="">        libpgf90.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so (0x000020001e0f0000)</div>
<div class="">        libpgf90_rpm1.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so (0x000020001e6a0000)</div>
<div class="">        libpgf902.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so (0x000020001e6d0000)</div>
<div class="">        libpgftnrtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so (0x000020001e700000)</div>
<div class="">        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000020001e730000)</div>
<div class="">        libpgkomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so (0x000020001e760000)</div>
<div class="">        libomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so (0x000020001e790000)</div>
<div class="">        libomptarget.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so (0x000020001e880000)</div>
<div class="">        libpgmath.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so (0x000020001e8b0000)</div>
<div class="">        libpgc.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so (0x000020001e9d0000)</div>
<div class="">        librt.so.1 => /usr/lib64/librt.so.1 (0x000020001eb40000)</div>
<div class="">        libm.so.6 => /usr/lib64/libm.so.6 (0x000020001eb70000)</div>
<div class="">        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x000020001ec60000)</div>
<div class="">        libc.so.6 => /usr/lib64/libc.so.6 (0x000020001eca0000)</div>
<div class="">        libz.so.1 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1 (0x000020001ee90000)</div>
<div class="">        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x000020001eef0000)</div>
<div class="">        /lib64/ld64.so.2 (0x0000200000000000)</div>
<div class="">        libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x000020001ef40000)</div>
<div class="">        libutil.so.1 => /usr/lib64/libutil.so.1 (0x0000200020e50000)</div>
<div class="">        libhwloc_ompi.so.15 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15 (0x0000200020e80000)</div>
<div class="">        libevent-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6 (0x0000200020ef0000)</div>
<div class="">        libevent_pthreads-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6 (0x0000200020f70000)</div>
<div class="">        libopen-rte.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3 (0x0000200020fa0000)</div>
<div class="">        libopen-pal.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3 (0x00002000210b0000)</div>
<div class="">        libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002000211a0000)</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">bash-4.2$ ldd ex_simple_slow</div>
<div class="">        linux-vdso64.so.1 =>  (0x0000200000050000)</div>
<div class=""><font color="#ff2600" class="">        libpetsc.so.3.012 => /autofs/nccs-svm1_home1/hongzh/Projects/petsc/arch-olcf-summit-sell-opt/lib/libpetsc.so.3.012 (0x0000200000070000)</font></div>
<div class="">        liblapack.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0 (0x0000200002be0000)</div>
<div class="">        libblas.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0 (0x0000200003520000)</div>
<div class="">        libhdf5hl_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100 (0x00002000039f0000)</div>
<div class="">        libhdf5_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100 (0x0000200003a40000)</div>
<div class="">        libhdf5_hl.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100 (0x0000200003ac0000)</div>
<div class="">        libhdf5.so.103 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103 (0x0000200003b20000)</div>
<div class="">        libX11.so.6 => /usr/lib64/libX11.so.6 (0x0000200004150000)</div>
<div class="">        libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 (0x00002000042e0000)</div>
<div class="">        libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 (0x000020000c670000)</div>
<div class="">        libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x00002000104c0000)</div>
<div class="">        libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x0000200010560000)</div>
<div class="">        libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200015ac0000)</div>
<div class="">        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002000207b0000)</div>
<div class="">        libdl.so.2 => /usr/lib64/libdl.so.2 (0x0000200020940000)</div>
<div class="">        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x0000200020970000)</div>
<div class="">        libmpiprofilesupport.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3 (0x00002000209b0000)</div>
<div class="">        libmpi_ibm_usempi.so => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so (0x00002000209e0000)</div>
<div class="">        libmpi_ibm_mpifh.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3 (0x0000200020a10000)</div>
<div class="">        libmpi_ibm.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3 (0x0000200020ab0000)</div>
<div class="">        libpgf90rtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so (0x0000200020c20000)</div>
<div class="">        libpgf90.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so (0x0000200020c60000)</div>
<div class="">        libpgf90_rpm1.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so (0x0000200021210000)</div>
<div class="">        libpgf902.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so (0x0000200021240000)</div>
<div class="">        libpgftnrtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so (0x0000200021270000)</div>
<div class="">        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00002000212a0000)</div>
<div class="">        libpgkomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so (0x00002000212d0000)</div>
<div class="">        libomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so (0x0000200021300000)</div>
<div class="">        libomptarget.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so (0x00002000213f0000)</div>
<div class="">        libpgmath.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so (0x0000200021420000)</div>
<div class="">        libpgc.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so (0x0000200021540000)</div>
<div class="">        librt.so.1 => /usr/lib64/librt.so.1 (0x00002000216b0000)</div>
<div class="">        libm.so.6 => /usr/lib64/libm.so.6 (0x00002000216e0000)</div>
<div class="">        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00002000217d0000)</div>
<div class="">        libc.so.6 => /usr/lib64/libc.so.6 (0x0000200021810000)</div>
<div class="">        libz.so.1 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1 (0x0000200021a10000)</div>
<div class="">        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x0000200021a60000)</div>
<div class="">        /lib64/ld64.so.2 (0x0000200000000000)</div>
<div class="">        libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x0000200021ab0000)</div>
<div class="">        libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002000239c0000)</div>
<div class="">        libhwloc_ompi.so.15 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15 (0x00002000239f0000)</div>
<div class="">        libevent-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6 (0x0000200023a60000)</div>
<div class="">        libevent_pthreads-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6 (0x0000200023ae0000)</div>
<div class="">        libopen-rte.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3 (0x0000200023b10000)</div>
<div class="">        libopen-pal.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3 (0x0000200023c20000)</div>
<div class="">        libXau.so.6 => /usr/lib64/libXau.so.6 (0x0000200023d10000)</div>
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Feb 7, 2020, at 2:31 PM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" class="">bsmith@mcs.anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class=""><br class="">
 ldd -o on the executable of both linkings of your code.<br class="">
<br class="">
 My guess is that without PETSc it is linking the static version of the needed libraries and with PETSc the shared. And, in typical fashion, the shared libraries are off on some super slow file system so take a long time to be loaded and linked in on demand.<br class="">
<br class="">
  Still a performance bug in Summit. <br class="">
<br class="">
  Barry<br class="">
<br class="">
<br class="">
<blockquote type="cite" class="">On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" class="">petsc-dev@mcs.anl.gov</a>> wrote:<br class="">
<br class="">
Hi all,<br class="">
<br class="">
Previously I have noticed that the first call to a CUDA function such as cudaMalloc and cudaFree in PETSc takes a long time (7.5 seconds) on summit. Then I prepared a simple example as attached to help OCLF reproduce the problem. It turned out that the problem
 was  caused by PETSc. The 7.5-second overhead can be observed only when the PETSc lib is linked. If I do not link PETSc, it runs normally. Does anyone have any idea why this happens and how to fix it?<br class="">
<br class="">
Hong (Mr.)<br class="">
<br class="">
bash-4.2$ cat ex_simple.c<br class="">
#include <time.h><br class="">
#include <cuda_runtime.h><br class="">
#include <stdio.h><br class="">
<br class="">
int main(int argc,char **args)<br class="">
{<br class="">
clock_t start,s1,s2,s3;<br class="">
double  cputime;<br class="">
double   *init,tmp[100] = {0};<br class="">
<br class="">
start = clock();<br class="">
cudaFree(0);<br class="">
s1 = clock();<br class="">
cudaMalloc((void **)&init,100*sizeof(double));<br class="">
s2 = clock();<br class="">
cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);<br class="">
s3 = clock();<br class="">
printf("free time =%lf malloc time =%lf copy time =%lf\n",((double) (s1 - start)) / CLOCKS_PER_SEC,((double) (s2 - s1)) / CLOCKS_PER_SEC,((double) (s3 - s2)) / CLOCKS_PER_SEC);<br class="">
<br class="">
return 0;<br class="">
}<br class="">
<br class="">
<br class="">
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</div>
</body>
</html>