<div><div dir="auto">Given that OLCF filesystems are the issue, have you engaged their support personnel regarding this issue?</div></div><div dir="auto"><br></div><div dir="auto">Jeff</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 7, 2020 at 6:37 PM Junchao Zhang via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov">petsc-dev@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Have you tried passing -cuda_initialize 0 to petsc?</div><div dir="ltr"><br><div><div dir="ltr" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 7, 2020 at 5:16 PM Zhang, Hong via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I tried to install PETSc shared library in /gpfs/alpine/scratch, which should be faster than the home directory. But the same overhead still persists.<br>
<br>
Hong<br>
<br>
> On Feb 7, 2020, at 4:32 PM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
> <br>
> <br>
> Perhaps the intent is that you build or install (--prefix) your libraries in a different place than /autofs/nccs-svm1_home1 <br>
> <br>
> <br>
> <br>
>> On Feb 7, 2020, at 3:09 PM, Zhang, Hong <<a href="mailto:hongzhang@anl.gov" target="_blank">hongzhang@anl.gov</a>> wrote:<br>
>> <br>
>> Note that the overhead was triggered by the first call to a CUDA function. So it seems that the first CUDA function triggered loading petsc so (if petsc so is linked), which is slow on the summit file system.<br>
>> <br>
>> Hong<br>
>> <br>
>>> On Feb 7, 2020, at 2:54 PM, Zhang, Hong via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>> wrote:<br>
>>> <br>
>>> Linking any other shared library does not slow down the execution. The PETSc shared library is the only one causing trouble.<br>
>>> <br>
>>> Here are the ldd output for two different versions. For the first version, I removed -lpetsc and it ran very fast. The second (slow) version was linked to petsc so. <br>
>>> <br>
>>> bash-4.2$ ldd ex_simple<br>
>>> linux-vdso64.so.1 => (0x0000200000050000)<br>
>>> liblapack.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0 (0x0000200000070000)<br>
>>> libblas.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0 (0x00002000009b0000)<br>
>>> libhdf5hl_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100 (0x0000200000e80000)<br>
>>> libhdf5_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100 (0x0000200000ed0000)<br>
>>> libhdf5_hl.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100 (0x0000200000f50000)<br>
>>> libhdf5.so.103 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103 (0x0000200000fb0000)<br>
>>> libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002000015e0000)<br>
>>> libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 (0x0000200001770000)<br>
>>> libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 (0x0000200009b00000)<br>
>>> libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x000020000d950000)<br>
>>> libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x000020000d9f0000)<br>
>>> libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200012f50000)<br>
>>> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000020001dc40000)<br>
>>> libdl.so.2 => /usr/lib64/libdl.so.2 (0x000020001ddd0000)<br>
>>> libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x000020001de00000)<br>
>>> libmpiprofilesupport.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3 (0x000020001de40000)<br>
>>> libmpi_ibm_usempi.so => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so (0x000020001de70000)<br>
>>> libmpi_ibm_mpifh.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3 (0x000020001dea0000)<br>
>>> libmpi_ibm.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3 (0x000020001df40000)<br>
>>> libpgf90rtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so (0x000020001e0b0000)<br>
>>> libpgf90.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so (0x000020001e0f0000)<br>
>>> libpgf90_rpm1.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so (0x000020001e6a0000)<br>
>>> libpgf902.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so (0x000020001e6d0000)<br>
>>> libpgftnrtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so (0x000020001e700000)<br>
>>> libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000020001e730000)<br>
>>> libpgkomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so (0x000020001e760000)<br>
>>> libomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so (0x000020001e790000)<br>
>>> libomptarget.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so (0x000020001e880000)<br>
>>> libpgmath.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so (0x000020001e8b0000)<br>
>>> libpgc.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so (0x000020001e9d0000)<br>
>>> librt.so.1 => /usr/lib64/librt.so.1 (0x000020001eb40000)<br>
>>> libm.so.6 => /usr/lib64/libm.so.6 (0x000020001eb70000)<br>
>>> libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x000020001ec60000)<br>
>>> libc.so.6 => /usr/lib64/libc.so.6 (0x000020001eca0000)<br>
>>> libz.so.1 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1 (0x000020001ee90000)<br>
>>> libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x000020001eef0000)<br>
>>> /lib64/ld64.so.2 (0x0000200000000000)<br>
>>> libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x000020001ef40000)<br>
>>> libutil.so.1 => /usr/lib64/libutil.so.1 (0x0000200020e50000)<br>
>>> libhwloc_ompi.so.15 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15 (0x0000200020e80000)<br>
>>> libevent-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6 (0x0000200020ef0000)<br>
>>> libevent_pthreads-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6 (0x0000200020f70000)<br>
>>> libopen-rte.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3 (0x0000200020fa0000)<br>
>>> libopen-pal.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3 (0x00002000210b0000)<br>
>>> libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002000211a0000)<br>
>>> <br>
>>> <br>
>>> bash-4.2$ ldd ex_simple_slow<br>
>>> linux-vdso64.so.1 => (0x0000200000050000)<br>
>>> libpetsc.so.3.012 => /autofs/nccs-svm1_home1/hongzh/Projects/petsc/arch-olcf-summit-sell-opt/lib/libpetsc.so.3.012 (0x0000200000070000)<br>
>>> liblapack.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0 (0x0000200002be0000)<br>
>>> libblas.so.0 => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0 (0x0000200003520000)<br>
>>> libhdf5hl_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100 (0x00002000039f0000)<br>
>>> libhdf5_fortran.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100 (0x0000200003a40000)<br>
>>> libhdf5_hl.so.100 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100 (0x0000200003ac0000)<br>
>>> libhdf5.so.103 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103 (0x0000200003b20000)<br>
>>> libX11.so.6 => /usr/lib64/libX11.so.6 (0x0000200004150000)<br>
>>> libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 (0x00002000042e0000)<br>
>>> libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 (0x000020000c670000)<br>
>>> libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 (0x00002000104c0000)<br>
>>> libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 (0x0000200010560000)<br>
>>> libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 (0x0000200015ac0000)<br>
>>> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002000207b0000)<br>
>>> libdl.so.2 => /usr/lib64/libdl.so.2 (0x0000200020940000)<br>
>>> libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x0000200020970000)<br>
>>> libmpiprofilesupport.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3 (0x00002000209b0000)<br>
>>> libmpi_ibm_usempi.so => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so (0x00002000209e0000)<br>
>>> libmpi_ibm_mpifh.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3 (0x0000200020a10000)<br>
>>> libmpi_ibm.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3 (0x0000200020ab0000)<br>
>>> libpgf90rtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so (0x0000200020c20000)<br>
>>> libpgf90.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so (0x0000200020c60000)<br>
>>> libpgf90_rpm1.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so (0x0000200021210000)<br>
>>> libpgf902.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so (0x0000200021240000)<br>
>>> libpgftnrtl.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so (0x0000200021270000)<br>
>>> libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00002000212a0000)<br>
>>> libpgkomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so (0x00002000212d0000)<br>
>>> libomp.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so (0x0000200021300000)<br>
>>> libomptarget.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so (0x00002000213f0000)<br>
>>> libpgmath.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so (0x0000200021420000)<br>
>>> libpgc.so => /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so (0x0000200021540000)<br>
>>> librt.so.1 => /usr/lib64/librt.so.1 (0x00002000216b0000)<br>
>>> libm.so.6 => /usr/lib64/libm.so.6 (0x00002000216e0000)<br>
>>> libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00002000217d0000)<br>
>>> libc.so.6 => /usr/lib64/libc.so.6 (0x0000200021810000)<br>
>>> libz.so.1 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1 (0x0000200021a10000)<br>
>>> libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x0000200021a60000)<br>
>>> /lib64/ld64.so.2 (0x0000200000000000)<br>
>>> libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 (0x0000200021ab0000)<br>
>>> libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002000239c0000)<br>
>>> libhwloc_ompi.so.15 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15 (0x00002000239f0000)<br>
>>> libevent-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6 (0x0000200023a60000)<br>
>>> libevent_pthreads-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6 (0x0000200023ae0000)<br>
>>> libopen-rte.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3 (0x0000200023b10000)<br>
>>> libopen-pal.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3 (0x0000200023c20000)<br>
>>> libXau.so.6 => /usr/lib64/libXau.so.6 (0x0000200023d10000)<br>
>>> <br>
>>> <br>
>>>> On Feb 7, 2020, at 2:31 PM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
>>>> <br>
>>>> <br>
>>>> ldd -o on the executable of both linkings of your code.<br>
>>>> <br>
>>>> My guess is that without PETSc it is linking the static version of the needed libraries and with PETSc the shared. And, in typical fashion, the shared libraries are off on some super slow file system so take a long time to be loaded and linked in on demand.<br>
>>>> <br>
>>>> Still a performance bug in Summit. <br>
>>>> <br>
>>>> Barry<br>
>>>> <br>
>>>> <br>
>>>>> On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>> wrote:<br>
>>>>> <br>
>>>>> Hi all,<br>
>>>>> <br>
>>>>> Previously I have noticed that the first call to a CUDA function such as cudaMalloc and cudaFree in PETSc takes a long time (7.5 seconds) on summit. Then I prepared a simple example as attached to help OCLF reproduce the problem. It turned out that the problem was caused by PETSc. The 7.5-second overhead can be observed only when the PETSc lib is linked. If I do not link PETSc, it runs normally. Does anyone have any idea why this happens and how to fix it?<br>
>>>>> <br>
>>>>> Hong (Mr.)<br>
>>>>> <br>
>>>>> bash-4.2$ cat ex_simple.c<br>
>>>>> #include <time.h><br>
>>>>> #include <cuda_runtime.h><br>
>>>>> #include <stdio.h><br>
>>>>> <br>
>>>>> int main(int argc,char **args)<br>
>>>>> {<br>
>>>>> clock_t start,s1,s2,s3;<br>
>>>>> double cputime;<br>
>>>>> double *init,tmp[100] = {0};<br>
>>>>> <br>
>>>>> start = clock();<br>
>>>>> cudaFree(0);<br>
>>>>> s1 = clock();<br>
>>>>> cudaMalloc((void **)&init,100*sizeof(double));<br>
>>>>> s2 = clock();<br>
>>>>> cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);<br>
>>>>> s3 = clock();<br>
>>>>> printf("free time =%lf malloc time =%lf copy time =%lf\n",((double) (s1 - start)) / CLOCKS_PER_SEC,((double) (s2 - s1)) / CLOCKS_PER_SEC,((double) (s3 - s2)) / CLOCKS_PER_SEC);<br>
>>>>> <br>
>>>>> return 0;<br>
>>>>> }<br>
>>>>> <br>
>>>>> <br>
>>>> <br>
>>> <br>
>> <br>
> <br>
<br>
</blockquote></div>
</blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>