[petsc-users] runtime error on Summit with nvhpc21.7

Mark Adams mfadams at lbl.gov
Fri Aug 27 17:05:44 CDT 2021


On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> I don't understand the configure options
>
>
> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/
> *nvcc_wrapper*
> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper
> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort
> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast"
> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun
> -g 1" *--with-cuda=0*
> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper
> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0
> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc
> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b
>
> Why do you need to use nvcc_wrapper if you do not want to use cuda?
>

The code that is having the problem links with nvcc_wrapper.
They get the segv that I sent earlier, in PetscInitialize, so I figured I
should use the same compiler/linker.
They use CUDA, but we don't need PETSc to use CUDA for now.
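
Since the segv output quoted further down is interleaved across four
processes, a small script can make it easier to read. The following is a
minimal sketch, not something used in this thread: it assumes every tagged
line carries a "[host:pid]" prefix as in the quoted log, and the names
TAG and group_by_process are made up for illustration. Lines with no tag
(the wrapped frame bodies) are returned separately rather than guessed at,
because the interleaving makes their owner ambiguous.

```python
import re
from collections import defaultdict

# "[host:pid]" tag as it appears in the quoted log, e.g. "[e13n16:591873]".
# Frame markers such as "[ 1]" and addresses such as "[0x10131dd8]" do not match.
TAG = re.compile(r"\[(\w+):(\d+)\]")


def group_by_process(text):
    """Return (pieces per pid, untagged lines) for interleaved signal output."""
    tagged = defaultdict(list)
    untagged = []
    for raw in text.splitlines():
        line = raw.lstrip("> \t")  # drop mail quote markers and indentation
        # With two capture groups, split yields [before, host, pid, after, ...].
        parts = TAG.split(line)
        if parts[0].strip():
            untagged.append(parts[0].strip())
        for i in range(1, len(parts), 3):
            pid, piece = parts[i + 1], parts[i + 2].strip()
            if piece:
                tagged[pid].append(piece)
    return dict(tagged), untagged


sample = """>>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal ***
>>>>>>
>>>>>> [e13n16:591872] Signal: Segmentation fault (11)
>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8]
"""

by_pid, loose = group_by_process(sample)
for pid, pieces in sorted(by_pid.items()):
    print(pid, pieces)
print("untagged:", loose)
```

Keeping the untagged lines apart is deliberate: attaching them to the most
recently seen tag would misattribute frames wherever two ranks wrote at once.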


> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you
> also need --with-clanguage=c++
>

I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper. That
built, and make check works. I gave it to them to test.
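
For reference, the rebuild described above can be written down as a small
script that assembles the configure line. This is only a sketch under
assumptions: mapping the three MPI wrappers onto --with-cc, --with-cxx and
--with-fc is inferred, and --with-nvcc is spelled as it appears in this
thread rather than checked against PETSc's configure (the earlier configure
line passed the same wrapper through --with-cudac). The helper name
configure_command is made up for illustration.

```python
import shlex

# MPI compiler wrappers named in the message; the option mapping is an assumption.
compilers = {"--with-cc": "mpicc", "--with-cxx": "mpiCC", "--with-fc": "mpif90"}

# Option spelled as in the thread; not verified against PETSc's configure.
cuda_wrapper = {"--with-nvcc": "nvcc_wrapper"}


def configure_command(*option_groups, script="./configure"):
    """Join option dictionaries into one shell-quoted configure command line."""
    args = [script]
    for group in option_groups:
        args += [f"{name}={value}" for name, value in group.items()]
    return shlex.join(args)  # shlex.join requires Python 3.8 or newer


print(configure_command(compilers, cuda_wrapper))
```

Printing the command instead of running it keeps the sketch harmless on a
machine without PETSc; the remaining options from the original configure line
would be added as further dictionaries.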

Thanks,
Mark


>
> --Junchao Zhang
>
>
> On Fri, Aug 27, 2021 at 3:28 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>>
>>
>> On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Aug 27, 2021, 1:52 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> I think the problem is that I build with MPICC and they use nvcc_wrapper.
>>>> I could just try building PETSc with CC=nvcc_wrapper, but it was not
>>>> clear if this was the way to go.
>>>>
>>> --with-nvcc=nvcc_wrapper
>>>
>>
>> What do I specify for cc and CC?
>>
>>
>>>> I will try it.
>>>> Thanks,
>>>> Mark
>>>>
>>>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I
>>>>>> built. He is getting this runtime error.
>>>>>>
>>>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to
>>>>>> get a link line, and it ran fine.
>>>>>> I appended the user's link line and my test.
>>>>>>
>>>>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild
>>>>>> PETSc using that? Maybe we just need to make sure we are both using the
>>>>>> same underlying compiler, or should they use mpiCC?
>>>>>>
>>>>> It looks like they used nvcc_wrapper to replace nvcc.  You can ask
>>>>> them to use nvcc directly to see what happens. But the error happened in
>>>>> petsc initialization, petscsys_petscinitializenohelp, so I doubt it
>>>>> helps.  The easy way is to just attach a debugger.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> [e13n16:591873] *** Process received signal ***
>>>>>>
>>>>>> [e13n16:591873] Signal: Segmentation fault (11)
>>>>>>
>>>>>> [e13n16:591873] Signal code: Invalid permissions (2)
>>>>>>
>>>>>> [e13n16:591873] Failing at address: 0x102c87e0
>>>>>>
>>>>>> [e13n16:591873] [ 0]
>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8]
>>>>>>
>>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal ***
>>>>>>
>>>>>> [e13n16:591872] Signal: Segmentation fault (11)
>>>>>>
>>>>>> [e13n16:591872] Signal code: Invalid permissions (2)
>>>>>>
>>>>>> [e13n16:591872] Failing at address: 0x102c87e0
>>>>>>
>>>>>> [e13n16:591872] [ 0]
>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8]
>>>>>>
>>>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal ***
>>>>>>
>>>>>> [e13n16:591871] Signal: Segmentation fault (11)
>>>>>>
>>>>>> [e13n16:591871] Signal code: Invalid permissions (2)
>>>>>>
>>>>>> [e13n16:591871] Failing at address: 0x102c87e0
>>>>>>
>>>>>> [e13n16:591871] [ 0]
>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8]
>>>>>>
>>>>>> [e13n16:591871] [ 1]
>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4]
>>>>>>
>>>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal ***
>>>>>>
>>>>>> [e13n16:591874] Signal: Segmentation fault (11)
>>>>>>
>>>>>> [e13n16:591874] Signal code: Invalid permissions (2)
>>>>>>
>>>>>> [e13n16:591874] Failing at address: 0x102c87e0
>>>>>>
>>>>>> [e13n16:591874] [ 0]
>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8]
>>>>>>
>>>>>> [e13n16:591874] [ 1]
>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4]
>>>>>>
>>>>>> [e13n16:591874] [ 2]
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec]
>>>>>>
>>>>>> [e13n16:591874] [ 3]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8]
>>>>>>
>>>>>> [e13n16:591874] [ 4]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60]
>>>>>>
>>>>>> [e13n16:591874] [ 5]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0]
>>>>>>
>>>>>> [e13n16:591874] [ 6]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14]
>>>>>>
>>>>>> [e13n16:591874] [ 7]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0]
>>>>>>
>>>>>> [e13n16:591874] [ 8]
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec]
>>>>>>
>>>>>> [e13n16:591871] [ 3]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8]
>>>>>>
>>>>>> [e13n16:591871] [ 4]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60]
>>>>>>
>>>>>> [e13n16:591871] [ 5]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0]
>>>>>>
>>>>>> [e13n16:591871] [ 6]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14]
>>>>>>
>>>>>> [e13n16:591871] [ 7]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0]
>>>>>>
>>>>>> [e13n16:591871] [ 8]
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078]
>>>>>>
>>>>>> [e13n16:591871] [ 9]
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264]
>>>>>>
>>>>>> [e13n16:591871] *** End of error message ***
>>>>>>
>>>>>>
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078]
>>>>>>
>>>>>> [e13n16:591874] [ 9]
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264]
>>>>>>
>>>>>> [e13n16:591874] *** End of error message ***
>>>>>>
>>>>>>
>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4]
>>>>>>
>>>>>> [e13n16:591872] [ 2]
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec]
>>>>>>
>>>>>> [e13n16:591872] [ 3]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8]
>>>>>>
>>>>>> [e13n16:591872] [ 4]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60]
>>>>>>
>>>>>> [e13n16:591872] [ 5]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0]
>>>>>>
>>>>>> [e13n16:591872] [ 6]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14]
>>>>>>
>>>>>> [e13n16:591872] [ 7]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0]
>>>>>>
>>>>>> [e13n16:591872] [ 8]
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078]
>>>>>>
>>>>>> [e13n16:591872] [ 9]
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264]
>>>>>>
>>>>>> [e13n16:591872] *** End of error message ***
>>>>>>
>>>>>>
>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4]
>>>>>>
>>>>>> [e13n16:591873] [ 2]
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec]
>>>>>>
>>>>>> [e13n16:591873] [ 3]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8]
>>>>>>
>>>>>> [e13n16:591873] [ 4]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60]
>>>>>>
>>>>>> [e13n16:591873] [ 5]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0]
>>>>>>
>>>>>> [e13n16:591873] [ 6]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14]
>>>>>>
>>>>>> [e13n16:591873] [ 7]
>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0]
>>>>>>
>>>>>> [e13n16:591873] [ 8]
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078]
>>>>>>
>>>>>> [e13n16:591873] [ 9]
>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264]
>>>>>>
>>>>>> [e13n16:591873] *** End of error message ***
>>>>>>
>>>>>> ERROR:  One or more process (first noticed rank 1) terminated with
>>>>>> signal 11 (core dumped)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper
>>>>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o
>>>>>> bin/xgc-es-cpp  -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64
>>>>>> liblibxgc-es-cpp.a
>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so
>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so
>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so
>>>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a
>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a
>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a
>>>>>> /usr/lib64/libcuda.so
>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so
>>>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr
>>>>>> -lmpi_ibm_mpifh -lnvf
>>>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64
>>>>>>
>>>>>>
>>>>>>
>>>>>> 19:39 main=
>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make
>>>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7
>>>>>> PETSC_ARCH="" ex1
>>>>>> *mpicc* -fPIC -g -fast  -fPIC -g -fast
>>>>>>  -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include
>>>>>>     ex1.c
>>>>>>  -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib
>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib
>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib
>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib
>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib
>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib
>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib
>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib
>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64
>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64
>>>>>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib
>>>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib
>>>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8
>>>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis
>>>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08
>>>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp
>>>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1
>>>>>> 19:40 main=
>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc
>>>>>> --version*
>>>>>>
>>>>>> *nvc 21.7-0 linuxpower target on Linuxpower*
>>>>>> NVIDIA Compilers and Tools
>>>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights
>>>>>> reserved.
>>>>>> 19:40 main=
>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1
>>>>>> ./ex1 -ksp_monitor
>>>>>>     0 KSP Residual norm 6.041522986797e+00
>>>>>>     1 KSP Residual norm 1.042493382631e+00
>>>>>>     2 KSP Residual norm 7.950907844730e-16
>>>>>>     0 KSP Residual norm 4.786756692342e+00
>>>>>>     1 KSP Residual norm 1.426392207750e-01
>>>>>>     2 KSP Residual norm 1.801079604472e-15
>>>>>>     0 KSP Residual norm 2.986456323228e+00
>>>>>>     1 KSP Residual norm 7.669888809223e-02
>>>>>>     2 KSP Residual norm 3.744083117256e-16
>>>>>>     0 KSP Residual norm 2.306244667700e-01
>>>>>>     1 KSP Residual norm 1.355550749587e-02
>>>>>>     2 KSP Residual norm 5.845524837731e-17
>>>>>>     0 KSP Residual norm 1.936314002654e-03
>>>>>>     1 KSP Residual norm 2.125593590819e-04
>>>>>>     2 KSP Residual norm 6.987141455073e-20
>>>>>>     0 KSP Residual norm 1.435593531990e-07
>>>>>>     1 KSP Residual norm 2.588271385567e-08
>>>>>>     2 KSP Residual norm 3.942196167935e-23
>>>>>>
>>>>>>