[petsc-users] runtime error on Summit with nvhpc21.7

Mark Adams mfadams at lbl.gov
Sat Aug 28 06:40:20 CDT 2021


cc'ing Robert who is taking over from Aaron for a few days.

Robert, I suggest hoisting PetscInitialize into main or at least a C call
of some sort.

This error in pgf90_str_copy_klen might be avoided by not giving
PetscInitialize a string ('petsc.rc') and linking petsc.rc --> .petscrc
(PETSc will look for .petscrc by default).
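
A rough sketch of the link step, assuming a POSIX shell. The scratch
directory and the placeholder file contents are made up for illustration;
only the names petsc.rc and .petscrc come from this thread.

```shell
# Hypothetical scratch directory standing in for the application's run directory.
rundir="$(mktemp -d)"
cd "$rundir"

# Placeholder options file; the real petsc.rc would hold the application's options.
printf '%s\n' '# PETSc options go here' > petsc.rc

# Make .petscrc a symbolic link that resolves to petsc.rc, so the default
# lookup finds the same options without a file name being passed in.
ln -s petsc.rc .petscrc

# Show where the link points.
ls -l .petscrc
```

With the link in place, the Fortran call would presumably pass
PETSC_NULL_CHARACTER instead of 'petsc.rc', which is the part that might
sidestep the string copy. I have not tested this against their code.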

more below.

On Sat, Aug 28, 2021 at 12:19 AM Barry Smith <bsmith at petsc.dev> wrote:

>
>
> On Aug 27, 2021, at 5:05 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>
> On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> I don't understand the configure options
>>
>>
>> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/
>> *nvcc_wrapper*
>> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper
>> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort
>> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast"
>> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun
>> -g 1" *--with-cuda=0*
>> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper
>> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0
>> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc
>> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b
>>
>> Why do you need to use nvcc_wrapper if you do not want to use cuda?
>>
>
> That code that is having a problem links with nvcc_wrapper.
> They get a segv that I sent earlier, in PetscInitialize so I figure I
> should use the same compiler / linker.
> They use CUDA, but we don't need PETSc to use CUDA now.
>
>
>> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you
>> also need --with-clanguage=c++
>>
>
> I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper,
> and make check works with that build. I gave it to them to test.
>
>
>    This is an odd way to do it. The Kokkos nvcc_wrapper wraps the nvcc
> compiler to allow it to compile Kokkos code and link it against the Kokkos
> libraries; so using nvcc_wrapper as nvcc is strangely recursive; sure
> everything in PETSc/Kokkos may build ok (assuming the nvcc that the
> nvcc_wrapper uses is correct for the situation and uses a correct
> underlying C++) but it is freakish. PETSc should just be built with the
> same nvcc that the nvcc_wrapper is using and using the same inner C++
> compiler.
>

Yes, this is convoluted. Thanks for your take on this.
That said, they have been struggling to get Kokkos to build with nvhpc, and
as far as I can see this is the configuration they have compiling. They are
pressing on toward a milestone that is due soon.

Anyway, I found that PetscInitialize is called from Fortran code (in 25+
years have we ever seen a C++ code call PetscInitialize from a Fortran
subroutine?), which should be fine, just unusual.
This explains the error coming from a Fortran library:

[e13n16:591874] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4]

They are using this Fortran compiler and so I built PETSc with it:

/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort

I see:

07:16 1  ~$ which mpif90
/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpif90

so this is the nvhpc-21.7 Fortran.

Thanks,
Mark