[petsc-users] configure error

Mark Adams mfadams at lbl.gov
Mon May 17 07:24:00 CDT 2021


I thought I had done a clean build, but I have now done one and it seems to be
working.

Also, I am trying to fix an error that I get on Cori with 'make check'.
I set mpiexec='srun -G 2 -c 20' and got an interactive shell with the same
parameters, but I get error messages from the Kokkos tests:

Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
See http://www.mcs.anl.gov/petsc/documentation/faq.html

srun: error: Unable to create step for job 1923618: More processors requested than permitted
C/C++ example src/snes/tutorials/ex19 run successfully with cuda
gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)
1,25c1
< atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000
< Vec Object: Exact Solution 2 MPI processes
<   type: mpikokkos
< Process [0]
< 0.
< 0.015625
< 0.125
< Process [1]
< 0.421875
< 1.
< Vec Object: Forcing function 2 MPI processes
<   type: mpikokkos
< Process [0]
< 1e-72
< 1.50024
< 3.01563
< Process [1]
< 4.67798
< 7.
<   0 SNES Function norm 5.414682427127e+00
<   1 SNES Function norm 2.952582418265e-01
<   2 SNES Function norm 4.502293658739e-04
<   3 SNES Function norm 1.389665806646e-09
< Number of SNES iterations = 3
< Norm of error 1.49752e-10 Iterations 3
---

> srun: error: Unable to create step for job 1923618: More processors requested than permitted
/global/homes/m/madams/petsc/src/snes/tutorials
Possible problem with ex3k running with kokkos-kernels, diffs above
=========================================
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
Completed test examples
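
The output above mixes two kinds of failure: the launcher failing to create a
job step, and a diff against the expected output. For ex3k the diff seems to be
a consequence of the launcher error, since the only output produced is the srun
message. Below is a minimal Python sketch for pulling the launcher messages out
of a saved 'make check' log. The function name and the sample text are
hypothetical; the match is only on the literal 'srun: error:' string seen
above, and it assumes each message sits on one line.

```python
import re

def launcher_errors(log_text):
    """Return the distinct 'srun: error: ...' messages found in a log."""
    # Stop at end of line or at a '*' (the pasted output wraps the
    # message in '*' characters).
    pattern = re.compile(r"srun: error: [^\n*]*")
    seen = []
    for match in pattern.finditer(log_text):
        msg = match.group(0).strip()
        if msg not in seen:
            seen.append(msg)
    return seen

# Hypothetical sample, with the wrapped lines from the email joined.
sample = (
    "Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes\n"
    "srun: error: Unable to create step for job 1923618: "
    "More processors requested than permitted\n"
    "> srun: error: Unable to create step for job 1923618: "
    "More processors requested than permitted\n"
)
for msg in launcher_errors(sample):
    print(msg)
```

If this returns anything, the reported diffs are probably not numerical
differences, and the srun options are the first thing to look at.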

On Sun, May 16, 2021 at 11:14 PM Barry Smith <bsmith at petsc.dev> wrote:

>
> Could still be a gencode arch issue. Is it possible that Kokkos was built
> with the 80 arch and when you reran configure with 70 it did not rebuild
> Kokkos because it didn't know it needed to?
>
> Sorry, but this may require another rm -rf arch* and running ./configure
> again.
>
>
> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e
>
>
> cudaErrorInvalidDeviceFunction = 98
> The requested device function does not exist or is not compiled for the
> proper device architecture.
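
To illustrate why a library built for arch 80 could fail on an arch 70 device
with "invalid device function", here is a minimal Python sketch of the CUDA
compatibility rule as I understand it: machine code (SASS) built for compute
capability X.y runs only on devices with the same major version X and a minor
version of at least y, while embedded PTX can be JIT-compiled for devices of
equal or newer capability. The function name and the sass/ptx lists are
hypothetical, for illustration only; they are not PETSc, Kokkos, or CUDA APIs,
and the device capability used below is an assumption.

```python
def can_run(device_cc, sass=(), ptx=()):
    """Return True if a binary with these targets can run on device_cc.

    Capabilities are (major, minor) tuples, e.g. (7, 0) for arch 70.
    """
    # SASS: same major version, device minor version equal or newer.
    for major, minor in sass:
        if major == device_cc[0] and minor <= device_cc[1]:
            return True
    # PTX: can be JIT-compiled for any equal or newer device.
    for cc in ptx:
        if tuple(cc) <= tuple(device_cc):
            return True
    return False

device = (7, 0)  # assumption: the GPU matches the "70" arch mentioned above

# Library built only for arch 80: nothing usable on a 7.0 device.
print(can_run(device, sass=[(8, 0)], ptx=[(8, 0)]))  # False

# Library rebuilt for arch 70.
print(can_run(device, sass=[(7, 0)]))  # True
```

Under this rule, a Kokkos library left over from an arch 80 build has no code
that a 7.0 device can use, which would be consistent with the error above.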
>
>
>
> On May 16, 2021, at 9:09 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> I now get this error, a BLAS error from VecAXPBYPCZ ...
> Any ideas?
>
>
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())
> error( cudaErrorInvalidDeviceFunction): invalid device function
> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654
> Traceback functionality not available
>
> [cgpu16:55192] *** Process received signal ***
> [cgpu16:55192] Signal: Aborted (6)
> [cgpu16:55192] Signal code:  (-6)
> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360]
> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160]
> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741]
> [cgpu16:55192] [ 3]
> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83]
> [cgpu16:55192] [ 4]
> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6]
> [cgpu16:55192] [ 5]
> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21]
> [cgpu16:55192] [ 6]
> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053]
> [cgpu16:55192] [ 7]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f]
> [cgpu16:55192] [ 8]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d]
> [cgpu16:55192] [ 9]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7]
> [cgpu16:55192] [10]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
> [cgpu16:55192] [11]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
> [cgpu16:55192] [12]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
> [cgpu16:55192] [13]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
> [cgpu16:55192] [14]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
> [cgpu16:55192] [15]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
> [cgpu16:55192] [16]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
> [cgpu16:55192] [17]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
> [cgpu16:55192] [18]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
> [cgpu16:55192] [19]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
> [cgpu16:55192] [20]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
> [cgpu16:55192] [21]
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
> [cgpu16:55192] [23]
> /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
> [cgpu16:55192] *** End of error message ***
> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted
>     "$@"
> 0 stopping nvidia-cuda-mps-control on cgpu16
>
>
>