[petsc-users] configure error

Mark Adams mfadams at lbl.gov
Tue May 18 06:27:28 CDT 2021


Damn, I am now getting this problem on Summit, even after a clean configure.
I removed the Kokkos arch=70 line and added
    '--with-cudac-gencodearch=70',

Any ideas?

< Number of SNES iterations = 2
---
> Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
> [h50n11:35759] *** Process received signal ***
> [h50n11:35759] Signal: Aborted (6)
> [h50n11:35759] Signal code:  (-6)
> [h50n11:35759] [ 0] [0x2000000504d8]
> [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094]
> [h50n11:35759] [ 2]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558]
> [h50n11:35759] [ 3]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210]
> [h50n11:35759] [ 4]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0]
> [h50n11:35759] [ 5]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314]
> [h50n11:35759] [ 6]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0]
> [h50n11:35759] [ 7]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac]
> [h50n11:35759] [ 8]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c]
> [h50n11:35759] [ 9]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c]
> [h50n11:35759] [10]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424]
> [h50n11:35759] [11]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc]
> [h50n11:35759] [12]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4]
> [h50n11:35759] [13]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790]
> [h50n11:35759] [14]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24]
> [h50n11:35759] [15]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504]
> [h50n11:35759] [16]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c]
> [h50n11:35759] [17]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c]
> [h50n11:35759] [18]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560]
> [h50n11:35759] [19]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0]
> [h50n11:35759] [20]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10]
> [h50n11:35759] [21]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584]
> [h50n11:35759] [22]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4]
> [h50n11:35759] [23]
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44]
> [h50n11:35759] [24] ./ex19[0x10001a70]
> [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200]
> [h50n11:35759] [26]
/lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4]
> [h50n11:35759] *** End of error message ***
> ERROR:  One or more process (first noticed rank 0) terminated with signal 6 (core dumped)
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
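
One way to sanity-check the configure options for this kind of mismatch is
sketched below in Python, the language PETSc's configure scripts are written
in. It is only a sketch under stated assumptions: the options are available as
a list of strings, any option whose name contains "arch" and whose value ends
in a number is treated as a GPU architecture setting, and every value that
differs from the device's compute capability is flagged (70 is assumed here
for the V100 GPUs on Summit). The helper names and the
'--hypothetical-kokkos-arch' option are made up for illustration; only
'--with-cudac-gencodearch=70' comes from the message above.

```python
import re

# Matches options such as '--with-cudac-gencodearch=70' and captures
# the option name and the trailing digits of its value.
ARCH_OPTION = re.compile(r"^(--[\w-]*arch[\w-]*)=\D*(\d+)$")


def collect_gpu_archs(configure_options):
    """Return {option_name: arch_number} for options that name a numeric arch."""
    archs = {}
    for opt in configure_options:
        match = ARCH_OPTION.match(opt.strip())
        if match:
            archs[match.group(1)] = int(match.group(2))
    return archs


def find_arch_mismatches(configure_options, device_arch):
    """Return the arch options whose value differs from device_arch."""
    return {
        name: arch
        for name, arch in collect_gpu_archs(configure_options).items()
        if arch != device_arch
    }


if __name__ == "__main__":
    options = [
        "--with-cuda=1",
        "--with-cudac-gencodearch=70",         # from the message above
        "--hypothetical-kokkos-arch=VOLTA80",  # made-up option, for illustration
    ]
    # Assumption: the target device is a V100, compute capability 7.0.
    print(find_arch_mismatches(options, device_arch=70))
```

This only inspects the options. It cannot tell whether a Kokkos library that
was already built in the arch directory was compiled for a different
architecture.
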

On Mon, May 17, 2021 at 8:24 AM Mark Adams <mfadams at lbl.gov> wrote:

> I thought I had done a clean make, but I have now done one for certain and it
> seems to be working.
>
> Also, I am trying to fix this error message that I get on Cori with 'make
> check'.
> I set mpiexec='srun -G 2 -c 20' and got an interactive shell with the same
> parameters, but I still get error messages on Kokkos:
>
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>
> *srun: error: Unable to create step for job 1923618: More processors
> requested than permitted*
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)
> 1,25c1
> < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000
> < Vec Object: Exact Solution 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 0.
> < 0.015625
> < 0.125
> < Process [1]
> < 0.421875
> < 1.
> < Vec Object: Forcing function 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 1e-72
> < 1.50024
> < 3.01563
> < Process [1]
> < 4.67798
> < 7.
> <   0 SNES Function norm 5.414682427127e+00
> <   1 SNES Function norm 2.952582418265e-01
> <   2 SNES Function norm 4.502293658739e-04
> <   3 SNES Function norm 1.389665806646e-09
> < Number of SNES iterations = 3
> < Norm of error 1.49752e-10 Iterations 3
> ---
>
> *> srun: error: Unable to create step for job 1923618: More processors
> requested than permitted*
> /global/homes/m/madams/petsc/src/snes/tutorials
> Possible problem with ex3k running with kokkos-kernels, diffs above
> =========================================
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
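
The srun message above is about CPU counts, and the arithmetic can be sketched
as follows. This is a guess at the cause, not a confirmed diagnosis: it assumes
'make check' launches 2 MPI processes through the mpiexec setting (the output
above mentions 2 MPI processes), that '-c 20' asks for 20 CPUs per task, and
that the interactive allocation holds only 20 CPUs. The allocation size and
the function names are hypothetical.

```python
def cpus_requested(ntasks, cpus_per_task):
    """CPUs a job step asks for: one block of cpus_per_task for each task."""
    return ntasks * cpus_per_task


def step_fits(ntasks, cpus_per_task, cpus_allocated):
    """True if the step's CPU request fits inside the job's allocation."""
    return cpus_requested(ntasks, cpus_per_task) <= cpus_allocated


def max_cpus_per_task(ntasks, cpus_allocated):
    """Largest cpus-per-task value that still fits ntasks into the allocation."""
    return cpus_allocated // ntasks


if __name__ == "__main__":
    # Assumed numbers: 2 MPI ranks, '-c 20', and a 20-CPU allocation.
    print(cpus_requested(2, 20))     # CPUs the step would need
    print(step_fits(2, 20, 20))      # whether that fits in 20 CPUs
    print(max_cpus_per_task(2, 20))  # a per-task value that would fit
```

Under these assumptions the step would need 40 CPUs, so either a larger
allocation or a smaller '-c' value would be required for the 2-process tests.
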
>
> On Sun, May 16, 2021 at 11:14 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>> Could still be a gencode arch issue. Is it possible that Kokkos was built
>> with the 80 arch and when you reran configure with 70 it did not rebuild
>> Kokkos because it didn't know it needed to?
>>
>> Sorry, but this may require another rm -rf arch* and running ./configure
>> again.
>>
>>
>> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e
>>
>>
>> cudaErrorInvalidDeviceFunction = 98
>> The requested device function does not exist or is not compiled for the
>> proper device architecture.
>>
>>
>>
>> On May 16, 2021, at 9:09 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>> I now get this error. A blas error from VecAXPBYPCZ ...
>> Any ideas?
>>
>>
>> terminate called after throwing an instance of 'std::runtime_error'
>>   what():  cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())
>> error( cudaErrorInvalidDeviceFunction): invalid device function
>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654
>> Traceback functionality not available
>>
>> [cgpu16:55192] *** Process received signal ***
>> [cgpu16:55192] Signal: Aborted (6)
>> [cgpu16:55192] Signal code:  (-6)
>> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360]
>> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160]
>> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741]
>> [cgpu16:55192] [ 3]
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83]
>> [cgpu16:55192] [ 4]
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6]
>> [cgpu16:55192] [ 5]
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21]
>> [cgpu16:55192] [ 6]
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053]
>> [cgpu16:55192] [ 7]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f]
>> [cgpu16:55192] [ 8]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d]
>> [cgpu16:55192] [ 9]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7]
>> [cgpu16:55192] [10]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
>> [cgpu16:55192] [11]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
>> [cgpu16:55192] [12]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
>> [cgpu16:55192] [13]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
>> [cgpu16:55192] [14]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
>> [cgpu16:55192] [15]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
>> [cgpu16:55192] [16]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
>> [cgpu16:55192] [17]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
>> [cgpu16:55192] [18]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
>> [cgpu16:55192] [19]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
>> [cgpu16:55192] [20]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
>> [cgpu16:55192] [21]
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
>> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
>> [cgpu16:55192] [23]
>> /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
>> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
>> [cgpu16:55192] *** End of error message ***
>> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted
>>     "$@"
>> 0 stopping nvidia-cuda-mps-control on cgpu16
>>
>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 3271339 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210518/54fcf896/attachment-0001.obj>

