[petsc-users] configure error

Barry Smith bsmith at petsc.dev
Tue May 18 11:49:23 CDT 2021


  configure prints the CUDA information at the end of its run; you can check that output to see which gencodearch was actually used.
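
  As a rough sketch of that check, the Python snippet below scans a string of compiler flags for the architecture tokens nvcc accepts (`-arch=sm_NN` or `-gencode arch=compute_NN,code=sm_NN`). The function name and the sample flag string are assumptions made for illustration; they are not actual PETSc output or API.

```python
import re
from typing import List

# Matches the architecture number in nvcc-style tokens such as
# "-arch=sm_70", "arch=compute_70", or "code=sm_70".
_ARCH_RE = re.compile(r"\b(?:sm|compute)_(\d+)\b")


def cuda_archs_in_flags(flags: str) -> List[int]:
    """Return the sorted, de-duplicated architecture numbers found in flags."""
    return sorted({int(m) for m in _ARCH_RE.findall(flags)})


# Hypothetical flag string, for illustration only (not real configure output).
sample = "-ccbin g++ -gencode arch=compute_70,code=sm_70 -Wno-deprecated-gpu-targets"
print(cuda_archs_in_flags(sample))
```

  If the list contains an architecture other than the one you asked for, some package was probably built with stale settings.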

  I have a new MR in which PETSc records the gencodearch it was built with; when your program initializes CUDA, PETSc verifies that the hardware supports that gencodearch. Hopefully this will alleviate such difficulties in the future. Of course, this won't help with libraries that use CUDA but were built outside of PETSc.
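
  The idea behind such a check can be sketched as follows. This is not the code in the MR, only a minimal illustration in Python. It assumes a binary-only build (no embedded PTX), for which CUDA's compatibility rule is that code built for compute capability X.y runs only on devices of capability X.z with z >= y. The function name is hypothetical.

```python
def device_supports_build(built_arch: int, device_major: int, device_minor: int) -> bool:
    """Return True if binary code built for built_arch (e.g. 70 for sm_70)
    can run on a device of compute capability device_major.device_minor.

    Assumption: binary-only build, so the major version must match and the
    device minor version must be at least the built minor version.
    """
    built_major, built_minor = divmod(built_arch, 10)
    return device_major == built_major and device_minor >= built_minor


# A library built for 80 on a 7.0 device: the kind of mismatch suspected
# earlier in this thread.
print(device_supports_build(80, 7, 0))
# The same device with a library built for 70.
print(device_supports_build(70, 7, 0))
```

  A build that also embeds PTX could be JIT-compiled for newer devices, so a real check would need to be more permissive than this sketch.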

   Barry


> On May 18, 2021, at 10:30 AM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
> 
>     '--with-cuda-gencodearch=70',
> 
> --Junchao Zhang
> 
> 
> On Tue, May 18, 2021 at 6:29 AM Mark Adams <mfadams at lbl.gov> wrote:
> Damn, I am getting this problem on Summit, and I did a clean configure.
> I removed the Kokkos arch=70 line and added 
>     '--with-cudac-gencodearch=70',
> 
> Any ideas?
> 
> < Number of SNES iterations = 2
> ---
> > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
> > [h50n11:35759] *** Process received signal ***
> > [h50n11:35759] Signal: Aborted (6)
> > [h50n11:35759] Signal code:  (-6)
> > [h50n11:35759] [ 0] [0x2000000504d8]
> > [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094]
> > [h50n11:35759] [ 2] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558]
> > [h50n11:35759] [ 3] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210]
> > [h50n11:35759] [ 4] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0]
> > [h50n11:35759] [ 5] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314]
> > [h50n11:35759] [ 6] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0]
> > [h50n11:35759] [ 7] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac]
> > [h50n11:35759] [ 8] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c]
> > [h50n11:35759] [ 9] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c]
> > [h50n11:35759] [10] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424]
> > [h50n11:35759] [11] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc]
> > [h50n11:35759] [12] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4]
> > [h50n11:35759] [13] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790]
> > [h50n11:35759] [14] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24]
> > [h50n11:35759] [15] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504]
> > [h50n11:35759] [16] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c]
> > [h50n11:35759] [17] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c]
> > [h50n11:35759] [18] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560]
> > [h50n11:35759] [19] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0]
> > [h50n11:35759] [20] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10]
> > [h50n11:35759] [21] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584]
> > [h50n11:35759] [22] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4]
> > [h50n11:35759] [23] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44]
> > [h50n11:35759] [24] ./ex19[0x10001a70]
> > [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200]
> > [h50n11:35759] [26] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4]
> > [h50n11:35759] *** End of error message ***
> > ERROR:  One or more process (first noticed rank 0) terminated with signal 6 (core dumped)
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
> 
> On Mon, May 17, 2021 at 8:24 AM Mark Adams <mfadams at lbl.gov> wrote:
> I thought I had done a clean make, but I have now done a clean one and it seems to be working.
> 
> Also, I am trying to fix this error message that I get on Cori with 'make check'.
> I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these parameters, but I get error messages on Kokkos:
> 
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> srun: error: Unable to create step for job 1923618: More processors requested than permitted
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)
> 1,25c1
> < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000
> < Vec Object: Exact Solution 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 0.
> < 0.015625
> < 0.125
> < Process [1]
> < 0.421875
> < 1.
> < Vec Object: Forcing function 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 1e-72
> < 1.50024
> < 3.01563
> < Process [1]
> < 4.67798
> < 7.
> <   0 SNES Function norm 5.414682427127e+00 
> <   1 SNES Function norm 2.952582418265e-01 
> <   2 SNES Function norm 4.502293658739e-04 
> <   3 SNES Function norm 1.389665806646e-09 
> < Number of SNES iterations = 3
> < Norm of error 1.49752e-10 Iterations 3
> ---
> > srun: error: Unable to create step for job 1923618: More processors requested than permitted
> /global/homes/m/madams/petsc/src/snes/tutorials
> Possible problem with ex3k running with kokkos-kernels, diffs above
> =========================================
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
> 
> On Sun, May 16, 2021 at 11:14 PM Barry Smith <bsmith at petsc.dev> wrote:
> 
> Could still be a gencode arch issue. Is it possible that Kokkos was built with the 80 arch and when you reran configure with 70 it did not rebuild Kokkos because it didn't know it needed to?
> 
> Sorry, but this may require another rm -rf arch* and running ./configure again.
> 
> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e
> 
> 
> cudaErrorInvalidDeviceFunction = 98
> The requested device function does not exist or is not compiled for the proper device architecture.
> 
> 
> 
>> On May 16, 2021, at 9:09 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> 
>> I now get this error. A blas error from VecAXPBYPCZ ...
>> Any ideas?
>> 
>> 
>> terminate called after throwing an instance of 'std::runtime_error'
>>   what():  cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654
>> Traceback functionality not available
>> 
>> [cgpu16:55192] *** Process received signal ***
>> [cgpu16:55192] Signal: Aborted (6)
>> [cgpu16:55192] Signal code:  (-6)
>> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360]
>> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160]
>> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741]
>> [cgpu16:55192] [ 3] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83]
>> [cgpu16:55192] [ 4] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6]
>> [cgpu16:55192] [ 5] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21]
>> [cgpu16:55192] [ 6] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053]
>> [cgpu16:55192] [ 7] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f]
>> [cgpu16:55192] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d]
>> [cgpu16:55192] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7]
>> [cgpu16:55192] [10] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
>> [cgpu16:55192] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
>> [cgpu16:55192] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
>> [cgpu16:55192] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
>> [cgpu16:55192] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
>> [cgpu16:55192] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
>> [cgpu16:55192] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
>> [cgpu16:55192] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
>> [cgpu16:55192] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
>> [cgpu16:55192] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
>> [cgpu16:55192] [20] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
>> [cgpu16:55192] [21] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
>> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
>> [cgpu16:55192] [23] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
>> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
>> [cgpu16:55192] *** End of error message ***
>> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted                 "$@"
>> 0 stopping nvidia-cuda-mps-control on cgpu16
> 
