<div dir="ltr">Damn, I am getting this problem on Summit and did a clean configure. <div>I removed the Kokkos arch=70 line and added </div><div> '--with-cudac-gencodearch=70',<br><div><br></div><div>Any ideas?</div><div><br></div><div>< Number of SNES iterations = 2<br>---<br>> Kokkos::Cuda::initialize ERROR: likely mismatch of architecture<br>> [h50n11:35759] *** Process received signal ***<br>> [h50n11:35759] Signal: Aborted (6)<br>> [h50n11:35759] Signal code: (-6)<br>> [h50n11:35759] [ 0] [0x2000000504d8]<br>> [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094]<br>> [h50n11:35759] [ 2] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558]<br>> [h50n11:35759] [ 3] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210]<br>> [h50n11:35759] [ 4] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0]<br>> [h50n11:35759] [ 5] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314]<br>> [h50n11:35759] [ 6] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0]<br>> [h50n11:35759] [ 7] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac]<br>> [h50n11:35759] [ 8] 
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c]<br>> [h50n11:35759] [ 9] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c]<br>> [h50n11:35759] [10] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424]<br>> [h50n11:35759] [11] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc]<br>> [h50n11:35759] [12] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4]<br>> [h50n11:35759] [13] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790]<br>> [h50n11:35759] [14] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24]<br>> [h50n11:35759] [15] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504]<br>> [h50n11:35759] [16] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c]<br>> [h50n11:35759] [17] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c]<br>> [h50n11:35759] [18] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560]<br>> [h50n11:35759] [19] 
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0]<br>> [h50n11:35759] [20] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10]<br>> [h50n11:35759] [21] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584]<br>> [h50n11:35759] [22] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4]<br>> [h50n11:35759] [23] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44]<br>> [h50n11:35759] [24] ./ex19[0x10001a70]<br>> [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200]<br>> [h50n11:35759] [26] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4]<br>> [h50n11:35759] *** End of error message ***<br>> ERROR: One or more process (first noticed rank 0) terminated with signal 6 (core dumped)<br>/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, May 17, 2021 at 8:24 AM Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I thought I had done a clean build, but I did a truly clean one now and it seems to be working.<div><br><div>Also, I am trying to fix this error that I get on Cori with 'make check'.</div><div>I set mpiexec='srun -G 2 -c 20' and got an interactive shell with these parameters, but the Kokkos test fails:</div><div><br></div><div>Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes<br>See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" 
target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a><br><b>srun: error: Unable to create step for job 1923618: More processors requested than permitted<br></b>C/C++ example src/snes/tutorials/ex19 run successfully with cuda<br>gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)<br>1,25c1<br>< atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000<br>< Vec Object: Exact Solution 2 MPI processes<br>< type: mpikokkos<br>< Process [0]<br>< 0.<br>< 0.015625<br>< 0.125<br>< Process [1]<br>< 0.421875<br>< 1.<br>< Vec Object: Forcing function 2 MPI processes<br>< type: mpikokkos<br>< Process [0]<br>< 1e-72<br>< 1.50024<br>< 3.01563<br>< Process [1]<br>< 4.67798<br>< 7.<br>< 0 SNES Function norm 5.414682427127e+00 <br>< 1 SNES Function norm 2.952582418265e-01 <br>< 2 SNES Function norm 4.502293658739e-04 <br>< 3 SNES Function norm 1.389665806646e-09 <br>< Number of SNES iterations = 3<br>< Norm of error 1.49752e-10 Iterations 3<br>---<br><b>> srun: error: Unable to create step for job 1923618: More processors requested than permitted<br></b>/global/homes/m/madams/petsc/src/snes/tutorials<br>Possible problem with ex3k running with kokkos-kernels, diffs above<br>=========================================<br>Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process<br>Completed test examples<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, May 16, 2021 at 11:14 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br></div><div>Could still be a gencode arch issue. 
Is it possible that Kokkos was built with the 80 arch and when you reran configure with 70 it did not rebuild Kokkos because it didn't know it needed to?</div><div><br></div><div>Sorry, but this may require another rm -rf arch* and running ./configure again.</div><div><br></div><div><a href="https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e" target="_blank">https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e</a></div><div><br></div><div><br></div><dl><dt style="font-family:"Trebuchet MS","DIN Pro",sans-serif;font-size:14px"><span style="display:block;background-color:rgb(239,239,240);border-top:1px solid rgb(216,220,222);border-bottom:1px solid rgb(238,243,245);padding:3px"><span style="background-color:yellow">cudaErrorInvalidDeviceFunction</span> = <span>98</span></span></dt><dd style="margin-left:50px;margin-bottom:5px;margin-top:2px;font-family:"Trebuchet MS","DIN Pro",sans-serif;font-size:14px">The requested device function does not exist or is not compiled for the proper device architecture.</dd></dl><div><br></div><div><br></div><div><br><blockquote type="cite"><div>On May 16, 2021, at 9:09 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:</div><br><div><div dir="ltr"><div dir="ltr">I now get this error. 
A blas error from VecAXPBYPCZ ...</div><div dir="ltr">Any ideas?<br><div><br></div><div><br></div><div>terminate called after throwing an instance of 'std::runtime_error'<br> what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654<br>Traceback functionality not available<br><br>[cgpu16:55192] *** Process received signal ***<br>[cgpu16:55192] Signal: Aborted (6)<br>[cgpu16:55192] Signal code: (-6)<br>[cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360]<br>[cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160]<br>[cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741]<br>[cgpu16:55192] [ 3] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83]<br>[cgpu16:55192] [ 4] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6]<br>[cgpu16:55192] [ 5] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21]<br>[cgpu16:55192] [ 6] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053]<br>[cgpu16:55192] [ 7] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f]<br>[cgpu16:55192] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d]<br>[cgpu16:55192] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7]<br>[cgpu16:55192] [10] 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]<br>[cgpu16:55192] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]<br>[cgpu16:55192] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]<br>[cgpu16:55192] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]<br>[cgpu16:55192] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]<br>[cgpu16:55192] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]<br>[cgpu16:55192] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]<br>[cgpu16:55192] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]<br>[cgpu16:55192] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]<br>[cgpu16:55192] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]<br>[cgpu16:55192] [20] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]<br>[cgpu16:55192] [21] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]<br>[cgpu16:55192] [22] 
../ex2-kok[0x4033eb]<br>[cgpu16:55192] [23] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]<br>[cgpu16:55192] [24] ../ex2-kok[0x404aaa]<br>[cgpu16:55192] *** End of error message ***<br>/global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted "$@"<br>0 stopping nvidia-cuda-mps-control on cgpu16</div></div></div>
</div></blockquote></div><br></div></blockquote></div>
</blockquote></div>