[petsc-users] configure error
Junchao Zhang
junchao.zhang at gmail.com
Tue May 18 10:30:05 CDT 2021
'--with-cuda-gencodearch=70',
--Junchao Zhang
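
Editorial note: the corrected option above differs from the one in the quoted message below by a single character ('--with-cudac-gencodearch' versus '--with-cuda-gencodearch'). A misspelling like this is easy to miss by eye. The following is a minimal sketch of how it could be caught with a fuzzy match against a list of expected option names. Only '--with-cuda-gencodearch' comes from this thread; the rest of KNOWN_OPTIONS and the helper names suggest_option and check_options are illustrative assumptions, not part of PETSc's configure.

```python
import difflib

# Only '--with-cuda-gencodearch' is taken from this thread. The other entries
# are illustrative placeholders, not a complete list of PETSc configure options.
KNOWN_OPTIONS = [
    "--with-cuda-gencodearch",
    "--with-cuda",
    "--with-debugging",
]


def suggest_option(option, known=KNOWN_OPTIONS):
    """Return the closest known option name for a misspelled one, or None."""
    name = option.split("=", 1)[0]  # drop any '=value' suffix
    if name in known:
        return None  # already spelled as expected
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    return matches[0] if matches else None


def check_options(options, known=KNOWN_OPTIONS):
    """Map each suspicious option string to its suggested spelling."""
    suggestions = {}
    for opt in options:
        hint = suggest_option(opt, known)
        if hint is not None:
            suggestions[opt] = hint
    return suggestions


if __name__ == "__main__":
    # Hypothetical option list modelled on the quoted message.
    opts = ["--with-cudac-gencodearch=70", "--with-debugging=0"]
    for bad, good in check_options(opts).items():
        print(f"{bad}: did you mean {good}?")
```

Options that are not close to any known name are left alone by this sketch, so it only flags near misses and does not validate a full configure line.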
On Tue, May 18, 2021 at 6:29 AM Mark Adams <mfadams at lbl.gov> wrote:
> Damn, I am getting this problem on Summit and did a clean configure.
> I removed the Kokkos arch=70 line and added
> '--with-cudac-gencodearch=70',
>
> Any ideas?
>
> < Number of SNES iterations = 2
> ---
> > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
> > [h50n11:35759] *** Process received signal ***
> > [h50n11:35759] Signal: Aborted (6)
> > [h50n11:35759] Signal code: (-6)
> > [h50n11:35759] [ 0] [0x2000000504d8]
> > [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094]
> > [h50n11:35759] [ 2]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558]
> > [h50n11:35759] [ 3]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210]
> > [h50n11:35759] [ 4]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0]
> > [h50n11:35759] [ 5]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314]
> > [h50n11:35759] [ 6]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0]
> > [h50n11:35759] [ 7]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac]
> > [h50n11:35759] [ 8]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c]
> > [h50n11:35759] [ 9]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c]
> > [h50n11:35759] [10]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424]
> > [h50n11:35759] [11]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc]
> > [h50n11:35759] [12]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4]
> > [h50n11:35759] [13]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790]
> > [h50n11:35759] [14]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24]
> > [h50n11:35759] [15]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504]
> > [h50n11:35759] [16]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c]
> > [h50n11:35759] [17]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c]
> > [h50n11:35759] [18]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560]
> > [h50n11:35759] [19]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0]
> > [h50n11:35759] [20]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10]
> > [h50n11:35759] [21]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584]
> > [h50n11:35759] [22]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4]
> > [h50n11:35759] [23]
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44]
> > [h50n11:35759] [24] ./ex19[0x10001a70]
> > [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200]
> > [h50n11:35759] [26]
> /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4]
> > [h50n11:35759] *** End of error message ***
> > ERROR: One or more process (first noticed rank 0) terminated with
> signal 6 (core dumped)
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
>
> On Mon, May 17, 2021 at 8:24 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> I thought I had done a clean build, but I have done one now and it
>> seems to be working.
>>
>> Also, I am trying to fix this error message that I get on Cori with 'make
>> check'.
>> I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these
>> parameters, but I get error messages on Kokkos:
>>
>> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
>> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>
>> srun: error: Unable to create step for job 1923618: More processors
>> requested than permitted
>> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
>> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)
>> 1,25c1
>> < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000
>> < Vec Object: Exact Solution 2 MPI processes
>> < type: mpikokkos
>> < Process [0]
>> < 0.
>> < 0.015625
>> < 0.125
>> < Process [1]
>> < 0.421875
>> < 1.
>> < Vec Object: Forcing function 2 MPI processes
>> < type: mpikokkos
>> < Process [0]
>> < 1e-72
>> < 1.50024
>> < 3.01563
>> < Process [1]
>> < 4.67798
>> < 7.
>> < 0 SNES Function norm 5.414682427127e+00
>> < 1 SNES Function norm 2.952582418265e-01
>> < 2 SNES Function norm 4.502293658739e-04
>> < 3 SNES Function norm 1.389665806646e-09
>> < Number of SNES iterations = 3
>> < Norm of error 1.49752e-10 Iterations 3
>> ---
>>
>> > srun: error: Unable to create step for job 1923618: More processors
>> requested than permitted
>> /global/homes/m/madams/petsc/src/snes/tutorials
>> Possible problem with ex3k running with kokkos-kernels, diffs above
>> =========================================
>> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI
>> process
>> Completed test examples
>>
>> On Sun, May 16, 2021 at 11:14 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>> Could still be a gencode arch issue. Is it possible that Kokkos was
>>> built with the 80 arch and when you reran configure with 70 it did not
>>> rebuild Kokkos because it didn't know it needed to?
>>>
>>> Sorry, but this may require another rm -rf arch* and running ./configure
>>> again.
>>>
>>>
>>> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e
>>>
>>>
>>> cudaErrorInvalidDeviceFunction = 98
>>> The requested device function does not exist or is not compiled for
>>> the proper device architecture.
>>>
>>>
>>>
>>> On May 16, 2021, at 9:09 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> I now get this error. A blas error from VecAXPBYPCZ ...
>>> Any ideas?
>>>
>>>
>>> terminate called after throwing an instance of 'std::runtime_error'
>>> what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())
>>> error( cudaErrorInvalidDeviceFunction): invalid device function
>>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654
>>> Traceback functionality not available
>>>
>>> [cgpu16:55192] *** Process received signal ***
>>> [cgpu16:55192] Signal: Aborted (6)
>>> [cgpu16:55192] Signal code: (-6)
>>> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360]
>>> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160]
>>> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741]
>>> [cgpu16:55192] [ 3]
>>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83]
>>> [cgpu16:55192] [ 4]
>>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6]
>>> [cgpu16:55192] [ 5]
>>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21]
>>> [cgpu16:55192] [ 6]
>>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053]
>>> [cgpu16:55192] [ 7]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f]
>>> [cgpu16:55192] [ 8]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d]
>>> [cgpu16:55192] [ 9]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7]
>>> [cgpu16:55192] [10]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
>>> [cgpu16:55192] [11]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
>>> [cgpu16:55192] [12]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
>>> [cgpu16:55192] [13]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
>>> [cgpu16:55192] [14]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
>>> [cgpu16:55192] [15]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
>>> [cgpu16:55192] [16]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
>>> [cgpu16:55192] [17]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
>>> [cgpu16:55192] [18]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
>>> [cgpu16:55192] [19]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
>>> [cgpu16:55192] [20]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
>>> [cgpu16:55192] [21]
>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
>>> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
>>> [cgpu16:55192] [23]
>>> /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
>>> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
>>> [cgpu16:55192] *** End of error message ***
>>> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted
>>> "$@"
>>> 0 stopping nvidia-cuda-mps-control on cgpu16
>>>
>>>
>>>