[petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

Fande Kong fdkong.jd at gmail.com
Wed Jan 19 13:18:09 CST 2022


On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
wrote:

> Are you running on login nodes or compute nodes (I can’t seem to tell from
> the configure.log)?
>

I was compiling the code on login nodes and running it on compute nodes.
The login nodes do not have GPUs, but the compute nodes do.

Just to be clear, the same code on the same machine worked perfectly with
PETSc-3.16.1. I only see this problem with PETSc-main.

I might do a "git bisect" when I have time.

Thanks,

Fande


> If running from login nodes, do they support running with GPUs? Some
> clusters will install stub versions of cuda runtime on login nodes (such
> that configuration can find them), but that won’t actually work in
> practice.
>
> If this is the case then CUDA will fail to initialize with this exact
> error. IIRC, it wasn’t until CUDA 11.1 that they created a specific error
> code (cudaErrorStubLibrary) for it.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Jan 19, 2022, at 12:07, Fande Kong <fdkong.jd at gmail.com> wrote:
>
> Thanks, Jacob, and Junchao
>
> The log was attached.  I am using Sawtooth at INL
> https://hpc.inl.gov/SitePages/Home.aspx
>
>
> Thanks,
>
> Fande
>
> On Wed, Jan 19, 2022 at 10:32 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
> wrote:
>
>> Hi Fande,
>>
>> What machine are you running this on? Please attach configure.log so I
>> can troubleshoot this.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>> On Jan 19, 2022, at 10:04, Fande Kong <fdkong.jd at gmail.com> wrote:
>>
>> Hi All,
>>
>> Upgraded PETSc from 3.16.1 to the current main branch. I suddenly got the
>> following error message:
>>
>> 2d_diffusion]$ ../../../moose_test-dbg -i 2d_diffusion_test.i
>> -use_gpu_aware_mpi 0 -gpu_mat_type aijcusparse -gpu_vec_type cuda
>>  -log_view
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Missing or incorrect user input
>> [0]PETSC ERROR: Cannot eagerly initialize cuda, as doing so results in
>> cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is
>> insufficient for CUDA runtime version
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-618-gad32f7e  GIT
>> Date: 2022-01-18 16:04:31 +0000
>> [0]PETSC ERROR: ../../../moose_test-dbg on a arch-linux-c-opt named
>> r8i3n0 by kongf Wed Jan 19 08:30:13 2022
>> [0]PETSC ERROR: Configure options --with-debugging=no
>> --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1
>> --download-ptscotch=1 --download-parmetis=1 --download-mumps=1
>> --download-strumpack=1 --download-scalapack=1 --download-slepc=1
>> --with-mpi=1 --with-cxx-dialect=C++14 --with-fortran-bindings=0
>> --with-sowing=0 --with-64-bit-indices --with-make-np=24 --with-cuda
>> --with-cudac=nvcc --with-cuda-arch=70 --download-kokkos=1
>> [0]PETSC ERROR: #1 initialize() at
>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:298
>> [0]PETSC ERROR: #2 PetscDeviceInitializeTypeFromOptions_Private() at
>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:299
>> [0]PETSC ERROR: #3 PetscDeviceInitializeFromOptions_Internal() at
>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:425
>> [0]PETSC ERROR: #4 PetscInitialize_Common() at
>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/pinit.c:963
>> [0]PETSC ERROR: #5 PetscInitialize() at
>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/pinit.c:1238
>> [0]PETSC ERROR: #6 SlepcInitialize() at
>> /home/kongf/workhome/sawtooth/moosegpu/petsc/arch-linux-c-opt/externalpackages/git.slepc/src/sys/slepcinit.c:275
>> [0]PETSC ERROR: #7 LibMeshInit() at ../src/base/libmesh.C:522
>> [r8i3n0:mpi_rank_0][MPIDI_CH3_Abort] application called
>> MPI_Abort(MPI_COMM_WORLD, 95) - process 0: No such file or directory (2)
>>
>> Thanks,
>>
>> Fande
>>
>>
>> <configure.log>
>
>
>
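
The driver/runtime mismatch described above can be reasoned about with a small sketch. It assumes the two version integers have already been obtained, for example from the CUDA runtime calls cudaDriverGetVersion() and cudaRuntimeGetVersion(), which encode a version as 1000*major + 10*minor and report a driver version of 0 when no driver is installed. The function names and the sample values are hypothetical and are not taken from PETSc or from this thread. The comparison rule is a simplification: it ignores CUDA minor-version compatibility and cannot tell a stub library from a missing driver.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def diagnose(driver_version: int, runtime_version: int) -> str:
    """Classify a driver/runtime pair (hypothetical helper, simplified rule).

    Both arguments are the integers that cudaDriverGetVersion() and
    cudaRuntimeGetVersion() would report; querying them is out of scope here.
    """
    if driver_version == 0:
        # No usable driver, e.g. a login node without GPUs.
        return "no driver"
    if driver_version < runtime_version:
        # Installed driver supports an older CUDA than the runtime was built for.
        return "driver older than runtime"
    return "ok"


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    for driver, runtime in [(0, 11040), (10020, 11040), (11040, 11040)]:
        print(decode_cuda_version(driver), decode_cuda_version(runtime),
              diagnose(driver, runtime))
```

Running the same check on a login node and on a compute node would show whether the two environments report different driver versions.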