[petsc-users] Error running src/snes/tutorials/ex19 on Nvidia Tesla K40m : CUDA ERROR (code = 101, invalid device ordinal)

Juan Pablo de Lima Costa Salazar jp.salazar at pm.me
Thu Jul 14 11:32:28 CDT 2022


Hello,

I was hoping to get help regarding a runtime error I am encountering on a cluster node with 4 Tesla K40m GPUs after configuring PETSc with the following command:

$./configure --force \
--with-precision=double \
--with-debugging=0 \
--COPTFLAGS=-O3 \
--CXXOPTFLAGS=-O3 \
--FOPTFLAGS=-O3 \
PETSC_ARCH=linux64GccDPInt32-spack \
--download-fblaslapack \
--download-openblas \
--download-hypre \
--download-hypre-configure-arguments=--enable-unified-memory \
--with-mpi-dir=/opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.4 \
--with-cuda=1 \
--download-suitesparse \
--download-dir=downloads \
--with-cudac=/opt/ohpc/admin/spack/0.15.0/opt/spack/linux-centos8-ivybridge/gcc-9.3.0/cuda-11.7.0-hel25vgwc7fixnvfl5ipvnh34fnskw3m/bin/nvcc \
--with-packages-download-dir=downloads \
--download-sowing=downloads/v1.1.26-p4.tar.gz \
--with-cuda-arch=35
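
For context: the Tesla K40m is a Kepler card with compute capability 3.5, which is what --with-cuda-arch=35 targets; as far as I understand, CUDA 11.7 still accepts sm_35, though only with a deprecation warning. Below is a minimal smoke test of my own (the file name arch_check.cu is arbitrary) to confirm that the nvcc given to --with-cudac can build and launch a kernel for this architecture:

// arch_check.cu -- my own throwaway check, not part of PETSc.
// Launches a trivial kernel and reports any launch/runtime error, to
// confirm the toolchain can actually target the K40m (sm_35).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy(int *out) { *out = 42; }

int main(void) {
    int *d_out = NULL, h_out = 0;
    cudaError_t err = cudaMalloc(&d_out, sizeof(int));
    if (err == cudaSuccess) {
        dummy<<<1, 1>>>(d_out);
        err = cudaGetLastError();                  // kernel launch status
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("kernel result = %d, status = %s\n", h_out, cudaGetErrorString(err));
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

I would compile it with, e.g., nvcc -arch=sm_35 arch_check.cu -o arch_check and run it on the compute node.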

When I run

$ make PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda PETSC_ARCH=linux64GccDPInt32-spack check
Running check examples to verify correct installation
Using PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda and PETSC_ARCH=linux64GccDPInt32-spack
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
3,5c3,15
< 1 SNES Function norm 4.12227e-06
< 2 SNES Function norm 6.098e-11
< Number of SNES iterations = 2
---
> CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> --------------------------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus causing
> the job to be terminated. The first process to do so was:
>
> Process name: [[52712,1],0]
> Exit code: 1
> --------------------------------------------------------------------------
/home/juan/OpenFOAM/juan-v2206/petsc-cuda/src/snes/tutorials
Possible problem with ex19 running with hypre, diffs above
=========================================
C/C++ example src/snes/tutorials/ex19 run successfully with cuda
C/C++ example src/snes/tutorials/ex19 run successfully with suitesparse
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
Completed test examples
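
For reference, CUDA error code 101 is cudaErrorInvalidDevice ("invalid device ordinal"), which the runtime returns when a device index outside the range reported by cudaGetDeviceCount() is requested; memory.c:139 is presumably in hypre, since only the hypre test fails. Here is a small diagnostic of my own (devcheck.cu is an arbitrary name) that prints what the runtime sees on the node and deliberately reproduces that error code:

// devcheck.cu -- my own diagnostic, not part of PETSc or hypre.
// Lists the devices visible to the CUDA runtime and shows that selecting
// an out-of-range ordinal produces error 101 (invalid device ordinal).
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("runtime sees %d device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  ordinal %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    // Asking for an ordinal one past the end reproduces the reported code.
    err = cudaSetDevice(count);
    printf("cudaSetDevice(%d) -> %d (%s)\n", count, (int)err, cudaGetErrorString(err));
    return 0;
}

If the scheduler or an environment variable such as CUDA_VISIBLE_DEVICES restricts what a process can see, the count reported here can be smaller than what nvidia-debugdump lists below.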

I have compiled the code both on the head node (which has no GPUs) and on the compute node with the 4 GPUs.

$nvidia-debugdump -l
Found 4 NVIDIA devices
Device ID: 0
Device name: Tesla K40m
GPU internal ID: 0320717032250

Device ID: 1
Device name: Tesla K40m
GPU internal ID: 0320717031968

Device ID: 2
Device name: Tesla K40m
GPU internal ID: 0320717032246

Device ID: 3
Device name: Tesla K40m
GPU internal ID: 0320717032235
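
Since the serial and CUDA runs of ex19 pass and only the 2-process hypre run fails, it may also be relevant what each MPI rank sees at runtime. Here is a sketch of my own (rankcheck.cu; the round-robin rank-to-device mapping is only an assumption for illustration, not necessarily what PETSc or hypre do):

// rankcheck.cu -- my own diagnostic, not part of PETSc or hypre.
// For each MPI rank, reports how many devices the CUDA runtime exposes
// and whether a simple rank-round-robin device selection succeeds.
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    printf("rank %d: %d device(s), %s\n", rank, count, cudaGetErrorString(err));

    if (count > 0) {
        int ordinal = rank % count;              /* illustrative mapping only */
        err = cudaSetDevice(ordinal);
        printf("rank %d: cudaSetDevice(%d) -> %s\n",
               rank, ordinal, cudaGetErrorString(err));
    }
    MPI_Finalize();
    return 0;
}

I would build it with something like nvcc -ccbin mpicxx rankcheck.cu -o rankcheck and launch it with mpiexec -n 2 ./rankcheck on the GPU node.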

Attached are the log files from configure and make.

Any pointers are highly appreciated. My intention is to use PETSc as a linear solver for OpenFOAM while also leveraging the available GPUs. Currently I can run PETSc without GPU support.

Cheers,
Juan S.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log.tar.gz
Type: application/x-gzip
Size: 109937 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220714/39b3553e/attachment-0002.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log.tar.gz
Type: application/x-gzip
Size: 15760 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220714/39b3553e/attachment-0003.gz>

