[petsc-users] Error running src/snes/tutorials/ex19 on Nvidia Tesla K40m : CUDA ERROR (code = 101, invalid device ordinal)

Barry Smith bsmith at petsc.dev
Thu Jul 14 11:54:51 CDT 2022


  So the PETSc tests all run, including the test that uses a GPU.

  The hypre test is failing, and it is impossible to tell why from the output.

  You can run it manually:

cd src/snes/tutorials
make ex19
mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info > somefile

Then look at the output in somefile and send it to us.
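A minimal sketch of the same run as a script, assuming it is started from src/snes/tutorials after `make ex19`. It differs from the command above in one respect: `2>&1` also sends stderr into the log, in case the "CUDA ERROR" messages seen in the check output go there rather than to stdout. (Code 101 is the CUDA runtime's cudaErrorInvalidDevice, "invalid device ordinal".) The LOG variable and the summarize_cuda_errors helper are hypothetical conveniences, not part of PETSc.

```shell
#!/bin/sh
# Hedged sketch: rerun the hypre test with -info, capturing stderr as well as
# stdout, then report any "CUDA ERROR" lines found in the log.
# Assumption: run from src/snes/tutorials after `make ex19`.
# LOG and summarize_cuda_errors are hypothetical names, not PETSc features.

LOG=${LOG:-somefile}

summarize_cuda_errors() {
  # Count the "CUDA ERROR" lines in the given log, then print the first one.
  n=$(grep -c 'CUDA ERROR' "$1")
  echo "CUDA ERROR lines: $n"
  if [ "$n" -gt 0 ]; then
    grep -m 1 'CUDA ERROR' "$1"
  fi
}

# Only attempt the run if the ex19 binary has been built here.
if [ -x ./ex19 ]; then
  # 2>&1 also captures anything written to stderr (harmless if nothing is)
  mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 \
    -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info \
    > "$LOG" 2>&1
  summarize_cuda_errors "$LOG"
fi
```

If ex19 has not been built, the script only defines the helper, which can then be pointed at an existing log, e.g. `summarize_cuda_errors somefile`.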

  Barry



> On Jul 14, 2022, at 12:32 PM, Juan Pablo de Lima Costa Salazar via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Hello,
> 
> I was hoping to get help regarding a runtime error I am encountering on a cluster node with 4 Tesla K40m GPUs after configuring PETSc with the following command:
> 
> $./configure --force \
>                   --with-precision=double  \
>                   --with-debugging=0 \
>                   --COPTFLAGS=-O3 \
>                   --CXXOPTFLAGS=-O3 \
>                   --FOPTFLAGS=-O3 \
>                   PETSC_ARCH=linux64GccDPInt32-spack \
>                   --download-fblaslapack \
>                   --download-openblas \
>                   --download-hypre \
>                   --download-hypre-configure-arguments=--enable-unified-memory \
>                   --with-mpi-dir=/opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.4 \
>                   --with-cuda=1 \
>                   --download-suitesparse \
>                   --download-dir=downloads \
>                   --with-cudac=/opt/ohpc/admin/spack/0.15.0/opt/spack/linux-centos8-ivybridge/gcc-9.3.0/cuda-11.7.0-hel25vgwc7fixnvfl5ipvnh34fnskw3m/bin/nvcc \
>                   --with-packages-download-dir=downloads \
>                   --download-sowing=downloads/v1.1.26-p4.tar.gz \
>                   --with-cuda-arch=35
> 
> When I run
> 
> $ make PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda PETSC_ARCH=linux64GccDPInt32-spack check
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda and PETSC_ARCH=linux64GccDPInt32-spack
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 3,5c3,15
> <   1 SNES Function norm 4.12227e-06 
> <   2 SNES Function norm 6.098e-11 
> < Number of SNES iterations = 2
> ---
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > --------------------------------------------------------------------------
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status, thus causing
> > the job to be terminated. The first process to do so was:
> > 
> >   Process name: [[52712,1],0]
> >   Exit code:    1
> > --------------------------------------------------------------------------
> /home/juan/OpenFOAM/juan-v2206/petsc-cuda/src/snes/tutorials
> Possible problem with ex19 running with hypre, diffs above
> =========================================
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> C/C++ example src/snes/tutorials/ex19 run successfully with suitesparse
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
> 
> I have compiled the code on the head node (without GPUs) and on the compute node where there are 4 GPUs. 
> 
> $nvidia-debugdump -l
> Found 4 NVIDIA devices
> 	Device ID:              0
> 	Device name:            Tesla K40m
> 	GPU internal ID:        0320717032250
> 
> 	Device ID:              1
> 	Device name:            Tesla K40m
> 	GPU internal ID:        0320717031968
> 
> 	Device ID:              2
> 	Device name:            Tesla K40m
> 	GPU internal ID:        0320717032246
> 
> 	Device ID:              3
> 	Device name:            Tesla K40m
> 	GPU internal ID:        0320717032235
> 
> Attached are the log files from configure and make.
> 
> Any pointers would be greatly appreciated. I intend to use PETSc as a linear solver for OpenFOAM while taking advantage of the available GPUs. Currently I can run PETSc without GPU support.
> 
> Cheers,
> Juan S.
> 
> <configure.log.tar.gz><make.log.tar.gz>
