[petsc-users] Error running src/snes/tutorials/ex19 on Nvidia Tesla K40m : CUDA ERROR (code = 101, invalid device ordinal)

Stefano Zampini stefano.zampini at gmail.com
Thu Jul 14 11:56:42 CDT 2022


You don't need unified memory for BoomerAMG to work.
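A minimal sketch of how to rebuild without it: drop the unified-memory argument from the configure command quoted below and keep the rest of the options unchanged (only a few of them are repeated here for brevity):

```shell
# Sketch: reconfigure without the hypre unified-memory option, i.e. omit
#   --download-hypre-configure-arguments=--enable-unified-memory
# and keep all the other options from the original configure line, e.g.:
./configure --force \
    --with-debugging=0 \
    --download-hypre \
    --with-cuda=1 \
    --with-cuda-arch=35 \
    PETSC_ARCH=linux64GccDPInt32-spack
# then rebuild and re-run the checks:
make PETSC_ARCH=linux64GccDPInt32-spack all check
```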

On Thu, Jul 14, 2022, 18:55 Barry Smith <bsmith at petsc.dev> wrote:

>
>   So the PETSc tests all run, including the test that uses a GPU.
>
>   The hypre test is failing. It is impossible to tell from the output why.
>
>   You can run it manually: cd src/snes/tutorials
>
> make ex19
> mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info > somefile
>
> then take a look at the output in somefile and send it to us.
>
>   Barry
>
>
>
> On Jul 14, 2022, at 12:32 PM, Juan Pablo de Lima Costa Salazar via
> petsc-users <petsc-users at mcs.anl.gov> wrote:
>
> Hello,
>
> I was hoping to get help regarding a runtime error I am encountering on a
> cluster node with 4 Tesla K40m GPUs after configuring PETSc with the
> following command:
>
> $./configure --force \
>                   --with-precision=double  \
>                   --with-debugging=0 \
>                   --COPTFLAGS=-O3 \
>                   --CXXOPTFLAGS=-O3 \
>                   --FOPTFLAGS=-O3 \
>                   PETSC_ARCH=linux64GccDPInt32-spack \
>                   --download-fblaslapack \
>                   --download-openblas \
>                   --download-hypre \
>
> --download-hypre-configure-arguments=--enable-unified-memory \
>                   --with-mpi-dir=/opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.4 \
>                   --with-cuda=1 \
>                   --download-suitesparse \
>                   --download-dir=downloads \
>                   --with-cudac=/opt/ohpc/admin/spack/0.15.0/opt/spack/linux-centos8-ivybridge/gcc-9.3.0/cuda-11.7.0-hel25vgwc7fixnvfl5ipvnh34fnskw3m/bin/nvcc \
>                   --with-packages-download-dir=downloads \
>                   --download-sowing=downloads/v1.1.26-p4.tar.gz \
>                   --with-cuda-arch=35
>
> When I run
>
> $ make PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda
> PETSC_ARCH=linux64GccDPInt32-spack check
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda and
> PETSC_ARCH=linux64GccDPInt32-spack
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 3,5c3,15
> <   1 SNES Function norm 4.12227e-06
> <   2 SNES Function norm 6.098e-11
> < Number of SNES iterations = 2
> ---
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > --------------------------------------------------------------------------
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status, thus causing
> > the job to be terminated. The first process to do so was:
> >
> >   Process name: [[52712,1],0]
> >   Exit code:    1
> > --------------------------------------------------------------------------
> /home/juan/OpenFOAM/juan-v2206/petsc-cuda/src/snes/tutorials
> Possible problem with ex19 running with hypre, diffs above
> =========================================
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> C/C++ example src/snes/tutorials/ex19 run successfully with suitesparse
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
>
> I have compiled the code on the head node (without GPUs) and on the
> compute node where there are 4 GPUs.
>
> $ nvidia-debugdump -l
> Found 4 NVIDIA devices
> Device ID:              0
> Device name:            Tesla K40m
> GPU internal ID:        0320717032250
>
> Device ID:              1
> Device name:            Tesla K40m
> GPU internal ID:        0320717031968
>
> Device ID:              2
> Device name:            Tesla K40m
> GPU internal ID:        0320717032246
>
> Device ID:              3
> Device name:            Tesla K40m
> GPU internal ID:        0320717032235
>
> Attached are the log files from configure and make.
>
> Any pointers are highly appreciated. My intention is to use PETSc as a
> linear solver for OpenFOAM, leveraging the availability of GPUs at the same
> time. Currently I can run PETSc without GPU support.
>
> Cheers,
> Juan S.
>
>
>
>
>
> <configure.log.tar.gz><make.log.tar.gz>
>
>
>