[petsc-users] Error running src/snes/tutorials/ex19 on Nvidia Tesla K40m : CUDA ERROR (code = 101, invalid device ordinal)
Barry Smith
bsmith at petsc.dev
Thu Jul 14 11:54:51 CDT 2022
So the PETSc tests all run, including the test that uses a GPU.
Only the hypre test is failing, and it is impossible to tell why from the output.
You can run it manually:
cd src/snes/tutorials
make ex19
mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info > somefile
Then take a look at the output in somefile and send it to us.
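If the -info output is long, a filter along these lines may help you find the relevant part before sending it. This is only a sketch: the helper name filter_info_log and the keyword list are illustrative assumptions, not something PETSc provides, and the lines that matter may not contain any of these words, so please still send the whole file.

```shell
# Hypothetical helper (not part of PETSc): print, with line numbers, the
# lines of a log file that mention CUDA, hypre, devices, or errors.
# The keyword list is a guess and may need adjusting.
filter_info_log() {
    grep -n -i -E 'cuda|hypre|device|error' "$1"
}

# Usage, assuming the log was written to "somefile" as in the command above.
if [ -f somefile ]; then
    filter_info_log somefile
fi
```

A hit on the "invalid device ordinal" message reported at memory.c:139 in the make check output, together with the lines just before it, would be the first place to look.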
Barry
> On Jul 14, 2022, at 12:32 PM, Juan Pablo de Lima Costa Salazar via petsc-users <petsc-users at mcs.anl.gov> wrote:
>
> Hello,
>
> I was hoping to get help regarding a runtime error I am encountering on a cluster node with 4 Tesla K40m GPUs after configuring PETSc with the following command:
>
> $./configure --force \
> --with-precision=double \
> --with-debugging=0 \
> --COPTFLAGS=-O3 \
> --CXXOPTFLAGS=-O3 \
> --FOPTFLAGS=-O3 \
> PETSC_ARCH=linux64GccDPInt32-spack \
> --download-fblaslapack \
> --download-openblas \
> --download-hypre \
> --download-hypre-configure-arguments=--enable-unified-memory \
> --with-mpi-dir=/opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.4 \
> --with-cuda=1 \
> --download-suitesparse \
> --download-dir=downloads \
> --with-cudac=/opt/ohpc/admin/spack/0.15.0/opt/spack/linux-centos8-ivybridge/gcc-9.3.0/cuda-11.7.0-hel25vgwc7fixnvfl5ipvnh34fnskw3m/bin/nvcc \
> --with-packages-download-dir=downloads \
> --download-sowing=downloads/v1.1.26-p4.tar.gz \
> --with-cuda-arch=35
>
> When I run
>
> $ make PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda PETSC_ARCH=linux64GccDPInt32-spack check
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda and PETSC_ARCH=linux64GccDPInt32-spack
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 3,5c3,15
> < 1 SNES Function norm 4.12227e-06
> < 2 SNES Function norm 6.098e-11
> < Number of SNES iterations = 2
> ---
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > --------------------------------------------------------------------------
> > Primary job terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status, thus causing
> > the job to be terminated. The first process to do so was:
> >
> > Process name: [[52712,1],0]
> > Exit code: 1
> > --------------------------------------------------------------------------
> /home/juan/OpenFOAM/juan-v2206/petsc-cuda/src/snes/tutorials
> Possible problem with ex19 running with hypre, diffs above
> =========================================
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> C/C++ example src/snes/tutorials/ex19 run successfully with suitesparse
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
>
> I have compiled the code on the head node (without GPUs) and on the compute node where there are 4 GPUs.
>
> $nvidia-debugdump -l
> Found 4 NVIDIA devices
> Device ID: 0
> Device name: Tesla K40m
> GPU internal ID: 0320717032250
>
> Device ID: 1
> Device name: Tesla K40m
> GPU internal ID: 0320717031968
>
> Device ID: 2
> Device name: Tesla K40m
> GPU internal ID: 0320717032246
>
> Device ID: 3
> Device name: Tesla K40m
> GPU internal ID: 0320717032235
>
> Attached are the log files from configure and make.
>
> Any pointers are highly appreciated. My intention is to use PETSc as a linear solver for OpenFOAM, leveraging the availability of GPUs at the same time. Currently I can run PETSc without GPU support.
>
> Cheers,
> Juan S.
>
> <configure.log.tar.gz><make.log.tar.gz>