[petsc-users] 32-bit vs 64-bit GPU support
Rohan Yadav
rohany at alumni.cmu.edu
Fri Aug 11 12:23:15 CDT 2023
Hi,
I was wondering what the official status of 64-bit integer support in the
PETSc GPU backend (specifically CUDA) is. This question comes from
benchmarking some PETSc code and reading the sources. In particular, I found
that PETSc's cuSPARSE SpMV call always seems to use the 32-bit integer
version, even when I compile PETSc with `--with-64-bit-indices`. After
digging around some more, I see that PETSc only ever creates 32-bit cuSPARSE
matrices as well:
https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501.
I looked for a switch to 64-bit integers inside this code, but everything
seems to be hardcoded with `THRUSTINTARRAY32`.
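For context on what such a switch might look like: the cuSPARSE generic API
itself appears to accept 64-bit index types via `CUSPARSE_INDEX_64I`, so the
restriction seems to come from the hardcoded 32-bit index arrays rather than
from cuSPARSE. Below is a rough sketch, just my reading of the
`cusparseCreateCsr` signature and not PETSc code, of creating a CSR
descriptor with 64-bit row offsets and column indices:
```
#include <stdint.h>
#include <stdio.h>
#include <cuda_runtime.h>
#include <cusparse.h>

int main(void) {
  cusparseHandle_t handle;
  cusparseCreate(&handle);

  /* A tiny 1x3 CSR matrix with one nonzero, but with 64-bit row offsets and
     column indices (CUSPARSE_INDEX_64I) instead of the 32-bit arrays that
     PETSc currently builds. */
  int64_t rows = 1, cols = 3, nnz = 1;
  int64_t hRowPtr[2] = {0, 1};
  int64_t hColInd[1] = {2};
  double  hVal[1]    = {1.0};

  int64_t *dRowPtr, *dColInd;
  double  *dVal;
  cudaMalloc((void **)&dRowPtr, sizeof(hRowPtr));
  cudaMalloc((void **)&dColInd, sizeof(hColInd));
  cudaMalloc((void **)&dVal,    sizeof(hVal));
  cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
  cudaMemcpy(dColInd, hColInd, sizeof(hColInd), cudaMemcpyHostToDevice);
  cudaMemcpy(dVal,    hVal,    sizeof(hVal),    cudaMemcpyHostToDevice);

  cusparseSpMatDescr_t A;
  cusparseStatus_t st = cusparseCreateCsr(&A, rows, cols, nnz,
                                          dRowPtr, dColInd, dVal,
                                          CUSPARSE_INDEX_64I, /* row offset type */
                                          CUSPARSE_INDEX_64I, /* column index type */
                                          CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);
  printf("cusparseCreateCsr status: %d\n", (int)st);

  cusparseDestroySpMat(A);
  cudaFree(dRowPtr); cudaFree(dColInd); cudaFree(dVal);
  cusparseDestroy(handle);
  return 0;
}
```
So the index type looks selectable at the cuSPARSE level, though I don't know
what widening the `THRUSTINTARRAY32` arrays would ripple into on the PETSc side.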
As expected, the current 32-bit path works when the range of coordinates in
each sparse matrix partition is less than INT_MAX, but the PETSc GPU code
breaks in different ways (in its calls to cuBLAS and cuSPARSE) when given a
(synthetic) problem that needs 64-bit integers:
```
#include "petscmat.h"
#include "petscvec.h"
#include "petsc.h"
int main(int argc, char** argv) {
PetscInt ierr;
PetscInitialize(&argc, &argv, (char *)0, "GPU bug");
PetscInt numRows = 1;
PetscInt numCols = PetscInt(INT_MAX) * 2;
Mat A;
PetscInt rowStart, rowEnd;
ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols);
MatSetType(A, MATMPIAIJ);
MatSetFromOptions(A);
MatSetValue(A, 0, 0, 1.0, INSERT_VALUES);
MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES);
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
Vec b;
ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
VecSetSizes(b, PETSC_DECIDE, numCols);
VecSetFromOptions(b);
VecSet(b, 0.0);
VecSetValue(b, 0, 42.0, INSERT_VALUES);
VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES);
VecAssemblyBegin(b);
VecAssemblyEnd(b);
Vec x;
ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr);
VecSetSizes(x, PETSC_DECIDE, numRows);
VecSetFromOptions(x);
VecSet(x, 0.0);
MatMult(A, b, x);
PetscScalar result;
VecSum(x, &result);
PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", result);
PetscFinalize();
}
```
When this program is run on CPUs, it outputs 100.0, as expected.
When run on a single GPU with `-vec_type cuda -mat_type aijcusparse
-use_gpu_aware_mpi 0`, it fails with:
```
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Argument out of range
[0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted
to 32-bit integers
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
[0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11
09:34:10 2023
[0]PETSC ERROR: Configure options --with-cuda=1
--prefix=/local/home/rohany/petsc/petsc-install/
--with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3
CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0
--with-64-bit-indices
[0]PETSC ERROR: #1 checkCupmBlasIntCast() at /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435
[0]PETSC ERROR: #2 VecAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335
[0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360
[0]PETSC ERROR: #4 DeviceAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389
[0]PETSC ERROR: #5 GetArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545
[0]PETSC ERROR: #6 VectorArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
with errorcode 63.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
```
and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0`, it
fails with:
```
** On entry to cusparseCreateCsr(): dimension mismatch for
CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647)
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: GPU error
[0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) :
invalid value
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
[0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11
09:43:07 2023
[0]PETSC ERROR: Configure options --with-cuda=1
--prefix=/local/home/rohany/petsc/petsc-install/
--with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3
CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0
--with-64-bit-indices
[0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2503
[0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3544
[0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3485
[0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:452
[0]PETSC ERROR: #5 MatMult() at /local/home/rohany/petsc/src/mat/interface/matrix.c:2599
```
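For what it's worth, my reading of the first failure is that the CUDA vector
path bottoms out in the classic cuBLAS API, whose length and increment
arguments are plain C `int`s, so any length above INT_MAX has to be rejected
up front (which I assume is what `checkCupmBlasIntCast()` is doing). A
minimal sketch of that kind of guard; the function name below is mine, not
PETSc's:
```
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical guard: the classic cuBLAS API (e.g. cublasDscal, cublasDaxpy)
   takes vector lengths and strides as plain `int`, so a 64-bit length has to
   be range-checked before it can be passed through. */
static int cast_to_cublas_int(int64_t n, int *out) {
  if (n > INT_MAX) {
    fprintf(stderr, "%lld is too big for cuBLAS, which is restricted to 32-bit integers\n",
            (long long)n);
    return -1;
  }
  *out = (int)n;
  return 0;
}

int main(void) {
  int64_t n = (int64_t)INT_MAX * 2; /* same length as numCols in the reproducer */
  int     n32;
  if (cast_to_cublas_int(n, &n32)) return 1; /* rejected, matching the error above */
  /* ...otherwise n32 could be handed to, e.g., cublasDscal(handle, n32, ...) */
  return 0;
}
```
So the vector side may genuinely be stuck at 32-bit lengths per cuBLAS call
unless operations are chunked, whereas the SpMV side looks like it is only
limited by the 32-bit index arrays.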
Thanks,
Rohan Yadav