[petsc-users] 32-bit vs 64-bit GPU support

Rohan Yadav rohany at alumni.cmu.edu
Fri Aug 11 12:23:15 CDT 2023


Hi,

I was wondering what the official status of 64-bit integer support in the
PETSc GPU backend (specifically CUDA) is. The question came up while
benchmarking some PETSc code and reading through the sources. In particular,
I found that PETSc's cuSPARSE SpMV path always seems to use the 32-bit
integer call, even when PETSc is compiled with `--with-64-bit-indices`.
Digging further, I see that PETSc also only ever creates 32-bit cuSPARSE
matrices:
https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501.
I looked for a switch to 64-bit integers inside this code, but everything
seems to be hardcoded to `THRUSTINTARRAY32`.
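
For what it's worth, the generic cuSPARSE API itself looks like it can take
64-bit index types, so the restriction appears to be on the PETSc side rather
than in cuSPARSE. A rough sketch of what I mean (my own illustration, not
PETSc code; the device arrays `d_rowptr`, `d_colind`, and `d_vals` are
placeholders assumed to already hold the local CSR data):

```
#include <stdint.h>
#include <cusparse.h>

/* Sketch only: create a CSR descriptor with 64-bit row offsets and column
   indices through the generic cuSPARSE API. Error checking omitted. */
cusparseSpMatDescr_t CreateCsr64(int64_t nrows, int64_t ncols, int64_t nnz,
                                 int64_t *d_rowptr, int64_t *d_colind,
                                 double *d_vals)
{
  cusparseSpMatDescr_t mat;
  /* CUSPARSE_INDEX_64I is the 64-bit counterpart of the CUSPARSE_INDEX_32I
     path that the THRUSTINTARRAY32-based code currently uses. */
  cusparseCreateCsr(&mat, nrows, ncols, nnz, d_rowptr, d_colind, d_vals,
                    CUSPARSE_INDEX_64I, CUSPARSE_INDEX_64I,
                    CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);
  return mat;
}
```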

As expected, everything works as long as the index range in each sparse
matrix partition stays below INT_MAX, but the PETSc GPU code breaks in
different ways (in its calls to cuBLAS and cuSPARSE) on a (synthetic)
problem that genuinely needs 64-bit integers:

```
#include "petscmat.h"
#include "petscvec.h"
#include "petsc.h"

int main(int argc, char** argv) {
  PetscInt ierr;
  PetscInitialize(&argc, &argv, (char *)0, "GPU bug");

  PetscInt numRows = 1;
  PetscInt numCols = PetscInt(INT_MAX) * 2;

  Mat A;
  PetscInt rowStart, rowEnd;
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols);
  MatSetType(A, MATMPIAIJ);
  MatSetFromOptions(A);

  MatSetValue(A, 0, 0, 1.0, INSERT_VALUES);
  MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  Vec b;
  ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
  VecSetSizes(b, PETSC_DECIDE, numCols);
  VecSetFromOptions(b);
  VecSet(b, 0.0);
  VecSetValue(b, 0, 42.0, INSERT_VALUES);
  VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES);
  VecAssemblyBegin(b);
  VecAssemblyEnd(b);

  Vec x;
  ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr);
  VecSetSizes(x, PETSC_DECIDE, numRows);
  VecSetFromOptions(x);
  VecSet(x, 0.0);

  MatMult(A, b, x);
  PetscScalar result;
  VecSum(x, &result);
  PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", result);
  PetscFinalize();
}
```

When this program is run on CPUs, it outputs 100.0, as expected.

When run on a single GPU with `-vec_type cuda -mat_type aijcusparse
-use_gpu_aware_mpi 0`, it fails with:
```
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Argument out of range
[0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted to 32-bit integers
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
[0]PETSC ERROR: ./gpu-bug on a  named sean-dgx2 by rohany Fri Aug 11 09:34:10 2023
[0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices
[0]PETSC ERROR: #1 checkCupmBlasIntCast() at /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435
[0]PETSC ERROR: #2 VecAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335
[0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360
[0]PETSC ERROR: #4 DeviceAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389
[0]PETSC ERROR: #5 GetArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545
[0]PETSC ERROR: #6 VectorArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
with errorcode 63.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
```


and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0`, it
fails with:

```
 ** On entry to cusparseCreateCsr(): dimension mismatch for CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647)

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: GPU error
[0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) : invalid value
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
[0]PETSC ERROR: ./gpu-bug on a  named sean-dgx2 by rohany Fri Aug 11 09:43:07 2023
[0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices
[0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2503
[0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3544
[0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3485
[0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:452
[0]PETSC ERROR: #5 MatMult() at /local/home/rohany/petsc/src/mat/interface/matrix.c:2599
```
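
To be concrete about the "works when the index range in each partition stays
below INT_MAX" statement above, the kind of per-rank check I have in mind is
roughly the following (a hypothetical helper of my own, not anything PETSc
provides):

```
#include <limits.h>
#include "petscmat.h"

/* Hypothetical helper (my own sketch, not part of PETSc): the CUDA backends
   currently appear to assume 32-bit indices, so this just checks that the
   local matrix and vector sizes fit in an int before the GPU paths are used. */
PetscErrorCode CheckFitsIn32Bit(Mat A, Vec b)
{
  PetscInt       m, n, nb;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatGetLocalSize(A, &m, &n); CHKERRQ(ierr);
  ierr = VecGetLocalSize(b, &nb); CHKERRQ(ierr);
  if (m > (PetscInt)INT_MAX || n > (PetscInt)INT_MAX || nb > (PetscInt)INT_MAX)
    SETERRQ(PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
            "local sizes exceed INT_MAX; the 32-bit GPU index path will fail");
  PetscFunctionReturn(PETSC_SUCCESS);
}
```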


Thanks,


Rohan Yadav