[petsc-users] 32-bit vs 64-bit GPU support

Barry Smith bsmith at petsc.dev
Fri Aug 11 13:06:58 CDT 2023


   We do not currently have any code for using 64-bit integers on the GPUs.

   Given the memory currently available on GPUs, is 64-bit integer support needed? I think even a single vector of length 2^31 would use up most of a GPU's memory. Are there practical, not synthetic, situations that require 64-bit integer support on GPUs immediately? For example, is the length of the entire parallel vector across all GPUs limited to 32 bits?
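   (For scale, and assuming standard 8-byte double-precision scalars, a single vector of length 2^31 already occupies 2^31 * 8 bytes = 16 GiB, which is the entire memory of many current GPUs.)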

   We will certainly add such support, but it is a question of priorities; there are many things we need to do to improve PETSc GPU support, and they take time. Unless we have practical use cases, 64-bit integer support on the GPU is not at the top of the list. Of course, we would be very happy to receive a merge request that provides this support at any time.

  Barry



> On Aug 11, 2023, at 1:23 PM, Rohan Yadav <rohany at alumni.cmu.edu> wrote:
> 
> Hi,
> 
> I was wondering what the official status of 64-bit integer support in the PETSc GPU backend (specifically CUDA) is. This question comes from benchmarking some PETSc code and reading the sources. In particular, I found that PETSc's call to the cuSPARSE SpMV seems to always use the 32-bit integer path, even if I compile PETSc with `--with-64-bit-indices`. Digging further, I see that PETSc also only ever creates 32-bit cuSPARSE matrices: https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501. I looked around for a switch to 64-bit integers inside this code, but everything seems to be hardcoded with `THRUSTINTARRAY32`.
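> 
> For context, the cuSPARSE generic API itself takes 64-bit `rows`/`cols`/`nnz` arguments and accepts 64-bit index types, so the restriction appears to be in how the descriptor is built rather than in cuSPARSE. A minimal sketch of creating a CSR descriptor with `CUSPARSE_INDEX_64I` (a hypothetical helper, not PETSc's actual code, which instead copies indices into `THRUSTINTARRAY32` buffers and passes `CUSPARSE_INDEX_32I`):
> 
> ```
> #include <cusparse.h>
> #include <stdint.h>
> 
> /* Hypothetical sketch: build a cuSPARSE CSR descriptor whose row offsets
>    and column indices are 64-bit. All array arguments are device pointers;
>    values are assumed to be real double precision. */
> cusparseStatus_t CreateCsr64(cusparseSpMatDescr_t *mat,
>                              int64_t rows, int64_t cols, int64_t nnz,
>                              int64_t *rowOffsets, int64_t *colInd, double *values) {
>   return cusparseCreateCsr(mat, rows, cols, nnz,
>                            rowOffsets, colInd, values,
>                            CUSPARSE_INDEX_64I,       /* row offset type */
>                            CUSPARSE_INDEX_64I,       /* column index type */
>                            CUSPARSE_INDEX_BASE_ZERO, /* zero-based indexing */
>                            CUDA_R_64F);              /* double-precision values */
> }
> ```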
> 
> As expected, this all works when the range of coordinates in each sparse matrix partition is less than INT_MAX, but the PETSc GPU code breaks in different ways (in calls to cuBLAS and to cuSPARSE) on a (synthetic) problem that needs 64-bit integers:
> 
> ```
> #include "petscmat.h"
> #include "petscvec.h"
> #include "petsc.h"
> 
> int main(int argc, char** argv) {
>   PetscErrorCode ierr;
>   PetscInitialize(&argc, &argv, (char *)0, "GPU bug");
> 
>   PetscInt numRows = 1;
>   PetscInt numCols = (PetscInt)INT_MAX * 2; /* column indices exceed the 32-bit range */
> 
>   Mat A;
>   PetscInt rowStart, rowEnd;
>   ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
>   MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols);
>   MatSetType(A, MATMPIAIJ);
>   MatSetFromOptions(A);
> 
>   MatSetValue(A, 0, 0, 1.0, INSERT_VALUES);
>   MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES);
>   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> 
>   Vec b;
>   ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
>   VecSetSizes(b, PETSC_DECIDE, numCols);
>   VecSetFromOptions(b);
>   VecSet(b, 0.0);
>   VecSetValue(b, 0, 42.0, INSERT_VALUES);
>   VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES);
>   VecAssemblyBegin(b);
>   VecAssemblyEnd(b);
> 
>   Vec x;
>   ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr);
>   VecSetSizes(x, PETSC_DECIDE, numRows);
>   VecSetFromOptions(x);
>   VecSet(x, 0.0);
> 
>   MatMult(A, b, x);
>   PetscScalar result;
>   VecSum(x, &result);
>   PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", (double)result);
>   PetscFinalize();
> }
> ```
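> 
> (For reference, something like the usual PETSc user makefile should build this, assuming `PETSC_DIR`, and `PETSC_ARCH` for non-prefix builds, are set; the exact build I used is not important. Run with `make gpu-bug`.)
> 
> ```
> include ${PETSC_DIR}/lib/petsc/conf/variables
> include ${PETSC_DIR}/lib/petsc/conf/rules
> 
> gpu-bug: gpu-bug.o
> 	${CLINKER} -o gpu-bug gpu-bug.o ${PETSC_LIB}
> ```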
> 
> When this program is run on CPUs, it outputs 100.0, as expected.
> 
> When run on a single GPU with `-vec_type cuda -mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with
> ```
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Argument out of range
> [0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted to 32-bit integers
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
> [0]PETSC ERROR: ./gpu-bug on a  named sean-dgx2 by rohany Fri Aug 11 09:34:10 2023
> [0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices
> [0]PETSC ERROR: #1 checkCupmBlasIntCast() at /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435
> [0]PETSC ERROR: #2 VecAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335
> [0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360
> [0]PETSC ERROR: #4 DeviceAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389
> [0]PETSC ERROR: #5 GetArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545
> [0]PETSC ERROR: #6 VectorArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> with errorcode 63.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> ```
> 
> and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with
> ```
>  ** On entry to cusparseCreateCsr(): dimension mismatch for CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647)
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) : invalid value
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
> [0]PETSC ERROR: ./gpu-bug on a  named sean-dgx2 by rohany Fri Aug 11 09:43:07 2023
> [0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices
> [0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2503
> [0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3544
> [0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3485
> [0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:452
> [0]PETSC ERROR: #5 MatMult() at /local/home/rohany/petsc/src/mat/interface/matrix.c:2599
> ```
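> 
> (The first failure, for what it's worth, looks like a deliberate guard rather than silent overflow. A rough sketch of the kind of range check the backtrace names in `checkCupmBlasIntCast()`, my guess at its shape rather than the actual PETSc code:)
> 
> ```
> #include <limits.h>
> #include <petscsys.h>
> 
> /* Hypothetical sketch: cuBLAS takes plain 32-bit int lengths, so a
>    64-bit PetscInt must be range-checked before it is narrowed. */
> static inline PetscErrorCode CheckBlasIntCast(PetscInt n, int *n32)
> {
>   PetscFunctionBegin;
>   PetscCheck(n <= INT_MAX, PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
>              "%" PetscInt_FMT " is too big for cuBLAS, which may be restricted to 32-bit integers", n);
>   *n32 = (int)n;
>   PetscFunctionReturn(0);
> }
> ```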
> 
> Thanks,
> 
> Rohan Yadav
> 
