[petsc-users] 32-bit vs 64-bit GPU support

Junchao Zhang junchao.zhang at gmail.com
Fri Aug 11 14:24:43 CDT 2023


Rohan,
  You could try the PETSc/Kokkos backend.  I have not tested it, but I
expect it can handle 64-bit CUDA index types.
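  If you want to try it, the relevant pieces should be configuring with
`--download-kokkos --download-kokkos-kernels` (on top of `--with-cuda`) and
then selecting `-vec_type kokkos -mat_type aijkokkos` at runtime.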
  I suspect the PETSc/CUDA 32-bit limit dates back to old CUDA versions that
supported only 32-bit indices, which is why the original developers
hardwired the type to THRUSTINTARRAY32.  Because we try to support several
generations of CUDA toolkits, that code is still in place.
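  For reference, the index width is explicit in the cuSPARSE generic API:
cusparseCreateCsr() takes the index types as arguments, so a 64-bit path
would pass CUSPARSE_INDEX_64I with int64_t index arrays where the current
code passes CUSPARSE_INDEX_32I.  A minimal sketch, not PETSc code, with the
device buffers assumed to be filled elsewhere:

```
#include <stdint.h>
#include <cusparse.h>

/* Create a CSR matrix descriptor whose row offsets and column indices are
   64-bit. d_rowptr, d_colind, and d_values are device buffers assumed to
   already hold the CSR data. */
static cusparseStatus_t CreateCsr64(cusparseSpMatDescr_t *mat,
                                    int64_t nrows, int64_t ncols, int64_t nnz,
                                    int64_t *d_rowptr, int64_t *d_colind,
                                    double *d_values)
{
  return cusparseCreateCsr(mat, nrows, ncols, nnz,
                           d_rowptr, d_colind, d_values,
                           CUSPARSE_INDEX_64I,       /* row-offset index type */
                           CUSPARSE_INDEX_64I,       /* column index type */
                           CUSPARSE_INDEX_BASE_ZERO, /* 0-based indexing */
                           CUDA_R_64F);              /* double-precision values */
}
```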

  Anyway, this should be fixed.
--Junchao Zhang


On Fri, Aug 11, 2023 at 1:07 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>    We do not currently have any code for using 64-bit integer sizes on the
> GPUs.
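> 
>    (What exists instead is a range check: sizes handed to cuBLAS or
> cuSPARSE are verified to fit in, and then down-cast to, a 32-bit int.
> Conceptually it is something like the sketch below; this is not the actual
> PETSc code and the helper name is made up.)
> 
> ```
> #include <petscsys.h>
> #include <limits.h>
> 
> /* Down-cast a (possibly 64-bit) PetscInt to the 32-bit int the GPU
>    libraries expect, erroring out when it does not fit. */
> static inline PetscErrorCode CheckedGpuIntCast(PetscInt n, int *n32)
> {
>   PetscFunctionBegin;
>   PetscCheck(n <= INT_MAX, PETSC_COMM_SELF, PETSC_ERR_ARG_OUTOFRANGE,
>              "%" PetscInt_FMT " is too big for a 32-bit GPU library call", n);
>   *n32 = (int)n;
>   PetscFunctionReturn(PETSC_SUCCESS);
> }
> ```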
>
>    Given the memory currently available on GPUs, is 64-bit integer support
> needed? I think even a single vector of length 2^31 would use up most of a
> GPU's memory (2^31 double-precision entries is already 16 GiB). Are there
> practical, not synthetic, situations that require 64-bit integer support on
> GPUs immediately?  For example, is the length of the entire parallel vector
> across all GPUs limited to 32 bits?
>
>    We will certainly add such support, but it is a question of priorities;
> there are many things we need to do to improve PETSc GPU support, and they
> take time. Unless we have practical use cases, 64-bit integer support on
> the GPU is not at the top of the list. Of course, we would be very happy to
> receive a merge request providing this support at any time.
>
>   Barry
>
>
>
> On Aug 11, 2023, at 1:23 PM, Rohan Yadav <rohany at alumni.cmu.edu> wrote:
>
> Hi,
>
> I was wondering what the official status of 64-bit integer support in the
> PETSc GPU backend (specifically CUDA) is. The question comes from
> benchmarking some PETSc code and reading the sources. In particular, I
> found that PETSc's cuSPARSE SpMV call always seems to use the 32-bit
> integer path, even when I compile PETSc with `--with-64-bit-indices`. After
> digging around more, I see that PETSc only ever creates 32-bit cuSPARSE
> matrices as well:
> https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501.
> I looked for a switch to 64-bit integers in this code, but everything seems
> to be hardcoded to `THRUSTINTARRAY32`.
>
> As expected, this all works when the range of coordinates in each sparse
> matrix partition is less than INT_MAX, but the PETSc GPU code breaks in
> different ways (in calls to cuBLAS and cuSPARSE) on a (synthetic) problem
> that needs 64-bit integers:
>
> ```
> #include <limits.h> /* for INT_MAX */
> #include "petscmat.h"
> #include "petscvec.h"
> #include "petsc.h"
> 
> int main(int argc, char** argv) {
>   PetscErrorCode ierr;
>   PetscInitialize(&argc, &argv, (char *)0, "GPU bug");
> 
>   /* A 1 x (2 * INT_MAX) matrix: the column range overflows a 32-bit int. */
>   PetscInt numRows = 1;
>   PetscInt numCols = (PetscInt)INT_MAX * 2;
> 
>   Mat A;
>   ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
>   MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols);
>   MatSetType(A, MATMPIAIJ);
>   MatSetFromOptions(A);
> 
>   /* Two nonzeros in the single row: the first and the last column. */
>   MatSetValue(A, 0, 0, 1.0, INSERT_VALUES);
>   MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES);
>   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> 
>   Vec b;
>   ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
>   VecSetSizes(b, PETSC_DECIDE, numCols);
>   VecSetFromOptions(b);
>   VecSet(b, 0.0);
>   VecSetValue(b, 0, 42.0, INSERT_VALUES);
>   VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES);
>   VecAssemblyBegin(b);
>   VecAssemblyEnd(b);
> 
>   Vec x;
>   ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr);
>   VecSetSizes(x, PETSC_DECIDE, numRows);
>   VecSetFromOptions(x);
>   VecSet(x, 0.0);
> 
>   /* x = A * b; the result should be 42 + 58 = 100. */
>   MatMult(A, b, x);
>   PetscScalar result;
>   VecSum(x, &result);
>   PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", (double)result);
>   PetscFinalize();
>   return 0;
> }
> ```
>
> When this program is run on CPUs, it outputs 100.0, as expected.
>
> When run on a single GPU with `-vec_type cuda -mat_type aijcusparse
> -use_gpu_aware_mpi 0` it fails with
> ```
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Argument out of range
> [0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted
> to 32-bit integers
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
> [0]PETSC ERROR: ./gpu-bug on a  named sean-dgx2 by rohany Fri Aug 11
> 09:34:10 2023
> [0]PETSC ERROR: Configure options --with-cuda=1
> --prefix=/local/home/rohany/petsc/petsc-install/
> --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3
> CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0
> --with-64-bit-indices
> [0]PETSC ERROR: #1 checkCupmBlasIntCast() at
> /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435
> [0]PETSC ERROR: #2 VecAllocateCheck_() at
> /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335
> [0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at
> /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360
> [0]PETSC ERROR: #4 DeviceAllocateCheck_() at
> /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389
> [0]PETSC ERROR: #5 GetArray() at
> /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545
> [0]PETSC ERROR: #6 VectorArray() at
> /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> with errorcode 63.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> ```
>
> and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0` it
> fails with
> ```
>  ** On entry to cusparseCreateCsr(): dimension mismatch for
> CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647)
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) :
> invalid value
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
> [0]PETSC ERROR: ./gpu-bug on a  named sean-dgx2 by rohany Fri Aug 11
> 09:43:07 2023
> [0]PETSC ERROR: Configure options --with-cuda=1
> --prefix=/local/home/rohany/petsc/petsc-install/
> --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3
> CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0
> --with-64-bit-indices
> [0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at
> /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2503
> [0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at
> /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3544
> [0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at
> /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3485
> [0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at
> /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:452
> [0]PETSC ERROR: #5 MatMult() at
> /local/home/rohany/petsc/src/mat/interface/matrix.c:2599
> ```
>
> Thanks,
>
> Rohan Yadav
>
>
>