[petsc-users] 32-bit vs 64-bit GPU support

Jacob Faibussowitsch jacob.fai at gmail.com
Fri Aug 11 14:59:51 CDT 2023


> We should support it, but it still seems hypothetical and not urgent.

FWIW, cuBLAS only just added 64-bit int support with CUDA 12 (naturally, with a completely separate API). 
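
For concreteness, a minimal sketch of the new interface (my assumptions: a CUDA 12 toolkit, a GPU with >16 GB free, and all error checking omitted):

  /* axpy on a vector longer than 2^31 entries via the 64-bit interface */
  #include <cublas_v2.h>
  #include <cuda_runtime.h>
  #include <stdint.h>

  int main(void) {
    const int64_t n = (int64_t)1 << 31;  /* 2^31 floats, ~8 GiB per vector */
    float *x, *y, alpha = 2.0f;
    cublasHandle_t h;

    cudaMalloc((void **)&x, n * sizeof(float));
    cudaMalloc((void **)&y, n * sizeof(float));
    cublasCreate(&h);

    /* The classic cublasSaxpy() takes a 32-bit int and cannot express
       this length; the _64 variant (new in CUDA 12) takes int64_t. */
    cublasSaxpy_64(h, n, &alpha, x, 1, y, 1);

    cublasDestroy(h);
    cudaFree(x);
    cudaFree(y);
    return 0;
  }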

More generally, it would be interesting to know the breakdown of installed CUDA versions among users. Unlike with compilers etc., I suspect that cluster admins (and those running on local machines) are much more likely to update their CUDA toolkits to the latest versions, since new releases often contain critical performance improvements.

It would also help us decide on a minimum version to support. We don't have any real idea of what the current minimum is; IIRC, last time it was estimated to be CUDA 7?
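
(As an aside, gating on the toolkit version at compile time is cheap; a hypothetical sketch using the CUDART_VERSION macro from cuda_runtime_api.h, which encodes major*1000 + minor*10 -- the AXPY64 name below is made up for illustration, not a PETSc or cuBLAS symbol:)

  #include <cublas_v2.h>
  #include <cuda_runtime_api.h>

  /* CUDA 12.0 reports CUDART_VERSION == 12000, so a library can keep
     supporting older toolkits while using the 64-bit API when available. */
  #if CUDART_VERSION >= 12000
  #  define AXPY64(h, n, a, x, ix, y, iy) \
       cublasSaxpy_64((h), (n), (a), (x), (ix), (y), (iy))
  #else
  #  define AXPY64(h, n, a, x, ix, y, iy) \
       cublasSaxpy((h), (int)(n), (a), (x), (int)(ix), (y), (int)(iy))
  #endif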

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Aug 11, 2023, at 15:38, Jed Brown <jed at jedbrown.org> wrote:
> 
> Rohan Yadav <rohany at alumni.cmu.edu> writes:
> 
>> With modern GPU sizes, for example A100s with 80 GB of memory, a vector of
>> length 2^31 is not that much memory -- one could conceivably run a CG solve
>> with local vectors of length > 2^31.
> 
> Yeah, each vector would be 8 GB (single precision) or 16 GB (double). You can't store a matrix of this size, and probably not a "mesh", but it's possible to create such a problem if everything is matrix-free (possibly with matrix-free geometric multigrid). This is more likely to show up in a benchmark than in any real science or engineering problem. We should support it, but it still seems hypothetical and not urgent.
> 
>> Thanks Junchao, I might look into that. However, I currently am not trying
>> to solve such a large problem -- these questions just came from wondering
>> why the cuSPARSE kernel PETSc was calling was running faster than mine.
> 
> Hah, bandwidth doesn't lie. ;-)
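
To put numbers on the sizes discussed above: a minimal sketch of creating such a GPU vector in PETSc, assuming a build configured with --with-64-bit-indices (so PetscInt is 64-bit) and --with-cuda:

  #include <petscvec.h>

  int main(int argc, char **argv) {
    Vec      x;
    PetscInt n = (PetscInt)1 << 31;  /* would overflow a 32-bit PetscInt */

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
    PetscCall(VecSetSizes(x, PETSC_DECIDE, n));
    PetscCall(VecSetType(x, VECCUDA));  /* CUDA-backed vector */
    PetscCall(VecSet(x, 1.0));
    PetscCall(VecDestroy(&x));
    PetscCall(PetscFinalize());
    return 0;
  }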


