[petsc-dev] -with-kokkos-cuda-arch=AMPERE80 nonsense

Jeff Hammond jeff.science at gmail.com
Mon Apr 5 19:32:56 CDT 2021


NVCC has supported multi-versioned "fat" binaries since I worked for
Argonne.  Libraries should figure out the oldest hardware they care
about and then compile for everything from that point forward.  Kepler
(3.5) is the oldest version any reasonable person should be thinking about
at this point.  The oldest thing I know of in the DOE HPC fleet is Pascal
(6.x).  Volta and Turing are 7.x and Ampere is 8.x.
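
For example, something along these lines (an untested sketch; sm_80
requires CUDA 11, so trim the list to what your toolkit supports):

  nvcc -gencode arch=compute_35,code=sm_35 \
       -gencode arch=compute_60,code=sm_60 \
       -gencode arch=compute_70,code=sm_70 \
       -gencode arch=compute_80,code=sm_80 \
       -gencode arch=compute_80,code=compute_80 \
       -o app app.cu

The last -gencode also embeds compute_80 PTX so the binary can still JIT
on whatever comes after Ampere.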

The biggest architectural changes came with unified memory (
https://developer.nvidia.com/blog/unified-memory-in-cuda-6/) and
cooperative groups (https://developer.nvidia.com/blog/cooperative-groups/,
in CUDA 9), but Kokkos doesn't use the latter.  Both features can be used
on quite old GPU architectures, although the performance is better on
newer ones.
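
For what it's worth, a minimal unified-memory sketch (works on Kepler and
later; migration gets cheaper from Pascal on):

  #include <stdio.h>
  #include <cuda_runtime.h>

  __global__ void scale(double *x, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) x[i] *= 2.0;
  }

  int main(void) {
      int n = 1 << 20;
      double *x;
      cudaMallocManaged(&x, n * sizeof(double)); /* one pointer, host+device */
      for (int i = 0; i < n; i++) x[i] = 1.0;    /* host writes directly */
      scale<<<(n + 255) / 256, 256>>>(x, n);
      cudaDeviceSynchronize();                   /* wait before host reads */
      printf("%f\n", x[0]);
      cudaFree(x);
      return 0;
  }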

I haven't dug into what Kokkos and PETSc are doing, but the direct use of
this stuff in CUDA is well-documented, certainly at least as well as the
CPU switches for x86 binaries in the Intel compiler.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

Devices with the same major revision number are of the same core
architecture. The major revision number is 8 for devices based on the NVIDIA
Ampere GPU architecture, 7 for devices based on the Volta architecture, 6
for devices based on the Pascal architecture, 5 for devices based on the
Maxwell architecture, 3 for devices based on the Kepler architecture, 2 for
devices based on the Fermi architecture, and 1 for devices based on the
Tesla architecture.

https://docs.nvidia.com/cuda/pascal-compatibility-guide/index.html#building-pascal-compatible-apps-using-cuda-8-0
https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html#building-volta-compatible-apps-using-cuda-9-0
https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#building-turing-compatible-apps-using-cuda-10-0
https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#building-ampere-compatible-apps-using-cuda-11-0

Programmatic querying can be done with cudaDeviceGetAttribute (
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html):

  - cudaDevAttrComputeCapabilityMajor: major compute capability version number
  - cudaDevAttrComputeCapabilityMinor: minor compute capability version number
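
A configure test could do something like the following, which recovers the
sm_XY value and derives a Kokkos-style name from it; the name table is my
guess at the Kokkos <NAME><MAJOR><MINOR> convention (PASCAL60, AMPERE80,
...), not something Kokkos exports:

  #include <stdio.h>
  #include <cuda_runtime.h>

  static const char *arch_name(int major, int minor) {
      if (major == 7 && minor >= 5) return "TURING";
      switch (major) {
      case 3: return "KEPLER";
      case 5: return "MAXWELL";
      case 6: return "PASCAL";
      case 7: return "VOLTA";
      case 8: return "AMPERE";
      default: return "UNKNOWN";
      }
  }

  int main(void) {
      int major = 0, minor = 0, device = 0;
      if (cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor,
                                 device) != cudaSuccess ||
          cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor,
                                 device) != cudaSuccess) {
          fprintf(stderr, "no CUDA device visible\n");
          return 1;
      }
      printf("sm_%d%d -> %s%d%d\n", major, minor,
             arch_name(major, minor), major, minor);
      return 0;
  }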

The compiler help tells me the following, which can be cross-referenced
with the CUDA documentation above.

$ /usr/local/cuda-10.0/bin/nvcc -h

Usage  : nvcc [options] <inputfile>

...

Options for steering GPU code generation.
=========================================

--gpu-architecture <arch>                  (-arch)
        Specify the name of the class of NVIDIA 'virtual' GPU architecture
        for which the CUDA input files must be compiled.
        With the exception as described for the shorthand below, the
        architecture specified with this option must be a 'virtual'
        architecture (such as compute_50).  Normally, this option alone does
        not trigger assembly of the generated PTX for a 'real' architecture
        (that is the role of nvcc option '--gpu-code', see below); rather,
        its purpose is to control preprocessing and compilation of the input
        to PTX.
        For convenience, in case of simple nvcc compilations, the following
        shorthand is supported.  If no value for option '--gpu-code' is
        specified, then the value of this option defaults to the value of
        '--gpu-architecture'.  In this situation, as only exception to the
        description above, the value specified for '--gpu-architecture' may
        be a 'real' architecture (such as a sm_50), in which case nvcc uses
        the specified 'real' architecture and its closest 'virtual'
        architecture as effective architecture values.  For example, 'nvcc
        --gpu-architecture=sm_50' is equivalent to 'nvcc
        --gpu-architecture=compute_50 --gpu-code=sm_50,compute_50'.
        Allowed values for this option:
        'compute_30','compute_32','compute_35','compute_37','compute_50',
        'compute_52','compute_53','compute_60','compute_61','compute_62',
        'compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
        'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70',
        'sm_72','sm_75'.

--gpu-code <code>,...                      (-code)
        Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
        nvcc embeds a compiled code image in the resulting executable for
        each specified <code> architecture, which is a true binary load
        image for each 'real' architecture (such as sm_50), and PTX code for
        the 'virtual' architecture (such as compute_50).
        During runtime, such embedded PTX code is dynamically compiled by
        the CUDA runtime system if no binary load image is found for the
        'current' GPU.
        Architectures specified for options '--gpu-architecture' and
        '--gpu-code' may be 'virtual' as well as 'real', but the <code>
        architectures must be compatible with the <arch> architecture.  When
        the '--gpu-code' option is used, the value for the
        '--gpu-architecture' option must be a 'virtual' PTX architecture.
        For instance, '--gpu-architecture=compute_35' is not compatible with
        '--gpu-code=sm_30', because the earlier compilation stages will
        assume the availability of 'compute_35' features that are not
        present on 'sm_30'.
        Allowed values for this option:
        'compute_30','compute_32','compute_35','compute_37','compute_50',
        'compute_52','compute_53','compute_60','compute_61','compute_62',
        'compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
        'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70',
        'sm_72','sm_75'.
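
That PTX JIT fallback is why a PTX-only build keeps working on GPUs that
did not exist when you compiled.  As a sketch:

  nvcc --gpu-architecture=compute_50 --gpu-code=compute_50 -o app app.cu

embeds only compute_50 PTX, which the driver JIT-compiles for whatever GPU
it finds at run time, at the cost of JIT overhead and Maxwell-era tuning.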


--generate-code <specification>,...        (-gencode)
        This option provides a generalization of the
        '--gpu-architecture=<arch> --gpu-code=<code>,...' option combination
        for specifying nvcc behavior with respect to code generation.  Where
        use of the previous options generates code for different 'real'
        architectures with the PTX for the same 'virtual' architecture,
        option '--generate-code' allows multiple PTX generations for
        different 'virtual' architectures.  In fact,
        '--gpu-architecture=<arch> --gpu-code=<code>,...' is equivalent to
        '--generate-code arch=<arch>,code=<code>,...'.
        '--generate-code' options may be repeated for different virtual
        architectures.
        Allowed keywords for this option:  'arch','code'.
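
In other words, the two spellings below should produce the same fat binary
(square brackets for the value list; check your nvcc version's manual):

  nvcc --gpu-architecture=compute_60 --gpu-code=sm_60,sm_70,compute_60 app.cu
  nvcc --generate-code arch=compute_60,code=[sm_60,sm_70,compute_60] app.cu

and you repeat '--generate-code' once per 'virtual' architecture you want
PTX for.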

On Mon, Apr 5, 2021 at 1:19 PM Satish Balay via petsc-dev
<petsc-dev at mcs.anl.gov> wrote:

> This is an nvidia mess-up. Why isn't there a command that gives me these
> values [if they insist on this interface for nvcc]?
>
> I see Barry wants configure to do something here - but whatever we do, we
> would be shifting the problem around.
> [even if we detect stuff - the build box might not have the GPU used for
> runs.]
>
> We have --with-cuda-arch - which I tried to remove from configure - but
> it's come back in a different form (--with-cuda-gencodearch)
>
> And I see other packages:
>
>   --with-kokkos-cuda-arch
>
> Wrt spack - I'm having to do:
>
> spack install xsdk+cuda ^magma cuda_arch=60
>
> [magma uses CudaPackage() infrastructure in spack]
>
> Satish
>
> On Mon, 5 Apr 2021, Mills, Richard Tran via petsc-dev wrote:
>
> > You raise a good point, Barry. I've been completely mystified by what
> > some of these names even mean. What does "PASCAL60" vs. "PASCAL61" even
> > mean? Do you know where this is even documented? I can't really find
> > anything about it in the Kokkos documentation. The only thing I can
> > really find is an issue or two about "hey, shouldn't our CMake stuff
> > figure this out automatically" and then some posts about why it can't
> > really do that. Not encouraging.
> >
> > --Richard
> >
> > On 4/3/21 8:42 PM, Barry Smith wrote:
> >
> >
> >   It would be very nice to NOT require PETSc users to provide this
> > flag; how the heck will they know what it should be when we cannot
> > automate it ourselves?
> >
> >   Any ideas of how this can be determined based on the current system?
> > NVIDIA does not help, since these "advertising" names don't seem to
> > trivially map to information you can get from a particular GPU when you
> > are logged into it. For example, nvidia-smi doesn't use these names
> > directly. Is there some mapping from nvidia-smi to these names we could
> > use? If we are serious about having a non-trivial number of users
> > utilizing GPUs, which we need to be for the future, we cannot have
> > these absurd demands in our installation process.
> >
> >   Barry
> >
> > Does spack have some magic for this we could use?
> >
> >
> >
> >
>
>

-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/