[petsc-dev] -with-kokkos-cuda-arch=AMPERE80 nonsense

Junchao Zhang junchao.zhang at gmail.com
Mon Apr 5 21:30:34 CDT 2021


On Mon, Apr 5, 2021 at 7:33 PM Jeff Hammond <jeff.science at gmail.com> wrote:

> NVCC has supported multi-versioned "fat" binaries since I worked for
> Argonne.  Libraries should figure out the oldest hardware they care
> about and then compile for everything from that point forward.  Kepler
> (3.5) is the oldest version any reasonable person should be thinking about at
> this point.  The oldest thing I know of in the DOE HPC fleet is Pascal
> (6.x).  Volta and Turing are 7.x and Ampere is 8.x.
>
> The biggest architectural changes came with unified memory (
> https://developer.nvidia.com/blog/unified-memory-in-cuda-6/) and
> cooperative groups (https://developer.nvidia.com/blog/cooperative-groups/ in
> CUDA 9), but Kokkos doesn't use the latter.  Both features can be used on
> quite old GPU architectures, although the performance is better on newer
> ones.
>
> I haven't dug into what Kokkos and PETSc are doing but the direct use of
> this stuff in CUDA is well-documented, certainly as well as the CPU
> switches for x86 binaries in the Intel compiler are.
>
>
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
>
> Devices with the same major revision number are of the same core
> architecture. The major revision number is 8 for devices based on the NVIDIA
> Ampere GPU architecture, 7 for devices based on the Volta architecture, 6
> for devices based on the Pascal architecture, 5 for devices based on the
> Maxwell architecture, 3 for devices based on the Kepler architecture, 2
> for devices based on the Fermi architecture, and 1 for devices based on
> the Tesla architecture.
>
Kokkos has config options Kokkos_ARCH_TURING75, Kokkos_ARCH_VOLTA70, and
Kokkos_ARCH_VOLTA72.  Any idea how one can map compute capability versions
to these arch names?
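For the CUDA backends the mapping appears to be mechanical: the digits at the end of the Kokkos name are the compute capability (VOLTA70 is CC 7.0, TURING75 is CC 7.5, and so on). A minimal sketch of that table, assuming my reading of Kokkos' CUDA arch option names is right (this is hand-written, not generated from Kokkos itself):

```python
# Sketch: map a CUDA compute capability (major, minor) to the Kokkos
# CMake arch option that names it. The table is my own transcription of
# Kokkos' CUDA arch options; treat it as illustrative, not exhaustive.
KOKKOS_CUDA_ARCH = {
    (3, 5): "Kokkos_ARCH_KEPLER35",
    (3, 7): "Kokkos_ARCH_KEPLER37",
    (5, 0): "Kokkos_ARCH_MAXWELL50",
    (6, 0): "Kokkos_ARCH_PASCAL60",
    (6, 1): "Kokkos_ARCH_PASCAL61",
    (7, 0): "Kokkos_ARCH_VOLTA70",
    (7, 2): "Kokkos_ARCH_VOLTA72",
    (7, 5): "Kokkos_ARCH_TURING75",
    (8, 0): "Kokkos_ARCH_AMPERE80",
}

def kokkos_arch(major: int, minor: int) -> str:
    """Return the Kokkos arch option for a compute capability, if known."""
    try:
        return KOKKOS_CUDA_ARCH[(major, minor)]
    except KeyError:
        raise ValueError(f"no Kokkos arch known for CC {major}.{minor}")
```

So a configure script that can learn the compute capability at runtime could pick the Kokkos option without asking the user.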


>
>
>
> https://docs.nvidia.com/cuda/pascal-compatibility-guide/index.html#building-pascal-compatible-apps-using-cuda-8-0
>
> https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html#building-volta-compatible-apps-using-cuda-9-0
>
> https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#building-turing-compatible-apps-using-cuda-10-0
>
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#building-ampere-compatible-apps-using-cuda-11-0
>
> Programmatic querying can be done with the following (
> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html):
>
> cudaDeviceGetAttribute
>
>    - cudaDevAttrComputeCapabilityMajor
>      <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd220ff111a6616ab512e229d8f2f8bf87>:
>      Major compute capability version number;
>    - cudaDevAttrComputeCapabilityMinor
>      <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd2c981c76c9de58d39502e483a7b484c7>:
>      Minor compute capability version number.
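A hedged sketch of what such a query could look like from a build script, calling cudaDeviceGetAttribute through ctypes. The enum values 75/76 are my reading of the CUDA runtime headers, and libcudart.so plus a visible GPU are assumed; the helper degrades to None when either is missing:

```python
# Sketch: query a device's compute capability via the CUDA runtime,
# without compiling anything. Assumes libcudart.so is loadable; returns
# None rather than failing on machines without CUDA or a GPU.
import ctypes

# Enum values as I read them from the CUDA runtime headers:
CUDA_DEV_ATTR_CC_MAJOR = 75  # cudaDevAttrComputeCapabilityMajor
CUDA_DEV_ATTR_CC_MINOR = 76  # cudaDevAttrComputeCapabilityMinor

def compute_capability(device: int = 0):
    """Return (major, minor) for `device`, or None if CUDA is unavailable."""
    try:
        rt = ctypes.CDLL("libcudart.so")
    except OSError:
        return None
    major, minor = ctypes.c_int(), ctypes.c_int()
    # cudaDeviceGetAttribute returns 0 (cudaSuccess) on success.
    if rt.cudaDeviceGetAttribute(ctypes.byref(major), CUDA_DEV_ATTR_CC_MAJOR, device):
        return None
    if rt.cudaDeviceGetAttribute(ctypes.byref(minor), CUDA_DEV_ATTR_CC_MINOR, device):
        return None
    return major.value, minor.value

def sm_name(major: int, minor: int) -> str:
    """Format a compute capability as an nvcc 'real' architecture name."""
    return f"sm_{major}{minor}"
```

Of course, as Satish notes below, the build box may not have the GPU the runs will use, so a query like this can only be a default, not the whole answer.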
>
> The compiler help tells me this, which can be cross-referenced with CUDA
> documentation above.
>
> $ /usr/local/cuda-10.0/bin/nvcc -h
>
> Usage  : nvcc [options] <inputfile>
>
> ...
>
> Options for steering GPU code generation.
> =========================================
>
> --gpu-architecture <arch>                  (-arch)
>         Specify the name of the class of NVIDIA 'virtual' GPU architecture for which
>         the CUDA input files must be compiled.
>         With the exception as described for the shorthand below, the architecture
>         specified with this option must be a 'virtual' architecture (such as compute_50).
>         Normally, this option alone does not trigger assembly of the generated PTX
>         for a 'real' architecture (that is the role of nvcc option '--gpu-code',
>         see below); rather, its purpose is to control preprocessing and compilation
>         of the input to PTX.
>         For convenience, in case of simple nvcc compilations, the following shorthand
>         is supported.  If no value for option '--gpu-code' is specified, then the
>         value of this option defaults to the value of '--gpu-architecture'.  In this
>         situation, as only exception to the description above, the value specified
>         for '--gpu-architecture' may be a 'real' architecture (such as a sm_50),
>         in which case nvcc uses the specified 'real' architecture and its closest
>         'virtual' architecture as effective architecture values.  For example,
>         'nvcc --gpu-architecture=sm_50' is equivalent to
>         'nvcc --gpu-architecture=compute_50 --gpu-code=sm_50,compute_50'.
>         Allowed values for this option:  'compute_30','compute_32','compute_35',
>         'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
>         'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
>         'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
>         'sm_75'.
>
> --gpu-code <code>,...                      (-code)
>         Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
>         nvcc embeds a compiled code image in the resulting executable for each specified
>         <code> architecture, which is a true binary load image for each 'real' architecture
>         (such as sm_50), and PTX code for the 'virtual' architecture (such as compute_50).
>         During runtime, such embedded PTX code is dynamically compiled by the CUDA
>         runtime system if no binary load image is found for the 'current' GPU.
>         Architectures specified for options '--gpu-architecture' and '--gpu-code'
>         may be 'virtual' as well as 'real', but the <code> architectures must be
>         compatible with the <arch> architecture.  When the '--gpu-code' option is
>         used, the value for the '--gpu-architecture' option must be a 'virtual' PTX
>         architecture.
>         For instance, '--gpu-architecture=compute_35' is not compatible with
>         '--gpu-code=sm_30', because the earlier compilation stages will assume the
>         availability of 'compute_35' features that are not present on 'sm_30'.
>         Allowed values for this option:  'compute_30','compute_32','compute_35',
>         'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
>         'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
>         'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
>         'sm_75'.
>
> --generate-code <specification>,...        (-gencode)
>         This option provides a generalization of the
>         '--gpu-architecture=<arch> --gpu-code=<code>,...' option combination for
>         specifying nvcc behavior with respect to code generation.  Where use of the
>         previous options generates code for different 'real' architectures with the
>         PTX for the same 'virtual' architecture, option '--generate-code' allows
>         multiple PTX generations for different 'virtual' architectures.  In fact,
>         '--gpu-architecture=<arch> --gpu-code=<code>,...' is equivalent to
>         '--generate-code arch=<arch>,code=<code>,...'.
>         '--generate-code' options may be repeated for different virtual architectures.
>         Allowed keywords for this option:  'arch','code'.
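Putting the help text above together: to build a fat binary, a configure-style helper would emit one '-gencode' pair per targeted real architecture, plus PTX for the newest one so future GPUs can JIT it. A minimal sketch (the function name and interface are mine, purely illustrative, not PETSc's configure):

```python
def gencode_flags(real_archs, ptx_arch=None):
    """Build nvcc -gencode flags for a fat binary.

    real_archs: compute capabilities as two-digit ints, e.g. [60, 70, 80].
    Emits one cubin per real architecture, then embeds PTX for the newest
    (or for `ptx_arch` if given) so newer, unlisted GPUs can JIT-compile it.
    """
    flags = [f"-gencode arch=compute_{a},code=sm_{a}" for a in real_archs]
    newest = ptx_arch or max(real_archs)
    # code=compute_XX embeds PTX rather than a binary image.
    flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return flags
```

For example, `gencode_flags([60, 70])` targets Pascal and Volta binaries and carries compute_70 PTX for anything newer, which matches the "oldest hardware you care about, forward" policy described at the top of the thread.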
>
> On Mon, Apr 5, 2021 at 1:19 PM Satish Balay via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
>
>> This is an nvidia mess-up. Why isn't there a command that gives me these
>> values [if they insist on this interface for nvcc]?
>>
>> I see Barry wants configure to do something here - but whatever we do, we
>> would just be shifting the problem around.
>> [Even if we detect stuff - the build box might not have the GPU used for
>> runs.]
>>
>> We have --with-cuda-arch - which I tried to remove from configure - but
>> it has come back in a different form (--with-cuda-gencodearch).
>>
>> And I see other packages:
>>
>>   --with-kokkos-cuda-arch
>>
>> Wrt spack - I'm having to do:
>>
>> spack install xsdk+cuda ^magma cuda_arch=60
>>
>> [magma uses CudaPackage() infrastructure in spack]
>>
>> Satish
>>
>> On Mon, 5 Apr 2021, Mills, Richard Tran via petsc-dev wrote:
>>
>> > You raise a good point, Barry. I've been completely mystified by what
>> some of these names even mean. What do "PASCAL60" vs. "PASCAL61" even
>> mean? Do you know where this is documented? I can't really find
>> anything about it in the Kokkos documentation. The only things I can really
>> find are an issue or two along the lines of "hey, shouldn't our CMake stuff
>> figure this out automatically" and then some posts about why it can't really
>> do that. Not encouraging.
>> >
>> > --Richard
>> >
>> > On 4/3/21 8:42 PM, Barry Smith wrote:
>> >
>> >
>> >   It would be very nice NOT to require PETSc users to provide this
>> flag; how the heck will they know what it should be when we cannot automate
>> it ourselves?
>> >
>> >   Any ideas of how this can be determined based on the current system?
>> NVIDIA does not help, since these "advertising" names don't seem to
>> trivially map to information you can get from a particular GPU when you are
>> logged into it. For example, nvidia-smi doesn't use these names directly. Is
>> there some mapping from nvidia-smi to these names we could use? If we are
>> serious about having a non-trivial number of users utilizing GPUs, which we
>> need to be for the future, we cannot have these absurd demands in our
>> installation process.
>> >
>> >   Barry
>> >
>> > Does spack have some magic for this we could use?
>> >
>> >
>> >
>> >
>>
>>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>