[petsc-users] Did CUDA break again?

Mark Adams mfadams at lbl.gov
Wed May 26 21:21:08 CDT 2021


I had git bisect working and was 4 steps away when I got a new crash.
configure.log is empty.

19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad
Bisecting: 19 revisions left to test after this (roughly 4 steps)
[149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch 'tisaac/feature-spqr' into 'main'
19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD
===============================================================================
             Configuring PETSc to compile on your system

===============================================================================
*******************************************************************************
        CONFIGURATION CRASH  (Please send configure.log to petsc-maint at mcs.anl.gov)
*******************************************************************************

EOL while scanning string literal (cuda.py, line 176)
  File "/global/u2/m/madams/petsc/config/configure.py", line 455, in petsc_configure
    framework = config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], loadArgDB = 0)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 107, in __init__
    self.createChildren()
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 344, in createChildren
    self.getChild(moduleName)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
    config.setupDependencies(self)
  File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in setupDependencies
    self.blasLapack    = framework.require('config.packages.BlasLapack',self)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
    config = self.getChild(moduleName, keywordArgs)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
    config.setupDependencies(self)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 21, in setupDependencies
    config.package.Package.setupDependencies(self, framework)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", line 151, in setupDependencies
    self.mpi         = framework.require('config.packages.MPI',self)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
    config = self.getChild(moduleName, keywordArgs)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
    config.setupDependencies(self)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line 73, in setupDependencies
    self.mpich   = framework.require('config.packages.MPICH', self)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
    config = self.getChild(moduleName, keywordArgs)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
    config.setupDependencies(self)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", line 16, in setupDependencies
    self.cuda            = framework.require('config.packages.cuda',self)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
    config = self.getChild(moduleName, keywordArgs)
  File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 302, in getChild
    type   = __import__(moduleName, globals(), locals(), ['Configure']).Configure
19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py
PETSC_DIR=$PWD

On Wed, May 26, 2021 at 10:10 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

>
>
>
> On Wed, May 26, 2021 at 6:13 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>   What is HOST=cori09? Does it have GPUs?
>>
>>
>> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6
>>
>>   Seems to clearly state
>>
>> int cudaDeviceProp::major [inherited]
>>
>> Major compute capability
>>
>>
>> Mark, please compile and run this program on the machine you are running
>> configure on
>>
>> #include <stdio.h>
>> #include <cuda.h>
>> #include <cuda_runtime.h>
>> #include <cuda_runtime_api.h>
>> #include <cuda_device_runtime_api.h>
>> int main(int arg,char **args)
>> {
>>   struct cudaDeviceProp dp;
>>   cudaGetDeviceProperties(&dp, 0);
>>   printf("%d\n",10*dp.major+dp.minor);
>>
>>   int major,minor;
>>   cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0);
>>   cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0);
>>   printf("%d\n",10*major+minor);
>>   return(0);
>>
> You probably need to check the return codes of these two function calls to
> make sure the calls succeeded.
>
>
>> }
>>
>> This is what I get
>>
>> $ nvcc mytest.c -lcuda
>> ~/petsc* (main=)* arch-main
>> $ ./a.out
>> 70
>> 70
>>
>> Which is exactly what it is supposed to do.
>>
>> Barry
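
The test program above prints 10*major+minor whether or not the CUDA calls succeeded. If a call fails (for example on a node with no visible GPU), the output fields may hold whatever happened to be in memory. Whether that explains 'compute_1120' is not established in this thread. As a minimal sketch of the return-code check suggested above, the decision logic can be separated from the CUDA calls. The helper name arch_from_capability and its sanity bounds are assumptions made for illustration; they are not part of PETSc or CUDA.

```c
/* Hypothetical helper, not part of PETSc or CUDA.
 *
 * status: return code of the query call. 0 means success
 *         (cudaSuccess and CUDA_SUCCESS are both 0).
 * major, minor: the values the query wrote.
 *
 * Returns 10*major+minor, or -1 when the query failed or the
 * values do not look like a compute capability. */
int arch_from_capability(int status, int major, int minor)
{
  if (status != 0) return -1;              /* call failed: outputs are not valid */
  if (major < 1 || major > 99) return -1;  /* assumed sanity bound, not an NVIDIA limit */
  if (minor < 0 || minor > 9) return -1;   /* keeps 10*major+minor unambiguous */
  return 10 * major + minor;
}
```

To use it with the program above, store the return value of cudaGetDeviceProperties(&dp, 0) in a variable first, then pass that variable together with dp.major and dp.minor. Do not make the CUDA call inside the argument list, since C does not specify the order in which function arguments are evaluated. A result of -1 would then mean "do not pass an architecture flag to nvcc" rather than emitting a bogus one.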
>>
>> On May 26, 2021, at 5:31 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>
>>
>>   Yes, this code which I guess never got hit before
>>
>> cudaDeviceProp dp;
>> cudaGetDeviceProperties(&dp, 0);
>> printf("%d\n",10*dp.major+dp.minor);
>> return(0);
>>
>> is using the wrong property for the generation.
>>
>>  Back to the CUDA documentation for the correct information.
>>
>>
>>
>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch <jacob.fai at gmail.com>
>> wrote:
>>
>> 1120 sounds suspiciously like some CUDA version rather than architecture
>> or compute capability…
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: +1 (312) 694-3391
>>
>> On May 26, 2021, at 22:29, Mark Adams <mfadams at lbl.gov> wrote:
>> 
>> I started to get this error today on Cori.
>>
>> nvcc fatal   : Unsupported gpu architecture 'compute_1120'
>>
>> I am pretty sure I had a clean build but I can redo it if you don't know
>> where this is from.
>>
>> Thanks,
>> Mark
>> <configure.log>
>>
>>
>>
>>