[petsc-users] Did CUDA break again?

Mark Adams mfadams at lbl.gov
Thu May 27 05:45:34 CDT 2021


FYI, I was running the test incorrectly:
03:38 cgpu12  ~/petsc_install$ srun -n 1 -G 1 ./a.out
70
70

On Wed, May 26, 2021 at 10:21 PM Mark Adams <mfadams at lbl.gov> wrote:

> I had git bisect working and was 4 steps away when I got a new crash.
> configure.log is empty.
>
> 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad
> Bisecting: 19 revisions left to test after this (roughly 4 steps)
> [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch
> 'tisaac/feature-spqr' into 'main'
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$
> ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD
>
> ===============================================================================
>              Configuring PETSc to compile on your system
>
>
> ===============================================================================
>
> *******************************************************************************
>         CONFIGURATION CRASH  (Please send configure.log to
> petsc-maint at mcs.anl.gov)
>
> *******************************************************************************
>
> EOL while scanning string literal (cuda.py, line 176)
>   File "/global/u2/m/madams/petsc/config/configure.py", line 455, in
> petsc_configure
>     framework =
> config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:],
> loadArgDB = 0)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 107, in __init__
>     self.createChildren()
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 344, in createChildren
>     self.getChild(moduleName)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in
> setupDependencies
>     self.blasLapack    =
> framework.require('config.packages.BlasLapack',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 329, in getChild
>     config.setupDependencies(self)
>   File
> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py",
> line 21, in setupDependencies
>     config.package.Package.setupDependencies(self, framework)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py",
> line 151, in setupDependencies
>     self.mpi         = framework.require('config.packages.MPI',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 329, in getChild
>     config.setupDependencies(self)
>   File
> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line
> 73, in setupDependencies
>     self.mpich   = framework.require('config.packages.MPICH', self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 329, in getChild
>     config.setupDependencies(self)
>   File
> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py",
> line 16, in setupDependencies
>     self.cuda            = framework.require('config.packages.cuda',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py",
> line 302, in getChild
>     type   = __import__(moduleName, globals(), locals(),
> ['Configure']).Configure
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$
> ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD
>
> On Wed, May 26, 2021 at 10:10 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>>
>>
>>
>> On Wed, May 26, 2021 at 6:13 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>   What is HOST=cori09? Does it have GPUs?
>>>
>>>
>>> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6
>>>
>>>   Seems to clearly state
>>>
>>> int cudaDeviceProp::major [inherited]
>>>
>>> Major compute capability
>>>
>>>
>>> Mark, please compile and run this program on the machine you are running
>>> configure on
>>>
>>> #include <stdio.h>
>>> #include <cuda.h>
>>> #include <cuda_runtime.h>
>>> #include <cuda_runtime_api.h>
>>> #include <cuda_device_runtime_api.h>
>>> int main(int arg,char **args)
>>> {
>>>   struct cudaDeviceProp dp;
>>>   cudaGetDeviceProperties(&dp, 0);
>>>   printf("%d\n",10*dp.major+dp.minor);
>>>
>>>   int major,minor;
>>>   cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0);
>>>   cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0);
>>>   printf("%d\n",10*major+minor);
>>>   return(0);
>>>
>> Probably, you need to check the return code of these two function calls
>> to make sure they are correct.
>>
>>
>>> }
>>>
>>> This is what I get
>>>
>>> $ nvcc mytest.c -lcuda
>>> ~/petsc* (main=)* arch-main
>>> $ ./a.out
>>> 70
>>> 70
>>>
>>> Which is exactly what it is supposed to do.
>>>
>>> Barry
>>>
>>> On May 26, 2021, at 5:31 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>
>>>   Yes, this code, which I guess never got hit before,
>>>
>>> cudaDeviceProp dp;
>>> cudaGetDeviceProperties(&dp, 0);
>>> printf("%d\n",10*dp.major+dp.minor);
>>> return(0);
>>>
>>> is using the wrong property for the generation.
>>>
>>>  Back to the CUDA documentation for the correct information.
>>>
>>>
>>>
>>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch <jacob.fai at gmail.com>
>>> wrote:
>>>
>>> 1120 sounds suspiciously like some CUDA version rather than architecture
>>> or compute capability…
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: +1 (312) 694-3391
>>>
>>> On May 26, 2021, at 22:29, Mark Adams <mfadams at lbl.gov> wrote:
>>> 
>>> I started to get this error today on Cori.
>>>
>>> nvcc fatal   : Unsupported gpu architecture 'compute_1120'
>>>
>>> I am pretty sure I had a clean build but I can redo it if you don't know
>>> where this is from.
>>>
>>> Thanks,
>>> Mark
>>> <configure.log>
>>>
>>>
>>>
>>>