[petsc-users] Did CUDA break again?
Mark Adams
mfadams at lbl.gov
Thu May 27 05:45:34 CDT 2021
FYI, I was running the test incorrectly:
03:38 cgpu12 ~/petsc_install$ srun -n 1 -G 1 ./a.out
70
70
On Wed, May 26, 2021 at 10:21 PM Mark Adams <mfadams at lbl.gov> wrote:
> I had git bisect working and was 4 steps away when I got a new crash.
> configure.log is empty.
>
> 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad
> Bisecting: 19 revisions left to test after this (roughly 4 steps)
> [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch 'tisaac/feature-spqr' into 'main'
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD
>
> ===============================================================================
>              Configuring PETSc to compile on your system
> ===============================================================================
> *******************************************************************************
>     CONFIGURATION CRASH (Please send configure.log to petsc-maint at mcs.anl.gov)
> *******************************************************************************
>
> EOL while scanning string literal (cuda.py, line 176)
>   File "/global/u2/m/madams/petsc/config/configure.py", line 455, in petsc_configure
>     framework = config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], loadArgDB = 0)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 107, in __init__
>     self.createChildren()
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 344, in createChildren
>     self.getChild(moduleName)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in setupDependencies
>     self.blasLapack = framework.require('config.packages.BlasLapack',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 21, in setupDependencies
>     config.package.Package.setupDependencies(self, framework)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", line 151, in setupDependencies
>     self.mpi = framework.require('config.packages.MPI',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line 73, in setupDependencies
>     self.mpich = framework.require('config.packages.MPICH', self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", line 16, in setupDependencies
>     self.cuda = framework.require('config.packages.cuda',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 302, in getChild
>     type = __import__(moduleName, globals(), locals(), ['Configure']).Configure
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD
>
> On Wed, May 26, 2021 at 10:10 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>>
>>
>>
>> On Wed, May 26, 2021 at 6:13 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>> What is HOST=cori09? Does it have GPUs?
>>>
>>>
>>> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6
>>>
>>> Seems to clearly state
>>>
>>> int cudaDeviceProp::major [inherited]
>>>
>>> Major compute capability
>>>
>>>
>>> Mark, please compile and run this program on the machine you are running
>>> configure on
>>>
>>> #include <stdio.h>
>>> #include <cuda.h>
>>> #include <cuda_runtime.h>
>>> #include <cuda_runtime_api.h>
>>> #include <cuda_device_runtime_api.h>
>>> int main(int arg,char **args)
>>> {
>>>   struct cudaDeviceProp dp;
>>>   cudaGetDeviceProperties(&dp, 0);
>>>   printf("%d\n",10*dp.major+dp.minor);
>>>
>>>   int major,minor;
>>>   cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0);
>>>   cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0);
>>>   printf("%d\n",10*major+minor);
>>>   return(0);
>>>
>> You should probably check the return codes of these two function calls
>> to make sure they succeeded.
>>
>>
>>> }
>>>
>>> This is what I get
>>>
>>> $ nvcc mytest.c -lcuda
>>> ~/petsc* (main=)* arch-main
>>> $ ./a.out
>>> 70
>>> 70
>>>
>>> Which is exactly what it is supposed to do.
>>>
>>> Barry
>>>
>>> On May 26, 2021, at 5:31 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>
>>> Yes, this code which I guess never got hit before
>>>
>>> cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0);
>>> printf("%d\n",10*dp.major+dp.minor);
>>> return(0);
>>>
>>> is using the wrong property for the generation.
>>>
>>> Back to the CUDA documentation for the correct information.
>>>
>>>
>>>
>>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch <jacob.fai at gmail.com>
>>> wrote:
>>>
>>> 1120 sounds suspiciously like some CUDA version rather than architecture
>>> or compute capability…
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: +1 (312) 694-3391
>>>
>>> On May 26, 2021, at 22:29, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> I started to get this error today on Cori.
>>>
>>> nvcc fatal : Unsupported gpu architecture 'compute_1120'
>>>
>>> I am pretty sure I had a clean build but I can redo it if you don't know
>>> where this is from.
>>>
>>> Thanks,
>>> Mark
>>> <configure.log>
>>>
>>>
>>>
>>>