[petsc-dev] [petsc-maint] Installing with CUDA on a cluster

Karl Rupp rupp at iue.tuwien.ac.at
Mon Mar 12 05:50:50 CDT 2018


Hi Satish,

thanks for the pull request. I approved the changes, improved the appending 
of -Wno-deprecated-gpu-targets so that it also works on my machine, and have 
merged everything to next.
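
For the archives: the idea behind the version-guarded append is simply to 
probe nvcc with the flag before using it, since older nvcc releases may not 
accept -Wno-deprecated-gpu-targets. A minimal standalone sketch of that idea 
in Python (this is *not* the actual BuildSystem code, which routes through 
configure's own test-compile machinery; the names here are illustrative):

    import os
    import subprocess
    import tempfile

    def nvcc_accepts_flag(nvcc, flag):
        # Compile a trivial file with the candidate flag; a zero exit
        # status means nvcc understood it.
        with tempfile.TemporaryDirectory() as tmpdir:
            src = os.path.join(tmpdir, 'conftest.cu')
            with open(src, 'w') as f:
                f.write('int main(void) { return 0; }\n')
            obj = os.path.join(tmpdir, 'conftest.o')
            proc = subprocess.run([nvcc, flag, '-c', src, '-o', obj],
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE)
            return proc.returncode == 0

    # Append the silencing flag only when nvcc knows it, so the same
    # logic works across old and new CUDA versions.
    CUDAFLAGS = []
    if nvcc_accepts_flag('nvcc', '-Wno-deprecated-gpu-targets'):
        CUDAFLAGS.append('-Wno-deprecated-gpu-targets')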


> * My fixes should alleviate some of the CUSP installation issues. I
>    don't know enough about the CUSP interface wrt its useful features vs.
>    other burdens - and whether it's good to drop it or not. [If needed, we
>    can add more version dependencies in configure]

This should be fine for now. In the long term CUSP may be completely 
superseded by NVIDIA's AMGX. Let's see how things develop...



> * Wrt CUDA - currently my test is with CUDA-7.5. I can try migrating a
>    couple of tests to CUDA-9.1 [on frog]. But what about older
>    releases? Any reason we should drop them? I.e., any reason to up the
>    following values?
> 
>      self.CUDAMinVersion   = '5000' # Minimal cuda version is 5.0
>      self.CUSPMinVersion  = '400' # Minimal cusp version is 0.4

See the answer here for a list of CUDA capabilities and defaults:
https://stackoverflow.com/questions/28932864/cuda-compute-capability-requirements

We definitely don't need to support compute architecture 1.x (~10 years 
old): it offers no double-precision support and is hence fairly useless 
for our purposes. Thus, we should be absolutely fine with requiring CUDA 
7.0 or higher.
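
For context, the quoted minimum-version constants follow nvcc's CUDA_VERSION 
encoding (major*1000 + minor*10), so a bump is a one-constant change. A tiny 
illustration of the encoding (assuming that scheme, which matches '5000' for 
5.0 and '7050' for 7.5 in Satish's snippet):

    def encode_cuda_version(major, minor):
        # CUDA_VERSION scheme: major*1000 + minor*10
        return major * 1000 + minor * 10

    assert encode_cuda_version(5, 0) == 5000  # current CUDAMinVersion
    assert encode_cuda_version(7, 0) == 7000  # CUDA 7.0, the floor argued above
    assert encode_cuda_version(9, 1) == 9010  # CUDA 9.1, as tested on frog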


> 
>    We do change it for the complex build [we don't have a test for this case]
> 
>      if self.defaultScalarType.lower() == 'complex': self.CUDAMinVersion = '7050'

I don't remember the exact reason, but I remember that there is one for 
requiring CUDA 7.5 here. Let's use CUDA 7.5 as the minimum for both real 
and complex then?
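
In that encoding, unifying the two minima would make the complex-specific 
override unnecessary; a sketch, reusing the attribute name from Satish's 
snippet:

    # Require CUDA 7.5 ('7050') for both real and complex builds,
    # dropping the complex-only override quoted above.
    self.CUDAMinVersion = '7050'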



> * Our test GPU is an M2090 - with compute capability (version) 2.0.
>    CUDA-7.5 works on it. CUDA-8 gives deprecation warnings. CUDA-9 does
>    not work? So what do we do with such old hardware? Do we keep
>    CUDA-7.5 as the minimum supported version for an extended time? [At
>    some point we could switch to minimum version CUDA-8 - if we can get
>    rid of the warnings]

Your PR silences the deprecation warnings.
Compute capability 2.0 is fine for our tests for some time to come. We 
should certainly upgrade at some point, yet my experience with GPUs is 
that older GPUs actually make the better test environment, as they tend 
to reveal bugs more quickly than newer hardware.

Best regards,
Karli


> 
> * BTW: Wrt --with-cuda-arch, I'm hoping we can get rid of it in favor
>    of CUDAFLAGS [with defaults similar to the CFLAGS defaults] - but it's
>    not clear if I can easily untangle the dependencies we have [wrt CPP
>    - and others]
> 
>    Or can we get rid of this default altogether [currently
>    -arch=sm_20] - and expect nvcc to have sane defaults? Then we can
>    probably eliminate all this complicated code. [If cuda-7.5 and higher
>    do this properly - we could use that as the minimum supported version?]
> 
> Satish
> 
> On Sat, 10 Mar 2018, Karl Rupp wrote:
> 
>> Hi all,
>>
>> a couple of notes here, particularly for Manuel:
>>
>>   * CUSP is repeatedly causing such installation problems, hence we will soon
>> drop it as a vector backend and instead only provide a native
>> CUBLAS/CUSPARSE-based backend.
>>
>>   * You can use this native CUDA backend already. Just configure with only
>> --with-cuda=1 --with-cuda-arch=sm_60 (sm_30 should also work and is compatible
>> with the Tesla K20 GPUs you may find on other clusters).
>>
>>   * The multigrid preconditioner from CUSP is selected via
>>     -pc_type sa_cusp
>>     Make sure you also use -vec_type cusp -mat_type aijcusp
>>     If you don't need the multigrid preconditioner from CUSP, please
>> just reconfigure and use the native CUDA backend with -vec_type cuda -mat_type
>> aijcusparse
>>
>>   * Right now only one of {native CUDA, CUSP, ViennaCL} can be activated at
>> configure time. This will be fixed later this month.
>>
>> If you're looking for a GPU-accelerated multigrid preconditioner: I just heard
>> yesterday that NVIDIA's AMGX is now open source. I'll provide a wrapper within
>> PETSc soon.
>>
>> As Matt already said: Don't expect much more than a modest speedup over your
>> existing CPU-based code - provided that your setup is GPU-friendly and your
>> problem size is appropriate.
>>
>> Best regards,
>> Karli
>>
>>
>>
>>
>> On 03/10/2018 03:38 AM, Satish Balay wrote:
>>> I've updated configure so that --download-cusp gets the
>>> correct/compatible CUSP version - for CUDA 7/8 vs. 9
>>>
>>> The changes are in branch balay/cuda-cusp-cleanup - and merged to next.
>>>
>>> Satish
>>>
>>> On Wed, 7 Mar 2018, Satish Balay wrote:
>>>
>>>> --download-cusp hardly ever gets used, so it is likely broken.
>>>>
>>>> It needs to be updated to somehow use the correct CUSP version based
>>>> on the CUDA version that's being used.
>>>>
>>>> [and since we can't easily check for CUSP compatibility - we should
>>>> probably remove the checkCUSPVersion() code]
>>>>
>>>> When using CUDA-9, you can try the options:
>>>>
>>>> --download-cusp=1 --download-cusp-commit=116b090
>>>>
>>
>>
> 

