[petsc-users] CUDA MatSetValues test

Stefano Zampini stefano.zampini at gmail.com
Fri May 28 12:50:16 CDT 2021


OpenMPI.py depends on cuda.py in that, if CUDA is present, it configures OpenMPI with CUDA support. MPI.py and MPICH.py do not depend on cuda.py (MPICH.py only weakly: it adds a print if CUDA is present).
Since eventually the MPI distro will only need a hint to be configured with CUDA, why not remove the dependency altogether and add only a flag --download-openmpi-use-cuda?
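As a sketch, the proposal would turn the implicit cuda.py dependency into an explicit hint on the configure line. Note the flag name below is hypothetical; --download-openmpi-use-cuda is only being suggested in this thread and is not an existing configure option:

```shell
# Hypothetical invocation: --download-openmpi-use-cuda is the flag
# proposed above, not a current PETSc configure option. The other
# two options are real.
./configure --with-cuda --download-openmpi --download-openmpi-use-cuda
```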

> On May 28, 2021, at 8:44 PM, Barry Smith <bsmith at petsc.dev> wrote:
> 
> 
>  Stefano, who has a far better memory than me, wrote
> 
> > > Or probably remove --download-openmpi? Or, just for the moment, why can't we just tell configure that MPI is a weak dependency of cuda.py, so that it will be forced to be configured later?
> 
>   MPI.py depends on cuda.py, so we cannot also have cuda.py depend on MPI.py using the generic dependency mechanism of configure/packages.
> 
>   but perhaps we can just hardwire the rerunning of cuda.py when the MPI compilers are reset. I will try that now, and if I can get it to work we should be able to move those old fix branches along as MRs.
> 
>   Barry
> 
> 
> 
>> On May 28, 2021, at 12:41 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> 
>> OK, I will try to rebase and test Barry's branch.
>> 
>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <stefano.zampini at gmail.com> wrote:
>> Yes, it is the branch I was using before force-pushing to Barry's barry/2020-11-11/cleanup-matsetvaluesdevice.
>> You can use both, I guess.
>> 
>>> On May 28, 2021, at 8:25 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>> 
>>> Is this the correct branch? It conflicted with ex5cu, so I assume it is.
>>> 
>>> 
>>> stefanozampini/simplify-setvalues-device <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>> 
>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams <mfadams at lbl.gov> wrote:
>>> I am rebasing this branch over main.
>>> 
>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <stefano.zampini at gmail.com> wrote:
>>> Or probably remove --download-openmpi? Or, just for the moment, why can't we just tell configure that MPI is a weak dependency of cuda.py, so that it will be forced to be configured later?
>>> 
>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <stefano.zampini at gmail.com> wrote:
>>>> 
>>>> That branch provides a fix for MatSetValuesDevice, but it never got merged because of the CI issues with --download-openmpi. Could we perhaps just skip the test in that specific configuration?
>>>> 
>>>>> On May 28, 2021, at 7:45 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>> 
>>>>> 
>>>>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=) arch-robustify-cuda-gencodearch-check
>>>>> $ ./ex5cu
>>>>> terminate called after throwing an instance of 'thrust::system::system_error'
>>>>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
>>>>> Aborted (core dumped)
>>>>> 
>>>>>         requires: cuda !define(PETSC_USE_CTABLE)
>>>>> 
>>>>>   CI does not test with CUDA enabled and ctable disabled. The code is still broken, as it was six months ago in the discussion Stefano pointed to. It is clear why; it is just that no one has had the time to clean things up.
>>>>> 
>>>>>   Barry
>>>>> 
>>>>> 
>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <stefano.zampini at gmail.com> wrote:
>>>>>> If you are referring to your device set values, I guess it is not currently tested.
>>>>>> 
>>>>>> No. There is a test for that (ex5cu).
>>>>>> I have a user who is getting a segv in MatSetValues with aijcusparse. I suspect there is memory corruption, but I'm trying to cover all the bases.
>>>>>> I have added a CUDA test to ksp/ex56 that works. I can do an MR for it if such a test does not exist.
>>>>>>  
>>>>>> See the discussions here https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>>>> I started cleaning up the code to prepare for testing, but we never finished it: https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>> 
>>>>>> 
>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>> 
>>>>>>> Is there a test with MatSetValues and CUDA? 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
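For reference, the kind of host-side MatSetValues-with-CUDA test the thread is asking about might look roughly like the sketch below. This is not the actual ex5cu source; it assumes a CUDA build of PETSc and uses the pre-3.17 ierr/CHKERRQ error-checking style that was current at the time:

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, Istart, Iend, n = 10;
  PetscScalar    v = 1.0;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  /* GPU matrix type; could also be selected at runtime via -mat_type aijcusparse */
  ierr = MatSetType(A, MATAIJCUSPARSE);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  /* insert one diagonal entry per local row from the host */
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
```

ex5cu additionally exercises setting values from device (kernel) code, which is the path guarded by the `requires: cuda !define(PETSC_USE_CTABLE)` line quoted above.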
