[petsc-users] CUDA MatSetValues test

Mark Adams mfadams at lbl.gov
Fri May 28 12:25:03 CDT 2021


Is this the correct branch? It conflicted with ex5cu so I assume it is.


stefanozampini/simplify-setvalues-device
<https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>

On Fri, May 28, 2021 at 1:24 PM Mark Adams <mfadams at lbl.gov> wrote:

> I am fixing and rebasing this branch over main.
>
> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <stefano.zampini at gmail.com>
> wrote:
>
>> Or probably remove --download-openmpi? Or, just for the moment, why can’t
>> we just tell configure that MPI is a weak dependency of cuda.py, so that it
>> will be forced to be configured later?
>>
>> On May 28, 2021, at 8:12 PM, Stefano Zampini <stefano.zampini at gmail.com>
>> wrote:
>>
>> That branch provides a fix for MatSetValuesDevice but it never got merged
>> because of the CI issues with --download-openmpi. We can probably try to
>> skip the test in that specific configuration?
>>
>> On May 28, 2021, at 7:45 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>
>>
>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=) arch-robustify-cuda-gencodearch-check
>> $ ./ex5cu
>> terminate called after throwing an instance of 'thrust::system::system_error'
>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
>> Aborted (core dumped)
>>
>>         requires: cuda !define(PETSC_USE_CTABLE)
>>
>>   CI does not test with CUDA and no ctable.  The code is still broken, as
>> it was six months ago in the discussion Stefano pointed to. It is clear why;
>> it is just that no one has had the time to clean things up.
>>
>>   Barry
>>
>>
>> On May 28, 2021, at 11:13 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>
>>
>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <
>> stefano.zampini at gmail.com> wrote:
>>
>>> If you are referring to your device set values, I guess it is not
>>> currently tested
>>>
>>
>> No. There is a test for that (ex5cu).
>> I have a user that is getting a segv in MatSetValues with aijcusparse. I
>> suspect there is memory corruption but I'm trying to cover all the bases.
>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it if
>> such a test does not exist.
>>
>>
>>> See the discussions here
>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>> I started cleaning up the code to prepare for testing but we never
>>> finished it
>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>
>>>
>>> On May 28, 2021, at 6:53 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> Is there a test with MatSetValues and CUDA?
>>>
>>>
>>>
>>
>>
>>