[petsc-users] CUDA MatSetValues test

Mark Adams mfadams at lbl.gov
Fri May 28 12:41:02 CDT 2021


OK, I will try to rebase and test Barry's branch.

On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <stefano.zampini at gmail.com>
wrote:

> Yes, it is the branch I was using before force pushing to
> Barry’s barry/2020-11-11/cleanup-matsetvaluesdevice.
> You can use both, I guess.
>
> On May 28, 2021, at 8:25 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> Is this the correct branch? It conflicted with ex5cu so I assume it is.
>
>
> stefanozampini/simplify-setvalues-device
> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>
> On Fri, May 28, 2021 at 1:24 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> I am fixing and rebasing this branch over main.
>>
>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <
>> stefano.zampini at gmail.com> wrote:
>>
>>> Or probably remove --download-openmpi? Or, just for the moment, why
>>> can’t we just tell configure that mpi is a weak dependency of cuda.py, so
>>> that it will be forced to be configured later?
>>>
>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <stefano.zampini at gmail.com>
>>> wrote:
>>>
>>> That branch provides a fix for MatSetValuesDevice, but it never got
>>> merged because of the CI issues with --download-openmpi. We can probably
>>> try to skip the test in that specific configuration?
>>>
>>> On May 28, 2021, at 7:45 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>
>>> ~/petsc/src/mat/tutorials
>>> (barry/2021-05-28/robustify-cuda-gencodearch-check=)
>>> arch-robustify-cuda-gencodearch-check
>>> $ ./ex5cu
>>> terminate called after throwing an instance of
>>> 'thrust::system::system_error'
>>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an
>>> illegal memory access was encountered
>>> Aborted (core dumped)
>>>
>>>         requires: cuda !define(PETSC_USE_CTABLE)
>>>
>>>   CI does not test with CUDA and no ctable.  The code is still broken, as
>>> it was six months ago in the discussion Stefano pointed to. It is clear why;
>>> it is just that no one has had the time to clean things up.
>>>
>>>   Barry
>>>
>>>
>>> On May 28, 2021, at 11:13 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>
>>>
>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <
>>> stefano.zampini at gmail.com> wrote:
>>>
>>>> If you are referring to your device set values, I guess it is not
>>>> currently tested.
>>>>
>>>
>>> No. There is a test for that (ex5cu).
>>> I have a user that is getting a segv in MatSetValues with aijcusparse. I
>>> suspect there is memory corruption but I'm trying to cover all the bases.
>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it
>>> if such a test does not exist.
>>>
>>>
>>>> See the discussions here
>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>> I started cleaning up the code to prepare for testing but we never
>>>> finished it
>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>
>>>>
>>>> On May 28, 2021, at 6:53 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>> Is there a test with MatSetValues and CUDA?
>>>>
>>>>
>>>>
>>>
>>>
>>>
>