[petsc-users] CUDA MatSetValues test

Mark Adams mfadams at lbl.gov
Fri May 28 13:15:38 CDT 2021


I am rebasing over main and it's a bit of a mess. I must have missed
something. I get the error below; I think the _n_SplitCSRMat typedef must be wrong.


In file included from
/autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
/ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types
for 'PetscSplitCSRDataStructure'
 typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~
/ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous
declaration of 'PetscSplitCSRDataStructure' was here
 typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
                               ^~~~~~~~~~~~~~~~~~~~~~~~~~
          CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
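
For reference, a minimal standalone sketch of what the compiler is
complaining about (hypothetical code, not the PETSc source): the same
typedef name is declared once as a plain struct type and once as a
pointer type, and C rejects that as conflicting declarations. The rebase
has presumably left the old declaration in petscmat.h next to the new one.

    /* conflict.c -- hypothetical reproduction of the error above */
    typedef struct _p_SplitCSRMat  PetscSplitCSRDataStructure; /* old: struct type  */
    typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure; /* new: pointer type */
    /* gcc: error: conflicting types for 'PetscSplitCSRDataStructure' */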

On Fri, May 28, 2021 at 1:50 PM Stefano Zampini <stefano.zampini at gmail.com>
wrote:

> OpenMPI.py depends on cuda.py in that, if CUDA is present, it configures
> OpenMPI using CUDA. MPI.py and MPICH.py do not depend on cuda.py (MPICH
> only weakly: it adds a print if CUDA is present).
> Since eventually the MPI distro will only need a hint to be configured
> with CUDA, why not remove the dependency altogether and add only a flag
> --download-openmpi-use-cuda?
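>
> (For illustration, usage with the proposed flag would presumably look
> like the following; --download-openmpi-use-cuda is the suggested flag,
> not an existing configure option:)
>
>     ./configure --with-cuda --download-openmpi --download-openmpi-use-cuda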
>
> On May 28, 2021, at 8:44 PM, Barry Smith <bsmith at petsc.dev> wrote:
>
>
>  Stefano, who has a far better memory than me, wrote
>
> > Or probably remove --download-openmpi? Or, just for the moment, why
> > can't we just tell configure that MPI is a weak dependency of cuda.py, so
> > that it will be forced to be configured later?
>
>   MPI.py depends on cuda.py, so we cannot also have cuda.py depend on
> MPI.py using the generic dependencies of configure/packages.
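>
>   (A sketch of what the weak-dependency route might look like in
> cuda.py's setupDependencies, assuming BuildSystem's optional-dependency
> list odeps; the attribute names here are an assumption, not the actual
> configure internals:)
>
>     def setupDependencies(self, framework):
>         config.package.Package.setupDependencies(self, framework)
>         # assumption: register MPI as an optional ("weak") dependency so
>         # configure revisits cuda.py after the MPI compilers are reset
>         self.mpi   = framework.require('config.packages.MPI', self)
>         self.odeps = self.odeps + [self.mpi]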
>
>   But perhaps we can just hardwire the rerunning of cuda.py when the MPI
> compilers are reset. I will try that now, and if I can get it to work we
> should be able to move those old fix branches along as MRs.
>
>   Barry
>
>
>
> On May 28, 2021, at 12:41 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> OK, I will try to rebase and test Barry's branch.
>
> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <stefano.zampini at gmail.com>
> wrote:
>
>> Yes, it is the branch I was using before force-pushing to
>> Barry's barry/2020-11-11/cleanup-matsetvaluesdevice.
>> You can use either, I guess.
>>
>> On May 28, 2021, at 8:25 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>> Is this the correct branch? It conflicted with ex5cu so I assume it is.
>>
>>
>> stefanozampini/simplify-setvalues-device
>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>
>> On Fri, May 28, 2021 at 1:24 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> I am fixing up the rebase of this branch over main.
>>>
>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <
>>> stefano.zampini at gmail.com> wrote:
>>>
>>>> Or probably remove --download-openmpi? Or, just for the moment, why
>>>> can't we just tell configure that MPI is a weak dependency of cuda.py, so
>>>> that it will be forced to be configured later?
>>>>
>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <stefano.zampini at gmail.com>
>>>> wrote:
>>>>
>>>> That branch provides a fix for MatSetValuesDevice, but it never got
>>>> merged because of the CI issues with --download-openmpi. We can probably
>>>> try to skip the test in that specific configuration?
>>>>
>>>> On May 28, 2021, at 7:45 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>
>>>> ~/petsc/src/mat/tutorials
>>>> (barry/2021-05-28/robustify-cuda-gencodearch-check=)
>>>> arch-robustify-cuda-gencodearch-check
>>>> $ ./ex5cu
>>>> terminate called after throwing an instance of
>>>> 'thrust::system::system_error'
>>>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an
>>>> illegal memory access was encountered
>>>> Aborted (core dumped)
>>>>
>>>>         requires: cuda !define(PETSC_USE_CTABLE)
>>>>
>>>>   CI does not test with CUDA and ctable disabled. The code is still
>>>> broken, as it was six months ago in the discussion Stefano pointed to. It
>>>> is just that no one has had the time to clean things up.
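>>>>
>>>>   (For context, that requires line lives in the /*TEST ... TEST*/ block
>>>> at the bottom of the example source; a sketch of roughly what ex5cu's
>>>> block would contain, details abbreviated:)
>>>>
>>>>     /*TEST
>>>>        build:
>>>>          requires: cuda
>>>>        test:
>>>>          requires: cuda !define(PETSC_USE_CTABLE)
>>>>     TEST*/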
>>>>
>>>>   Barry
>>>>
>>>>
>>>> On May 28, 2021, at 11:13 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <
>>>> stefano.zampini at gmail.com> wrote:
>>>>
>>>>> If you are referring to your device set values, I guess it is not
>>>>> currently tested.
>>>>>
>>>>
>>>> No. There is a test for that (ex5cu).
>>>> I have a user who is getting a segv in MatSetValues with aijcusparse.
>>>> I suspect memory corruption, but I'm trying to cover all the bases.
>>>> I have added a CUDA test to ksp/ex56 that works. I can do an MR for it
>>>> if such a test does not exist.
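>>>>
>>>> (For anyone trying to reproduce the user's segv, a minimal host-side
>>>> assembly loop against the cusparse type looks roughly like this; error
>>>> checking is omitted for brevity:)
>>>>
>>>>     #include <petscmat.h>
>>>>     int main(int argc, char **argv)
>>>>     {
>>>>       Mat         A;
>>>>       PetscInt    i, n = 10;
>>>>       PetscScalar v = 1.0;
>>>>       PetscInitialize(&argc, &argv, NULL, NULL);
>>>>       MatCreate(PETSC_COMM_WORLD, &A);
>>>>       MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
>>>>       MatSetType(A, MATAIJCUSPARSE); /* exercise MatSetValues on the CUDA type */
>>>>       MatSetUp(A);
>>>>       for (i = 0; i < n; i++) MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES);
>>>>       MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>>>       MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>>>       MatDestroy(&A);
>>>>       PetscFinalize();
>>>>       return 0;
>>>>     }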
>>>>
>>>>
>>>>> See the discussions here
>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>>> I started cleaning up the code to prepare for testing, but we never
>>>>> finished it:
>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>
>>>>>
>>>>> On May 28, 2021, at 6:53 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>> Is there a test with MatSetValues and CUDA?