[petsc-users] CUDA MatSetValues test
Mark Adams
mfadams at lbl.gov
Fri May 28 14:39:42 CDT 2021
Thanks,
I did not intend to make any (real) changes.
The only conflicting part of Barry's branch that I did not intend to use was
the help and comment block at the top of ex5cu.cu.
* I ended up with two declarations of PetscSplitCSRDataStructure
* I added some includes to fix errors like this:
/ccs/home/adams/petsc/include/../src/mat/impls/aij/seq/seqcusparse/cusparsematimpl.h(263):
error: incomplete type is not allowed
* I ended up not having csr2csc_i in Mat_SeqAIJCUSPARSE, so I get:
/autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(1348):
error: class "Mat_SeqAIJCUSPARSE" has no member "csr2csc_i"
On Fri, May 28, 2021 at 3:13 PM Stefano Zampini <stefano.zampini at gmail.com>
wrote:
> I can take a quick look at it tomorrow, what are the main changes you made
> since then?
>
> On May 28, 2021, at 9:51 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> I am getting messed up in trying to resolve conflicts in rebasing over
> main.
> Is there a better way of doing this?
> Can I just tell git to use Barry's version and then test it?
> Or should I just try it again?
>
> On Fri, May 28, 2021 at 2:15 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> I am rebasing over main and it's a bit of a mess. I must have missed
>> something. I get this. I think the _n_SplitCSRMat must be wrong.
>>
>>
>> In file included from
>> /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
>> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting
>> types for 'PetscSplitCSRDataStructure'
>> typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
>> ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous
>> declaration of 'PetscSplitCSRDataStructure' was here
>> typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
>> ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
>>
>> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini <
>> stefano.zampini at gmail.com> wrote:
>>
>>> OpenMPI.py depends on cuda.py in that, if CUDA is present, it configures
>>> using CUDA. MPI.py and MPICH.py do not depend on cuda.py (MPICH.py only
>>> weakly: it adds a print if CUDA is present).
>>> Since eventually the MPI distro will only need a hint to be configured
>>> with CUDA, why not remove the dependency altogether and add only a flag
>>> --download-openmpi-use-cuda?
>>>
>>> On May 28, 2021, at 8:44 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>
>>> Stefano, who has a far better memory than me, wrote
>>>
>>> > Or probably remove --download-openmpi? Or, just for the moment, why
>>> can’t we just tell configure that mpi is a weak dependence of cuda.py, so
>>> that it will be forced to be configured later?
>>>
>>> MPI.py depends on cuda.py so we cannot also have cuda.py depend on
>>> MPI.py using the generic dependencies of configure/packages
>>>
>>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI
>>> compilers are reset. I will try that now and if I can get it to work we
>>> should be able to move those old fix branches along as MR.
>>>
>>> Barry
>>>
>>>
>>>
>>> On May 28, 2021, at 12:41 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> OK, I will try to rebase and test Barry's branch.
>>>
>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <
>>> stefano.zampini at gmail.com> wrote:
>>>
>>>> Yes, it is the branch I was using before force pushing to
>>>> Barry’s barry/2020-11-11/cleanup-matsetvaluesdevice
>>>> You can use both I guess
>>>>
>>>> On May 28, 2021, at 8:25 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>> Is this the correct branch? It conflicted with ex5cu so I assume it is.
>>>>
>>>>
>>>> stefanozampini/simplify-setvalues-device
>>>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>>>
>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>>> I am rebasing this branch over main.
>>>>>
>>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <
>>>>> stefano.zampini at gmail.com> wrote:
>>>>>
>>>>>> Or probably remove --download-openmpi? Or, just for the moment, why
>>>>>> can’t we just tell configure that mpi is a weak dependence of cuda.py, so
>>>>>> that it will be forced to be configured later?
>>>>>>
>>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <
>>>>>> stefano.zampini at gmail.com> wrote:
>>>>>>
>>>>>> That branch provides a fix for MatSetValuesDevice, but it never got
>>>>>> merged because of the CI issues with --download-openmpi. We can probably
>>>>>> try to skip the test in that specific configuration?
>>>>>>
>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>>
>>>>>> ~/petsc/src/mat/tutorials
>>>>>> (barry/2021-05-28/robustify-cuda-gencodearch-check=)
>>>>>> arch-robustify-cuda-gencodearch-check
>>>>>> $ ./ex5cu
>>>>>> terminate called after throwing an instance of
>>>>>> 'thrust::system::system_error'
>>>>>> what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an
>>>>>> illegal memory access was encountered
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>> requires: cuda !define(PETSC_USE_CTABLE)
>>>>>>
>>>>>> CI does not test with CUDA and no ctable. The code is still broken,
>>>>>> as it was six months ago in the discussion Stefano pointed to. It is clear
>>>>>> why; it is just that no one has had the time to clean things up.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <
>>>>>> stefano.zampini at gmail.com> wrote:
>>>>>>
>>>>>>> If you are referring to your device set values, I guess it is not
>>>>>>> currently tested
>>>>>>>
>>>>>>
>>>>>> No. There is a test for that (ex5cu).
>>>>>> I have a user that is getting a segv in MatSetValues with
>>>>>> aijcusparse. I suspect there is memory corruption but I'm trying to cover
>>>>>> all the bases.
>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for
>>>>>> it if such a test does not exist.
>>>>>>
>>>>>>
>>>>>>> See the discussions here
>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>>>>> I started cleaning up the code to prepare for testing but we never
>>>>>>> finished it
>>>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>>>
>>>>>>>
>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>> Is there a test with MatSetValues and CUDA?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>