[petsc-users] CUDA error

Mark Adams mfadams at lbl.gov
Thu Apr 16 06:59:24 CDT 2020


On Wed, Apr 15, 2020 at 3:18 PM Stefano Zampini <stefano.zampini at gmail.com>
wrote:

>
>
> On Apr 15, 2020, at 10:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> Thanks, it looks correct. I am getting memory leaks (appended)
>
> And something horrible is going on with performance:
>
> MatLUFactorNum       130 1.0 9.2220e+00 1.0 6.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 30  0  0  0  0  30  0  0  0  0    71       0    390 3.33e+02    0 0.00e+00  0
>
> MatLUFactorNum       130 1.0 6.5177e-01 1.0 1.28e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0  1966       0      0 0.00e+00    0 0.00e+00  0
>
>
> Can you describe these numbers? It seems that in the second case the
> factorization is run on the CPU (as I explained in my previous message).
>
>
This is with and without cusparse, so yes, the second case is the normal CPU
solver.

Oh, the factorization is not done on the GPU; only the solves are on the GPU.
Alright, that is not useful for me.
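
Assuming the standard -log_view GPU column layout (total Mflop/s, GPU Mflop/s,
CpuToGpu count/size, GpuToCpu count/size, GPU %F), a rough reading of the two
MatLUFactorNum lines above is:

  cusparse run:   71 Mflop/s total, GPU rate 0, 390 CpuToGpu copies (~333 MB), GPU %F 0
  CPU run:      1966 Mflop/s total, GPU rate 0,   0 CpuToGpu copies,           GPU %F 0

i.e. in the cusparse case the numeric factorization runs on the host and the
factors are repeatedly copied to the device, which is where the extra time goes.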

Thanks,


> This is not urgent, but I'd like to get a serial LU GPU solver at
> some point.
>
>
> Thanks again,
> Mark
>
> Lots of these:
> [ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
> [ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
> [ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
>
>
> Yes, as I said, the code is in bad shape. I’ll see what I can do.
>
> On Wed, Apr 15, 2020 at 12:47 PM Stefano Zampini <
> stefano.zampini at gmail.com> wrote:
>
>> Mark
>>
>> Attached is the patch. I will open an MR in the next few days if you confirm
>> it is working for you.
>> The issue is that CUSPARSE does not have a way to compute the triangular
>> factors, so we delegate the computation of the factors to PETSc (on the CPU).
>> These factors are then copied to the GPU.
>> What was happening in the second step of SNES was that the factors were
>> never updated, since the offloadmask was never updated.
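>>
>> A minimal sketch of that idea, for illustration only (this is not the actual
>> patch; the wrapper name is made up, while MatLUFactorNumeric_SeqAIJ, the Mat
>> offloadmask field, and PETSC_OFFLOAD_CPU are the existing PETSc pieces):
>>
>>   #include <petsc/private/matimpl.h>
>>   #include <../src/mat/impls/aij/seq/aij.h>
>>
>>   /* Factor on the host with the stock AIJ kernel, then mark the factor
>>      matrix as valid only on the CPU so the triangular factors get copied
>>      to the GPU again before the next MatSolve; skipping that flag is what
>>      left the second SNES solve reusing stale factors on the device. */
>>   static PetscErrorCode MatLUFactorNumeric_SeqAIJCUSPARSE_sketch(Mat fact, Mat A, const MatFactorInfo *info)
>>   {
>>     PetscErrorCode ierr;
>>
>>     PetscFunctionBegin;
>>     ierr = MatLUFactorNumeric_SeqAIJ(fact, A, info);CHKERRQ(ierr); /* host factorization */
>>     fact->offloadmask = PETSC_OFFLOAD_CPU;                         /* device copy is now stale */
>>     PetscFunctionReturn(0);
>>   }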
>>
>> The real issue is that the CUSPARSE support in PETSc is really in bad
>> shape and mostly untested, with coding solutions that are probably outdated
>> now.
>> I'll see what I can do to fix the class if I have time in the next few weeks.
>>
>> Stefano
>>
>> On Wed, Apr 15, 2020 at 5:21 PM Mark Adams <mfadams at lbl.gov>
>> wrote:
>>
>>>
>>>
>>> On Wed, Apr 15, 2020 at 8:24 AM Stefano Zampini <
>>> stefano.zampini at gmail.com> wrote:
>>>
>>>> Mark
>>>>
>>>> I have fixed a few things in the solver, and it is tested against the
>>>> current master.
>>>>
>>>
>>> I rebased with master over the weekend ....
>>>
>>>
>>>> Can you write a MWE to reproduce the issue? Which version of CUDA and
>>>> CUSPARSE are you using?
>>>>
>>>
>>> You can use mark/feature-xgc-interface-rebase branch and add '-mat_type
>>> seqaijcusparse -fp_pc_factor_mat_solver_type cusparse
>>> -mat_cusparse_storage_format ell -vec_type cuda'
>>> to dm/impls/plex/tutorials/ex10.c
>>>
>>> The first stage, SNES solve, actually looks OK here. Maybe.
>>>
>>> Thanks,
>>>
>>> 10:01 mark/feature-xgc-interface-rebase *= ~/petsc$ make -f gmakefile test search='dm_impls_plex_tutorials-ex10_0'
>>> /usr/bin/python /ccs/home/adams/petsc/config/gmakegentest.py --petsc-dir=/ccs/home/adams/petsc --petsc-arch=arch-summit-opt64-gnu-cuda --testdir=./arch-summit-opt64-gnu-cuda/tests
>>> Using MAKEFLAGS: search=dm_impls_plex_tutorials-ex10_0
>>>           CC arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10.o
>>>      CLINKER arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10
>>>         TEST arch-summit-opt64-gnu-cuda/tests/counts/dm_impls_plex_tutorials-ex10_0.counts
>>>  ok dm_impls_plex_tutorials-ex10_0
>>> not ok diff-dm_impls_plex_tutorials-ex10_0 # Error code: 1
>>> #       14,16c14,16
>>> #       <     0 SNES Function norm 6.184233768573e-04
>>> #       <     1 SNES Function norm 1.467479466750e-08
>>> #       <     2 SNES Function norm 7.863111141350e-12
>>> #       ---
>>> #       >     0 SNES Function norm 6.184233768572e-04
>>> #       >     1 SNES Function norm 1.467479466739e-08
>>> #       >     2 SNES Function norm 7.863102870090e-12
>>> #       18,31c18,256
>>> #       <     0 SNES Function norm 6.182952107532e-04
>>> #       <     1 SNES Function norm 7.336382211149e-09
>>> #       <     2 SNES Function norm 1.566979901443e-11
>>> #       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
>>> #       <     0 SNES Function norm 6.183592738545e-04
>>> #       <     1 SNES Function norm 7.337681407420e-09
>>> #       <     2 SNES Function norm 1.408823933908e-11
>>> #       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
>>> #       < [0] TSAdaptChoose_Basic(): Estimated scaled local truncation error 0.0396569, accepting step of size 1e-06
>>> #       < 1 TS dt 1.25e-06 time 1e-06
>>> #       <   1) species-0: charge density= -1.6024814608984e+01 z-momentum=  2.0080682964364e-19 energy=  1.2018000284846e+05
>>> #       <   1) species-1: charge density=  1.6021676653316e+01 z-momentum=  1.4964483981137e-17 energy=  1.2017223215083e+05
>>> #       <   1) species-2: charge density=  2.8838441139703e-03 z-momentum= -1.1062018110807e-23 energy=  1.2019641370376e-03
>>> #       <         1) Total: charge density= -2.5411155383649e-04, momentum=  1.5165279748763e-17, energy=  2.4035223620125e+05 (m_i[0]/m_e = 3670.94, 140 cells), 1 sub threads
>>> #       ---
>>> #       >     0 SNES Function norm 6.182952107531e-04
>>> #       >     1 SNES Function norm 6.181600164904e-04
>>> #       >     2 SNES Function norm 6.180249471739e-04
>>> #       >     3 SNES Function norm 6.178899987549e-04
>>>
>>>
>>>> I was planning to reorganize the factor code in AIJCUSPARSE in the next
>>>> days.
>>>>
>>>> kl-18967:petsc zampins$ git grep "solver_type cusparse"
>>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda -pc_type ilu
>>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/shallow_water1 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format hyb -vec_type cuda -ksp_type cg -pc_type icc
>>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu
>>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu -pc_factor_mat_ordering_type nd
>>>> src/ksp/ksp/examples/tutorials/ex46.c:      args: -dm_mat_type aijcusparse -dm_vec_type cuda -random_exact_sol -pc_type ilu -pc_factor_mat_solver_type cusparse
>>>> src/ksp/ksp/examples/tutorials/ex59.c:     args: -subdomain_mat_type aijcusparse -physical_pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse
>>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
>>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
>>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
>>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
>>>> src/ksp/ksp/examples/tutorials/ex71.c:   args: -pde_type Poisson -cells 7,9,8 -dim 3 -ksp_view -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -pc_bddc_dirichlet_pc_type cholesky -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_dirichlet_pc_factor_mat_ordering_type nd -pc_bddc_neumann_pc_type cholesky -pc_bddc_neumann_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_ordering_type nd -matis_localmat_type aijcusparse
>>>> src/ksp/ksp/examples/tutorials/ex72.c:      args: -f0 ${DATAFILESPATH}/matrices/medium -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type aijcusparse -pc_factor_mat_solver_type cusparse -pc_type ilu -vec_type cuda
>>>> src/snes/examples/tutorials/ex12.c:      args: -matis_localmat_type aijcusparse -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_solver_type cusparse
>>>>
>>>> On Apr 15, 2020, at 2:20 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>> I tried using a serial direct solver in cusparse and got bad numerics:
>>>>
>>>> -vector_type cuda -mat_type aijcusparse -pc_factor_mat_solver_type
>>>> cusparse
>>>>
>>>> Before I start debugging this I wanted to see if there are any known
>>>> issues that I should be aware of.
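>>>>
>>>> A minimal standalone way to exercise the same path, assuming a CUDA-enabled
>>>> build and using the stock KSP tutorial ex2 (this is illustrative only, not
>>>> the code that produced the bad numerics):
>>>>
>>>>   cd $PETSC_DIR/src/ksp/ksp/examples/tutorials
>>>>   make ex2
>>>>   ./ex2 -m 100 -n 100 -ksp_type preonly -pc_type lu \
>>>>         -mat_type aijcusparse -vec_type cuda \
>>>>         -pc_factor_mat_solver_type cusparse -ksp_monitor_short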
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>
>> --
>> Stefano
>>
>
>