[petsc-users] CUDA error

Stefano Zampini stefano.zampini at gmail.com
Wed Apr 15 11:42:57 CDT 2020


Mark

Attached is the patch. I will open an MR in the next few days if you confirm it
works for you.
The issue is that CUSPARSE does not provide a way to compute the triangular
factors, so we delegate the computation of the factors to PETSc (on the CPU).
These factors are then copied to the GPU.
What was happening in the second SNES step was that the factors were never
updated, because the offload mask was never updated.
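
As a minimal illustration of that pattern (this is not the PETSc source; all
names below are hypothetical), here is a small self-contained C sketch of lazy
host-to-device mirroring driven by an offload mask. Forgetting to reset the
mask after a new CPU factorization, as in the second SNES step above, is
exactly what leaves the GPU reusing stale factors:

#include <stdio.h>

/* Which copy of the factors is up to date. */
typedef enum { OFFLOAD_UNALLOCATED, OFFLOAD_CPU, OFFLOAD_GPU, OFFLOAD_BOTH } OffloadMask;

typedef struct {
  double      factors_cpu[4]; /* stand-in for the CPU-computed triangular factors */
  double      factors_gpu[4]; /* stand-in for the mirrored copy on the GPU        */
  OffloadMask mask;           /* records which copy is current                    */
} FactorMirror;

/* CPU (re)factorization: must mark the GPU copy as stale. */
static void factor_on_cpu(FactorMirror *f, const double *vals)
{
  for (int i = 0; i < 4; i++) f->factors_cpu[i] = vals[i]; /* pretend this is the ILU/ICC factorization */
  f->mask = OFFLOAD_CPU; /* the update that was missing: without it the mask stays OFFLOAD_BOTH */
}

/* Copy the factors to the GPU only if the mask says the device copy is out of date. */
static void sync_factors_to_gpu(FactorMirror *f)
{
  if (f->mask == OFFLOAD_CPU || f->mask == OFFLOAD_UNALLOCATED) {
    for (int i = 0; i < 4; i++) f->factors_gpu[i] = f->factors_cpu[i]; /* stands in for the host-to-device copy */
    f->mask = OFFLOAD_BOTH;
  } /* otherwise the transfer is skipped and the old factors are reused */
}

int main(void)
{
  FactorMirror f = {{0}, {0}, OFFLOAD_UNALLOCATED};
  double step1[4] = {1, 2, 3, 4}, step2[4] = {5, 6, 7, 8};

  factor_on_cpu(&f, step1); sync_factors_to_gpu(&f); /* first SNES step: GPU gets fresh factors          */
  factor_on_cpu(&f, step2); sync_factors_to_gpu(&f); /* second step: correct only because mask was reset */
  printf("GPU factor[0] after second step: %g (expect 5)\n", f.factors_gpu[0]);
  return 0;
}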

The real issue is that the CUSPARSE support in PETSc is in bad shape and mostly
untested, with coding choices that are probably outdated by now.
I'll see what I can do to fix the class if I have time in the next few weeks.

Stefano

On Wed, Apr 15, 2020 at 5:21 PM Mark Adams <mfadams at lbl.gov> wrote:

>
>
> On Wed, Apr 15, 2020 at 8:24 AM Stefano Zampini <stefano.zampini at gmail.com>
> wrote:
>
>> Mark
>>
>> I have fixed a few things in the solver and it is tested with the current
>> master.
>>
>
> I rebased with master over the weekend ....
>
>
>> Can you write a MWE to reproduce the issue? Which version of CUDA and
>> CUSPARSE are you using?
>>
>
> You can use mark/feature-xgc-interface-rebase branch and add '-mat_type
> seqaijcusparse -fp_pc_factor_mat_solver_type cusparse
> -mat_cusparse_storage_format ell -vec_type cuda'
> to dm/impls/plex/tutorials/ex10.c
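
For reference, here is a standalone sketch (not part of ex10.c; the
tridiagonal system is a placeholder, and the fp_ prefix above belongs to
ex10's inner solver, so the unprefixed options are used here) showing what
those options select programmatically: a SEQAIJCUSPARSE matrix, CUDA vectors,
and an ILU factorization handed to the CUSPARSE solver package:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PC             pc;
  PetscInt       i, n = 10;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* equivalent of -mat_type seqaijcusparse */
  ierr = MatCreate(PETSC_COMM_SELF, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, n, n, n, n);CHKERRQ(ierr);
  ierr = MatSetType(A, MATSEQAIJCUSPARSE);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) { /* simple SPD tridiagonal placeholder system */
    if (i > 0)     {ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n - 1) {ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* equivalent of -vec_type cuda */
  ierr = VecCreate(PETSC_COMM_SELF, &b);CHKERRQ(ierr);
  ierr = VecSetSizes(b, n, n);CHKERRQ(ierr);
  ierr = VecSetType(b, VECCUDA);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* equivalent of -pc_factor_mat_solver_type cusparse with ILU: the factors
     are computed on the CPU and mirrored to the GPU, as described above */
  ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCILU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}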
>
> The first stage, SNES solve, actually looks OK here. Maybe.
>
> Thanks,
>
> 10:01 mark/feature-xgc-interface-rebase *= ~/petsc$ make -f gmakefile test
> search='dm_impls_plex_tutorials-ex10_0'
> /usr/bin/python /ccs/home/adams/petsc/config/gmakegentest.py
> --petsc-dir=/ccs/home/adams/petsc --petsc-arch=arch-summit-opt64-gnu-cuda
> --testdir=./arch-summit-opt64-gnu-cuda/tests
> Using MAKEFLAGS: search=dm_impls_plex_tutorials-ex10_0
>           CC
> arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10.o
>      CLINKER arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10
>         TEST
> arch-summit-opt64-gnu-cuda/tests/counts/dm_impls_plex_tutorials-ex10_0.counts
>  ok dm_impls_plex_tutorials-ex10_0
> not ok diff-dm_impls_plex_tutorials-ex10_0 # Error code: 1
> #       14,16c14,16
> #       <     0 SNES Function norm 6.184233768573e-04
> #       <     1 SNES Function norm 1.467479466750e-08
> #       <     2 SNES Function norm 7.863111141350e-12
> #       ---
> #       >     0 SNES Function norm 6.184233768572e-04
> #       >     1 SNES Function norm 1.467479466739e-08
> #       >     2 SNES Function norm 7.863102870090e-12
> #       18,31c18,256
> #       <     0 SNES Function norm 6.182952107532e-04
> #       <     1 SNES Function norm 7.336382211149e-09
> #       <     2 SNES Function norm 1.566979901443e-11
> #       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE
> iterations 2
> #       <     0 SNES Function norm 6.183592738545e-04
> #       <     1 SNES Function norm 7.337681407420e-09
> #       <     2 SNES Function norm 1.408823933908e-11
> #       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE
> iterations 2
> #       < [0] TSAdaptChoose_Basic(): Estimated scaled local truncation
> error 0.0396569, accepting step of size 1e-06
> #       < 1 TS dt 1.25e-06 time 1e-06
> #       <   1) species-0: charge density= -1.6024814608984e+01 z-momentum=
>  2.0080682964364e-19 energy=  1.2018000284846e+05
> #       <   1) species-1: charge density=  1.6021676653316e+01 z-momentum=
>  1.4964483981137e-17 energy=  1.2017223215083e+05
> #       <   1) species-2: charge density=  2.8838441139703e-03 z-momentum=
> -1.1062018110807e-23 energy=  1.2019641370376e-03
> #       <         1) Total: charge density= -2.5411155383649e-04,
> momentum=  1.5165279748763e-17, energy=  2.4035223620125e+05 (m_i[0]/m_e =
> 3670.94, 140 cells), 1 sub threads
> #       ---
> #       >     0 SNES Function norm 6.182952107531e-04
> #       >     1 SNES Function norm 6.181600164904e-04
> #       >     2 SNES Function norm 6.180249471739e-04
> #       >     3 SNES Function norm 6.178899987549e-04
>
>
>> I was planning to reorganize the factor code in AIJCUSPARSE in the next
>> days.
>>
>> kl-18967:petsc zampins$ git grep "solver_type cusparse"
>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda -pc_type ilu
>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/shallow_water1 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format hyb -vec_type cuda -ksp_type cg -pc_type icc
>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu
>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu -pc_factor_mat_ordering_type nd
>> src/ksp/ksp/examples/tutorials/ex46.c:      args: -dm_mat_type aijcusparse -dm_vec_type cuda -random_exact_sol -pc_type ilu -pc_factor_mat_solver_type cusparse
>> src/ksp/ksp/examples/tutorials/ex59.c:     args: -subdomain_mat_type aijcusparse -physical_pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse
>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
>> src/ksp/ksp/examples/tutorials/ex71.c:   args: -pde_type Poisson -cells 7,9,8 -dim 3 -ksp_view -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -pc_bddc_dirichlet_pc_type cholesky -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_dirichlet_pc_factor_mat_ordering_type nd -pc_bddc_neumann_pc_type cholesky -pc_bddc_neumann_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_ordering_type nd -matis_localmat_type aijcusparse
>> src/ksp/ksp/examples/tutorials/ex72.c:      args: -f0 ${DATAFILESPATH}/matrices/medium -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type aijcusparse -pc_factor_mat_solver_type cusparse -pc_type ilu -vec_type cuda
>> src/snes/examples/tutorials/ex12.c:      args: -matis_localmat_type aijcusparse -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_solver_type cusparse
>>
>> On Apr 15, 2020, at 2:20 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>> I tried using a serial direct solver in cusparse and got bad numerics:
>>
>> -vector_type cuda -mat_type aijcusparse -pc_factor_mat_solver_type
>> cusparse
>>
>> Before I start debugging this I wanted to see if there are any known
>> issues that I should be aware of.
>>
>> Thanks,
>>
>>
>>

-- 
Stefano
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_for_mark
Type: application/octet-stream
Size: 3695 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200415/b4a10f2f/attachment.obj>

