[petsc-users] CUDA error
Mark Adams
mfadams at lbl.gov
Wed Apr 15 09:20:45 CDT 2020
On Wed, Apr 15, 2020 at 8:24 AM Stefano Zampini <stefano.zampini at gmail.com>
wrote:
> Mark
>
> I have fixed a few things in the solver, and it is tested with the current master.
>
I rebased onto master over the weekend ...
> Can you write an MWE to reproduce the issue? Which version of CUDA and CUSPARSE are you using?
>
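(On the version question: a quick standalone way to check, independent of PETSc, is sketched below. It uses only the stock cudaRuntimeGetVersion / cudaDriverGetVersion / cusparseGetVersion calls; error checking is omitted and the file name is arbitrary. Build with something like 'nvcc versions.cu -lcusparse'.)

    #include <stdio.h>
    #include <cuda_runtime.h>
    #include <cusparse.h>

    int main(void)
    {
      int runtime = 0, driver = 0, sparse = 0;
      cusparseHandle_t handle;
      cudaRuntimeGetVersion(&runtime);   /* e.g. 10010 means CUDA 10.1 */
      cudaDriverGetVersion(&driver);     /* highest CUDA version the driver supports */
      cusparseCreate(&handle);
      cusparseGetVersion(handle, &sparse);
      printf("CUDA runtime %d, driver %d, cuSPARSE %d\n", runtime, driver, sparse);
      cusparseDestroy(handle);
      return 0;
    }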
You can use the mark/feature-xgc-interface-rebase branch and add '-mat_type seqaijcusparse -fp_pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda' to dm/impls/plex/tutorials/ex10.c, as sketched below.
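(Concretely, one way to wire those options in is to append them to the ex10_0 entry in the /*TEST ... TEST*/ block at the bottom of ex10.c, roughly as below. The angle-bracket placeholder stands for whatever args that test already carries, and the 'requires: cuda' line is an assumption about how the GPU options should be guarded.)

    /*TEST
      test:
        suffix: 0
        requires: cuda
        args: <existing ex10_0 args> -mat_type seqaijcusparse -fp_pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda
    TEST*/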
The first stage, SNES solve, actually looks OK here. Maybe.
Thanks,
10:01 mark/feature-xgc-interface-rebase *= ~/petsc$ make -f gmakefile test search='dm_impls_plex_tutorials-ex10_0'
/usr/bin/python /ccs/home/adams/petsc/config/gmakegentest.py --petsc-dir=/ccs/home/adams/petsc --petsc-arch=arch-summit-opt64-gnu-cuda --testdir=./arch-summit-opt64-gnu-cuda/tests
Using MAKEFLAGS: search=dm_impls_plex_tutorials-ex10_0
CC arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10.o
CLINKER arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10
TEST arch-summit-opt64-gnu-cuda/tests/counts/dm_impls_plex_tutorials-ex10_0.counts
ok dm_impls_plex_tutorials-ex10_0
not ok diff-dm_impls_plex_tutorials-ex10_0 # Error code: 1
# 14,16c14,16
# < 0 SNES Function norm 6.184233768573e-04
# < 1 SNES Function norm 1.467479466750e-08
# < 2 SNES Function norm 7.863111141350e-12
# ---
# > 0 SNES Function norm 6.184233768572e-04
# > 1 SNES Function norm 1.467479466739e-08
# > 2 SNES Function norm 7.863102870090e-12
# 18,31c18,256
# < 0 SNES Function norm 6.182952107532e-04
# < 1 SNES Function norm 7.336382211149e-09
# < 2 SNES Function norm 1.566979901443e-11
# < Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
# < 0 SNES Function norm 6.183592738545e-04
# < 1 SNES Function norm 7.337681407420e-09
# < 2 SNES Function norm 1.408823933908e-11
# < Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
# < [0] TSAdaptChoose_Basic(): Estimated scaled local truncation error 0.0396569, accepting step of size 1e-06
# < 1 TS dt 1.25e-06 time 1e-06
# < 1) species-0: charge density= -1.6024814608984e+01 z-momentum= 2.0080682964364e-19 energy= 1.2018000284846e+05
# < 1) species-1: charge density= 1.6021676653316e+01 z-momentum= 1.4964483981137e-17 energy= 1.2017223215083e+05
# < 1) species-2: charge density= 2.8838441139703e-03 z-momentum= -1.1062018110807e-23 energy= 1.2019641370376e-03
# < 1) Total: charge density= -2.5411155383649e-04, momentum= 1.5165279748763e-17, energy= 2.4035223620125e+05 (m_i[0]/m_e = 3670.94, 140 cells), 1 sub threads
# ---
# > 0 SNES Function norm 6.182952107531e-04
# > 1 SNES Function norm 6.181600164904e-04
# > 2 SNES Function norm 6.180249471739e-04
# > 3 SNES Function norm 6.178899987549e-04
> I was planning to reorganize the factor code in AIJCUSPARSE in the next few days.
>
> kl-18967:petsc zampins$ git grep "solver_type cusparse"
> src/ksp/ksp/examples/tests/ex43.c: args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda -pc_type ilu
> src/ksp/ksp/examples/tests/ex43.c: args: -f ${DATAFILESPATH}/matrices/shallow_water1 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format hyb -vec_type cuda -ksp_type cg -pc_type icc
> src/ksp/ksp/examples/tests/ex43.c: args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu
> src/ksp/ksp/examples/tests/ex43.c: args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu -pc_factor_mat_ordering_type nd
> src/ksp/ksp/examples/tutorials/ex46.c: args: -dm_mat_type aijcusparse -dm_vec_type cuda -random_exact_sol -pc_type ilu -pc_factor_mat_solver_type cusparse
> src/ksp/ksp/examples/tutorials/ex59.c: args: -subdomain_mat_type aijcusparse -physical_pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse
> src/ksp/ksp/examples/tutorials/ex7.c: args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
> src/ksp/ksp/examples/tutorials/ex7.c: args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
> src/ksp/ksp/examples/tutorials/ex7.c: args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
> src/ksp/ksp/examples/tutorials/ex7.c: args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
> src/ksp/ksp/examples/tutorials/ex71.c: args: -pde_type Poisson -cells 7,9,8 -dim 3 -ksp_view -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -pc_bddc_dirichlet_pc_type cholesky -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_dirichlet_pc_factor_mat_ordering_type nd -pc_bddc_neumann_pc_type cholesky -pc_bddc_neumann_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_ordering_type nd -matis_localmat_type aijcusparse
> src/ksp/ksp/examples/tutorials/ex72.c: args: -f0 ${DATAFILESPATH}/matrices/medium -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type aijcusparse -pc_factor_mat_solver_type cusparse -pc_type ilu -vec_type cuda
> src/snes/examples/tutorials/ex12.c: args: -matis_localmat_type aijcusparse -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_solver_type cusparse
>
> On Apr 15, 2020, at 2:20 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> I tried using a serial direct solver in cusparse and got bad numerics:
>
> -vector_type cuda -mat_type aijcusparse -pc_factor_mat_solver_type
> cusparse
>
> Before I start debugging this I wanted to see if there are any known
> issues that I should be aware of.
>
> Thanks,
>
>
>
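(For reference, the option triple in the quoted message maps onto a standalone sketch like the one below: a made-up 1-D Laplacian solved once through KSPPREONLY/PCLU, with MATSOLVERCUSPARSE selected through the API instead of the command line. This is generic PETSc, not code from the branch or the XGC example, so treat it only as a template for an MWE.)

    /* Hedged sketch of a standalone MWE: a 1-D Laplacian solved once with LU from
       MATSOLVERCUSPARSE, i.e. the same path as
       -mat_type seqaijcusparse -vec_type cuda -pc_factor_mat_solver_type cusparse */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      Vec            x, b;
      KSP            ksp;
      PC             pc;
      PetscInt       i, n = 100, col[3];
      PetscScalar    v[3] = {-1.0, 2.0, -1.0};
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = MatCreate(PETSC_COMM_SELF, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, n, n, n, n);CHKERRQ(ierr);
      ierr = MatSetType(A, MATSEQAIJCUSPARSE);CHKERRQ(ierr);
      ierr = MatSetUp(A);CHKERRQ(ierr);
      for (i = 0; i < n; i++) {  /* assemble the tridiagonal [-1 2 -1] operator */
        col[0] = i - 1; col[1] = i; col[2] = i + 1;
        if (i == 0)          {ierr = MatSetValues(A, 1, &i, 2, col+1, v+1, INSERT_VALUES);CHKERRQ(ierr);}
        else if (i == n - 1) {ierr = MatSetValues(A, 1, &i, 2, col,   v,   INSERT_VALUES);CHKERRQ(ierr);}
        else                 {ierr = MatSetValues(A, 1, &i, 3, col,   v,   INSERT_VALUES);CHKERRQ(ierr);}
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);  /* vector type follows the Mat type */
      ierr = VecSet(b, 1.0);CHKERRQ(ierr);
      ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = VecDestroy(&b);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }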