<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 15, 2020, at 10:14 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" class="">mfadams@lbl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Thanks, it looks correct. I am getting memory leaks (appended)<div class=""><br class=""></div><div class="">And something horrible is going on with performance:</div><div class=""><br class=""></div><div class="">MatLUFactorNum       130 1.0 9.2220e+00 1.0 6.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 30  0  0  0  0  30  0  0  0  0    71       0    390 3.33e+02    0 0.00e+00  0<br class=""></div><div class=""><br class=""></div><div class="">MatLUFactorNum       130 1.0 6.5177e-01 1.0 1.28e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0  1966       0      0 0.00e+00    0 0.00e+00  0<br class=""></div><div class=""><br class=""></div></div></div></blockquote><div><br class=""></div><div>Can you describe these numbers? 
It seems that in the second case the factorization is run on the CPU (as I explained in my previous message).</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">This is not urgent, but I'd like to get a serial LU GPU solver at some point.</div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">Thanks again,</div><div class="">Mark</div><div class=""><br class=""></div><div class="">Lots of these:</div><div class="">[ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu<br class="">[ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu<br class="">[ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu<br class=""></div></div><br class=""></div></blockquote><div><br class=""></div><div>Yes, as I said, the code is in bad shape. I’ll see what I can do.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 15, 2020 at 12:47 PM Stefano Zampini <<a href="mailto:stefano.zampini@gmail.com" class="">stefano.zampini@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="">Mark<div class=""><br class=""></div><div class="">Attached is the patch. I will open an MR in the next few days if you confirm it works for you.</div><div class="">The issue is that CUSPARSE does not have a way to compute the triangular factors, so we delegate the computation of the factors to PETSc (CPU). 
These factors are then copied to the GPU.</div><div class="">What was happening in the second SNES step was that the factors on the GPU were never refreshed, because the offloadmask was never updated.</div><div class=""><br class=""></div><div class="">The real issue is that the CUSPARSE support in PETSc is in really bad shape and mostly untested, with coding solutions that are probably outdated now.</div><div class="">I'll see what I can do to fix the class if I have time in the coming weeks.</div><div class=""><br class=""></div><div class="">Stefano</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 15, 2020 at 5:21 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank" class="">mfadams@lbl.gov</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class=""><br class=""></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 15, 2020 at 8:24 AM Stefano Zampini <<a href="mailto:stefano.zampini@gmail.com" target="_blank" class="">stefano.zampini@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Mark<div class=""><br class=""></div><div class="">I have fixed a few things in the solver, and it is tested with the current master.</div></div></blockquote><div class=""><br class=""></div><div class="">I rebased with master over the weekend ....</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="">Can you write an MWE to reproduce the issue? 
Which version of CUDA and CUSPARSE are you using?</div></div></blockquote><div class=""><br class=""></div><div class="">You can use mark/feature-xgc-interface-rebase branch and add '-mat_type seqaijcusparse -fp_pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda' to dm/impls/plex/tutorials/ex10.c</div><div class=""><br class=""></div><div class="">The first stage, SNES solve, actually looks OK here. Maybe.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class=""><br class=""></div><div class="">10:01 mark/feature-xgc-interface-rebase *= ~/petsc$ make -f gmakefile test search='dm_impls_plex_tutorials-ex10_0'<br class="">/usr/bin/python /ccs/home/adams/petsc/config/gmakegentest.py --petsc-dir=/ccs/home/adams/petsc --petsc-arch=arch-summit-opt64-gnu-cuda --testdir=./arch-summit-opt64-gnu-cuda/tests<br class="">Using MAKEFLAGS: search=dm_impls_plex_tutorials-ex10_0<br class="">          CC arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10.o<br class="">     CLINKER arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10<br class="">        TEST arch-summit-opt64-gnu-cuda/tests/counts/dm_impls_plex_tutorials-ex10_0.counts<br class=""> ok dm_impls_plex_tutorials-ex10_0<br class="">not ok diff-dm_impls_plex_tutorials-ex10_0 # Error code: 1<br class="">#       14,16c14,16<br class="">#       <     0 SNES Function norm 6.184233768573e-04 <br class="">#       <     1 SNES Function norm 1.467479466750e-08 <br class="">#       <     2 SNES Function norm 7.863111141350e-12 <br class="">#       ---<br class="">#       >     0 SNES Function norm 6.184233768572e-04 <br class="">#       >     1 SNES Function norm 1.467479466739e-08 <br class="">#       >     2 SNES Function norm 7.863102870090e-12 <br class="">#       18,31c18,256<br class="">#       <     0 SNES Function norm 6.182952107532e-04 <br class="">#       <     1 SNES Function norm 7.336382211149e-09 <br class="">#       <     2 SNES Function norm 
1.566979901443e-11 <br class="">#       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2<br class="">#       <     0 SNES Function norm 6.183592738545e-04 <br class="">#       <     1 SNES Function norm 7.337681407420e-09 <br class="">#       <     2 SNES Function norm 1.408823933908e-11 <br class="">#       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2<br class="">#       < [0] TSAdaptChoose_Basic(): Estimated scaled local truncation error 0.0396569, accepting step of size 1e-06<br class="">#       < 1 TS dt 1.25e-06 time 1e-06<br class="">#       <   1) species-0: charge density= -1.6024814608984e+01 z-momentum=  2.0080682964364e-19 energy=  1.2018000284846e+05<br class="">#       <   1) species-1: charge density=  1.6021676653316e+01 z-momentum=  1.4964483981137e-17 energy=  1.2017223215083e+05<br class="">#       <   1) species-2: charge density=  2.8838441139703e-03 z-momentum= -1.1062018110807e-23 energy=  1.2019641370376e-03<br class="">#       <         1) Total: charge density= -2.5411155383649e-04, momentum=  1.5165279748763e-17, energy=  2.4035223620125e+05 (m_i[0]/m_e = 3670.94, 140 cells), 1 sub threads<br class="">#       ---<br class=""><font color="#ff0000" class="">#       >     0 SNES Function norm 6.182952107531e-04 <br class="">#       >     1 SNES Function norm 6.181600164904e-04 <br class="">#       >     2 SNES Function norm 6.180249471739e-04 <br class="">#       >     3 SNES Function norm 6.178899987549e-04 </font><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="">I was planning to reorganize the factor code in AIJCUSPARSE in the next days.</div><div class=""><br class=""></div><div class=""><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><span style="font-variant-ligatures:no-common-ligatures" 
class=""><font size="1" class="">kl-18967:petsc zampins$ git grep "solver_type cusparse"</font></span></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tests/ex43.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -mat_cusparse_storage_format ell -vec_type cuda -pc_type ilu</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tests/ex43.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -f ${DATAFILESPATH}/matrices/shallow_water1 -mat_type seqaijcusparse -pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -mat_cusparse_storage_format hyb -vec_type cuda -ksp_type cg -pc_type icc</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tests/ex43.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span 
style="font-variant-ligatures:no-common-ligatures" class="">      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tests/ex43.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu -pc_factor_mat_ordering_type nd</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex46.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -dm_mat_type aijcusparse -dm_vec_type cuda -random_exact_sol -pc_type ilu -pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" 
class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex59.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">     args: -subdomain_mat_type aijcusparse -physical_pc_bddc_dirichlet_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex7.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex7.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu</span></font></div><div 
style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex7.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -vec_type cuda</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex7.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -vec_type cuda</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex71.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">   args: -pde_type Poisson -cells 7,9,8 -dim 3 -ksp_view -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -pc_bddc_dirichlet_pc_type cholesky -pc_bddc_dirichlet_pc_factor_mat_</span><span 
style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -pc_bddc_dirichlet_pc_factor_mat_ordering_type nd -pc_bddc_neumann_pc_type cholesky -pc_bddc_neumann_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -pc_bddc_neumann_pc_factor_mat_ordering_type nd -matis_localmat_type aijcusparse</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/ksp/ksp/examples/tutorials/ex72.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -f0 ${DATAFILESPATH}/matrices/medium -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type aijcusparse -pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span style="font-variant-ligatures:no-common-ligatures" class=""> -pc_type ilu -vec_type cuda</span></font></div><div style="margin:0px;font-stretch:normal;line-height:normal;font-family:Menlo" class=""><font size="1" class=""><span style="font-variant-ligatures:no-common-ligatures" class="">src/snes/examples/tutorials/ex12.c</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(46,174,187)" class="">:</span><span style="font-variant-ligatures:no-common-ligatures" class="">      args: -matis_localmat_type aijcusparse -pc_bddc_dirichlet_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span><span 
style="font-variant-ligatures:no-common-ligatures" class=""> -pc_bddc_neumann_pc_factor_mat_</span><span style="font-variant-ligatures:no-common-ligatures;color:rgb(180,36,25)" class=""><b class="">solver_type cusparse</b></span></font></div><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Apr 15, 2020, at 2:20 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank" class="">mfadams@lbl.gov</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class="">I tried using a serial direct solver in cusparse and got bad numerics:<div class=""><br class=""></div><div class="">-vector_type cuda -mat_type aijcusparse -pc_factor_mat_solver_type cusparse <br class=""></div><div class=""><br class=""></div><div class="">Before I start debugging this I wanted to see if there are any known issues that I should be aware of.</div><div class=""><br class=""></div><div class="">Thanks,</div></div>
</div></blockquote></div><br class=""></div></div></blockquote></div></div>
</blockquote></div><br clear="all" class=""><div class=""><br class=""></div>-- <br class=""><div dir="ltr" class="">Stefano</div>
</blockquote></div>
</div></blockquote></div><br class=""></body></html>