<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div><br></div> Please send the full output when you run with the monitors I mentioned turned on. If one approach is converging and one is not then we should be able to see this in differences in the convergence output printed for the two runs getting further and further apart.<div><br></div><div> Barry</div><div><br><div><br><blockquote type="cite"><div>On Oct 31, 2022, at 1:56 AM, Carl-Johan Thore <carl-johan.thore@liu.se> wrote:</div><br class="Apple-interchange-newline"><div><meta charset="UTF-8"><div class="WordSection1" style="page: WordSection1; caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 18px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">The GPU supports double precision and I didn’t explicitly tell PETSc to use float when compiling, so<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">I guess it uses double? What’s the easiest way to check?<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">Barry, running -ksp_view shows that the solver options are the same for CPU and GPU. The only<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">difference is the coarse grid solver for gamg (“the package used to perform factorization:”) which<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">is petsc for CPU and cusparse for GPU. I tried forcing the GPU to use petsc via<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_0_mg_coarse_sub_pc_factor_mat_solver_type, but then ksp failed to converge<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">even on the first topology optimization iteration. <span class="Apple-converted-space"> </span><o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-ksp_view also shows differences in the eigenvalues from the Chebyshev smoother. For example,<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">GPU:<span class="Apple-converted-space"> </span><o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> Down solver (pre-smoother) on level 2 -------------------------------<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI process<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> type: chebyshev<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> eigenvalue targets used: min 0.109245, max 1.2017<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> eigenvalues provided (min 0.889134, max 1.09245) with<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">CPU:<span class="Apple-converted-space"> </span><o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> eigenvalue targets used: min 0.112623, max 1.23886<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> eigenvalues provided (min 0.879582, max 1.12623)<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">But I guess such differences are expected?<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">/Carl-Johan<o:p></o:p></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div style="border-style: solid none none; border-top-width: 1pt; border-top-color: rgb(225, 225, 225); padding: 3pt 0cm 0cm;"><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><b>From:</b><span class="Apple-converted-space"> </span>Matthew Knepley <<a href="mailto:knepley@gmail.com" style="color: blue; text-decoration: underline;">knepley@gmail.com</a>><span class="Apple-converted-space"> </span><br><b>Sent:</b><span class="Apple-converted-space"> </span>den 30 oktober 2022 22:00<br><b>To:</b><span class="Apple-converted-space"> </span>Barry Smith <<a href="mailto:bsmith@petsc.dev" style="color: blue; text-decoration: underline;">bsmith@petsc.dev</a>><br><b>Cc:</b><span class="Apple-converted-space"> </span>Carl-Johan Thore <<a href="mailto:carl-johan.thore@liu.se" style="color: blue; text-decoration: underline;">carl-johan.thore@liu.se</a>>;<span class="Apple-converted-space"> </span><a href="mailto:petsc-users@mcs.anl.gov" style="color: blue; text-decoration: underline;">petsc-users@mcs.anl.gov</a><br><b>Subject:</b><span class="Apple-converted-space"> </span>Re: [petsc-users] KSP on GPU<o:p></o:p></div></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">On Sun, Oct 30, 2022 at 3:52 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" style="color: blue; text-decoration: underline;">bsmith@petsc.dev</a>> wrote:<o:p></o:p></div></div><div><blockquote style="border-style: none none none solid; border-left-width: 1pt; border-left-color: rgb(204, 204, 204); padding: 0cm 0cm 0cm 6pt; margin-left: 4.8pt; margin-right: 0cm;"><div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> In general you should expect similar but not identical conference behavior. <o:p></o:p></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> I suggest running with all the monitoring you can. -ksp_monitor_true_residual -fieldsplit_0_monitor_true_residual -fieldsplit_1_monitor_true_residual and compare the various convergence between the CPU and GPU. Also run with -ksp_view and check that the various solver options are the same (they should be).<o:p></o:p></div></div></div></blockquote><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">Is the GPU using float or double?<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> Matt<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><blockquote style="border-style: none none none solid; border-left-width: 1pt; border-left-color: rgb(204, 204, 204); padding: 0cm 0cm 0cm 6pt; margin-left: 4.8pt; margin-right: 0cm;"><div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> Barry<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><br><br><o:p></o:p></div><blockquote style="margin-top: 5pt; margin-bottom: 5pt;"><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">On Oct 30, 2022, at 11:02 AM, Carl-Johan Thore via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank" style="color: blue; text-decoration: underline;">petsc-users@mcs.anl.gov</a>> wrote:<o:p></o:p></div></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div><div><div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">Hi,<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">I'm solving a topology optimization problem with Stokes flow discretized by a stabilized Q1-Q0 finite element method<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">and using BiCGStab with the fieldsplit preconditioner to solve the linear systems. The implementation<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">is based on DMStag, runs on Ubuntu via WSL2, and works fine with PETSc-3.18.1 on multiple CPU cores and the following<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">options for the preconditioner:<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_0_ksp_type preonly \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_0_pc_type gamg \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_0_pc_gamg_reuse_interpolation 0 \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_1_ksp_type preonly \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_1_pc_type jacobi <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">However, when I enable GPU computations by adding two options -<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">...<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-dm_vec_type cuda \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-dm_mat_type aijcusparse \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_0_ksp_type preonly \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_0_pc_type gamg \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_0_pc_gamg_reuse_interpolation 0 \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_1_ksp_type preonly \<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">-fieldsplit_1_pc_type jacobi <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">- KSP still works fine the first couple of topology optimization iterations but then<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">stops with "Linear solve did not converge due to DIVERGED_DTOL ..".<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">My question is whether I should expect the GPU versions of the linear solvers and pre-conditioners<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">to function exactly as their CPU counterparts (I got this impression from the documentation),<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">in which case I've probably made some mistake in my own code, or whether there are other/additional<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">settings or modifications I should use to run on the GPU (an NVIDIA Quadro T2000)?<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><span lang="SV">Kind regards,</span><o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><span lang="SV"> </span><o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><span lang="SV">Carl-Johan</span><o:p></o:p></div></div></div></div></blockquote></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div></div></div></blockquote></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><br clear="all"><o:p></o:p></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div></div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">--<span class="Apple-converted-space"> </span><o:p></o:p></div><div><div><div><div><div><div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener<o:p></o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><o:p> </o:p></div></div><div><div style="margin: 0cm; font-size: 11pt; font-family: Calibri, sans-serif;"><a href="https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.cse.buffalo.edu%2F~knepley%2F&data=05%7C01%7Ccarl-johan.thore%40liu.se%7C8113f968516a44cce6ba08dabab9b28e%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638027603971893036%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=gQ45cdsgo404NeBzZ7e6c5zNXhYOy39ZPfZzqtGDaDk%3D&reserved=0" target="_blank" style="color: blue; text-decoration: underline;">https://www.cse.buffalo.edu/~knepley/</a></div></div></div></div></div></div></div></div></div></div></div></blockquote></div><br></div></body></html>