<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Thanks for the tips!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The suggested settings for GAMG did not yield better results,
<o:p></o:p></p>
<p class="MsoNormal">but hypre worked well right away, giving very good convergence!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A follow-up question then (I hope that’s ok; and it could be related to GAMG<o:p></o:p></p>
<p class="MsoNormal">not working, I’ll check that). Once everything was running I discovered that my gradient vector
<o:p></o:p></p>
<p class="MsoNormal">dfdx which I populate via an array df obtained from VecGetArray(dfdx, &df) doesn’t get
<o:p></o:p></p>
<p class="MsoNormal">filled properly; it always contains only zeros. This is not the case when I run on the CPU,<o:p></o:p></p>
<p class="MsoNormal">and df gets filled as it should even on the GPU, suggesting that either I’m not using
<o:p></o:p></p>
<p class="MsoNormal">VecGetArray properly, or I shouldn’t use it at all for GPU computations?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Kind regards,<o:p></o:p></p>
<p class="MsoNormal">Carl-Johan<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b>From:</b> Mark Adams <mfadams@lbl.gov> <br>
<b>Sent:</b> den 31 oktober 2022 13:30<br>
<b>To:</b> Carl-Johan Thore <carl-johan.thore@liu.se><br>
<b>Cc:</b> Matthew Knepley <knepley@gmail.com>; Barry Smith <bsmith@petsc.dev>; petsc-users@mcs.anl.gov<br>
<b>Subject:</b> Re: [petsc-users] KSP on GPU<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">* You could try hypre or another preconditioner that you can afford, like LU or ASM, that works.<o:p></o:p></p>
<div>
<p class="MsoNormal">* If this matrix is SPD, you want to use -fieldsplit_0_pc_gamg_esteig_ksp_type cg -fieldsplit_0_pc_gamg_esteig_ksp_max_it 10<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> These will give better eigen estimates, and that is important.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">The differences between these steimates is not too bad.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">There is a safety factor (1.05 is the default) that you could increase with: -fieldsplit_0_mg_levels_ksp_chebyshev_esteig 0,0.05,0,<b><i><u>1.1</u></i></b><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">* Finally you could try -fieldsplit_0_pc_gamg_reuse_interpolation 1, if GAMG is still not working.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Use -fieldsplit_0_ksp_converged_reason and check the iteration count. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">And it is a good idea to check with hypre to make sure something is not going badly in terms of performance anyway. AMG is hard and hypre is a good solver.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Mark<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Mon, Oct 31, 2022 at 1:56 AM Carl-Johan Thore via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">The GPU supports double precision and I didn’t explicitly tell PETSc to use float when compiling, so<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I guess it uses double? What’s the easiest way to check?<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Barry, running -ksp_view shows that the solver options are the same for CPU and GPU. The only
<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">difference is the coarse grid solver for gamg (“the package used to perform factorization:”) which<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">is petsc for CPU and cusparse for GPU. I tried forcing the GPU to use petsc via<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_0_mg_coarse_sub_pc_factor_mat_solver_type, but then ksp failed to converge<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">even on the first topology optimization iteration.
<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-ksp_view also shows differences in the eigenvalues from the Chebyshev smoother. For example,<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GPU:
<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> Down solver (pre-smoother) on level 2 -------------------------------<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI process<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> type: chebyshev<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> eigenvalue targets used: min 0.109245, max 1.2017<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> eigenvalues provided (min 0.889134, max 1.09245) with<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">CPU:
<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> eigenvalue targets used: min 0.112623, max 1.23886<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> eigenvalues provided (min 0.879582, max 1.12623)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">But I guess such differences are expected?<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">/Carl-Johan<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>From:</b> Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>>
<br>
<b>Sent:</b> den 30 oktober 2022 22:00<br>
<b>To:</b> Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>><br>
<b>Cc:</b> Carl-Johan Thore <<a href="mailto:carl-johan.thore@liu.se" target="_blank">carl-johan.thore@liu.se</a>>;
<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a><br>
<b>Subject:</b> Re: [petsc-users] KSP on GPU<o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Sun, Oct 30, 2022 at 3:52 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<o:p></o:p></p>
</div>
<div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> In general you should expect similar but not identical conference behavior. <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> I suggest running with all the monitoring you can. -ksp_monitor_true_residual -fieldsplit_0_monitor_true_residual -fieldsplit_1_monitor_true_residual and compare the various
convergence between the CPU and GPU. Also run with -ksp_view and check that the various solver options are the same (they should be).<o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Is the GPU using float or double?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> Matt<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> Barry<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"><o:p> </o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Oct 30, 2022, at 11:02 AM, Carl-Johan Thore via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hi,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I'm solving a topology optimization problem with Stokes flow discretized by a stabilized Q1-Q0 finite element method<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">and using BiCGStab with the fieldsplit preconditioner to solve the linear systems. The implementation<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">is based on DMStag, runs on Ubuntu via WSL2, and works fine with PETSc-3.18.1 on multiple CPU cores and the following<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">options for the preconditioner:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_0_ksp_type preonly \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_0_pc_type gamg \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_0_pc_gamg_reuse_interpolation 0 \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_1_ksp_type preonly \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_1_pc_type jacobi <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">However, when I enable GPU computations by adding two options -<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">...<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-dm_vec_type cuda \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-dm_mat_type aijcusparse \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_0_ksp_type preonly \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_0_pc_type gamg \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_0_pc_gamg_reuse_interpolation 0 \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_1_ksp_type preonly \<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-fieldsplit_1_pc_type jacobi <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">- KSP still works fine the first couple of topology optimization iterations but then<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">stops with "Linear solve did not converge due to DIVERGED_DTOL ..".<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">My question is whether I should expect the GPU versions of the linear solvers and pre-conditioners<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">to function exactly as their CPU counterparts (I got this impression from the documentation),<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">in which case I've probably made some mistake in my own code, or whether there are other/additional<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">settings or modifications I should use to run on the GPU (an NVIDIA Quadro T2000)?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="SV">Kind regards,</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="SV"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="SV">Carl-Johan</span><o:p></o:p></p>
</div>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><br clear="all">
<o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">--
<o:p></o:p></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><a href="https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.cse.buffalo.edu%2F~knepley%2F&data=05%7C01%7Ccarl-johan.thore%40liu.se%7C88e88d9edbf5436ced1c08dabb3bba29%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638028162449727533%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=E403%2B2DscOzU8UM2HMxEp5iVR3HOl3Atv4o6CF1WGQg%3D&reserved=0" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>