<!-- BaNnErBlUrFlE-BoDy-start -->
<!-- Preheader Text : BEGIN -->
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;">
Hello all, I am writing to ask if PETSc’s KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm. The questions appeared when I was running a large numerical program based on
</div>
<!-- Preheader Text : END -->
<!-- Email Banner : BEGIN -->
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;">ZjQcmQRYFpfptBannerStart</div>
<!--[if ((ie)|(mso))]>
<table border="0" cellspacing="0" cellpadding="0" width="100%" style="padding: 16px 0px 16px 0px; direction: ltr" ><tr><td>
<table border="0" cellspacing="0" cellpadding="0" style="padding: 0px 10px 5px 6px; width: 100%; border-radius:4px; border-top:4px solid #90a4ae;background-color:#D0D8DC;"><tr><td valign="top">
<table align="left" border="0" cellspacing="0" cellpadding="0" style="padding: 4px 8px 4px 8px">
<tr><td style="color:#000000; font-family: 'Arial', sans-serif; font-weight:bold; font-size:14px; direction: ltr">
This Message Is From an External Sender
</td></tr>
<tr><td style="color:#000000; font-weight:normal; font-family: 'Arial', sans-serif; font-size:12px; direction: ltr">
This message came from outside your organization.
</td></tr>
</table>
</td></tr></table>
</td></tr></table>
<![endif]-->
<![if !((ie)|(mso))]>
<div dir="ltr" id="pfptBanneroag8g7x" style="all: revert !important; display:block !important; text-align: left !important; margin:16px 0px 16px 0px !important; padding:8px 16px 8px 16px !important; border-radius: 4px !important; min-width: 200px !important; background-color: #D0D8DC !important; background-color: #D0D8DC; border-top: 4px solid #90a4ae !important; border-top: 4px solid #90a4ae;">
<div id="pfptBanneroag8g7x" style="all: unset !important; float:left !important; display:block !important; margin: 0px 0px 1px 0px !important; max-width: 600px !important;">
<div id="pfptBanneroag8g7x" style="all: unset !important; display:block !important; visibility: visible !important; background-color: #D0D8DC !important; color:#000000 !important; color:#000000; font-family: 'Arial', sans-serif !important; font-family: 'Arial', sans-serif; font-weight:bold !important; font-weight:bold; font-size:14px !important; line-height:18px !important; line-height:18px">
This Message Is From an External Sender
</div>
<div id="pfptBanneroag8g7x" style="all: unset !important; display:block !important; visibility: visible !important; background-color: #D0D8DC !important; color:#000000 !important; color:#000000; font-weight:normal; font-family: 'Arial', sans-serif !important; font-family: 'Arial', sans-serif; font-size:12px !important; line-height:18px !important; line-height:18px; margin-top:2px !important;">
This message came from outside your organization.
</div>
</div>
<div style="clear: both !important; display: block !important; visibility: hidden !important; line-height: 0 !important; font-size: 0.01px !important; height: 0px"> </div>
</div>
<![endif]>
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;">ZjQcmQRYFpfptBannerEnd</div>
<!-- Email Banner : END -->
<!-- BaNnErBlUrFlE-BoDy-end -->
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head><!-- BaNnErBlUrFlE-HeAdEr-start -->
<style>
#pfptBanneroag8g7x { all: revert !important; display: block !important;
visibility: visible !important; opacity: 1 !important;
background-color: #D0D8DC !important;
max-width: none !important; max-height: none !important }
.pfptPrimaryButtonoag8g7x:hover, .pfptPrimaryButtonoag8g7x:focus {
background-color: #b4c1c7 !important; }
.pfptPrimaryButtonoag8g7x:active {
background-color: #90a4ae !important; }
</style>
<!-- BaNnErBlUrFlE-HeAdEr-end -->
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"Helvetica Neue";
panose-1:2 0 5 3 0 0 0 2 0 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:12.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Aptos",sans-serif;
color:windowtext;}
p.p2, li.p2, div.p2
{mso-style-name:p2;
margin:0cm;
font-size:10.0pt;
font-family:"Helvetica Neue";}
p.p3, li.p3, div.p3
{mso-style-name:p3;
margin:0cm;
font-size:10.0pt;
font-family:"Helvetica Neue";}
p.p4, li.p4, div.p4
{mso-style-name:p4;
margin:0cm;
font-size:10.0pt;
font-family:"Helvetica Neue";}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="en-CN" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="p3">Hello all,<o:p></o:p></p>
<p class="p2"><o:p> </o:p></p>
<p class="p3">I am writing to ask if PETSc’s KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm.<o:p></o:p></p>
<p class="p2"><o:p> </o:p></p>
<p class="p3">The questions appeared when I was running a large numerical program based on boundary element method. I used the PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled
by the <b>OPENBLAS_NUM_THREADS</b> environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread.<br>
<br>
Could you please <b>confirm if GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreadings of the low-level math libraries such as BLAS and LAPACK?</b>
<b>If so</b>, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES?
<b>If not</b>, why do I observe that threads are being used during the GMERS solutions?<o:p></o:p></p>
<p class="p2"><o:p> </o:p></p>
<p class="p3"><b>For reference, I am using PETSc version 3.16.0, configured in CMakelists as follows:</b><o:p></o:p></p>
<p class="p3"><br>
./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade
--with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp<br>
<br>
To simplify the diagnosis of potential issues, I have also written a small example program using GMRES to solve a sparse matrix system derived from a 2D Poisson problem using the finite difference method. I found similar issues on this piece of codes. The code
is as follows:<br>
<br>
#include <petscksp.h><br>
<br>
/* Monitor function to print iteration number and residual norm */<br>
PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx) {<br>
<span class="apple-converted-space"> </span>PetscErrorCode ierr;<br>
<span class="apple-converted-space"> </span>ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm);<br>
<span class="apple-converted-space"> </span>CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>return 0;<br>
}<br>
<br>
int main(int argc, char **args) {<br>
<span class="apple-converted-space"> </span>Vec x, b, x_true, e;<br>
<span class="apple-converted-space"> </span>Mat A;<br>
<span class="apple-converted-space"> </span>KSP ksp;<br>
<span class="apple-converted-space"> </span>PetscErrorCode ierr;<br>
<span class="apple-converted-space"> </span>PetscInt i, j, Ii, J, n = 500; // Size of the grid n x n<br>
<span class="apple-converted-space"> </span>PetscInt Istart, Iend, ncols;<br>
<span class="apple-converted-space"> </span>PetscScalar v;<br>
<span class="apple-converted-space"> </span>PetscMPIInt rank;<br>
<span class="apple-converted-space"> </span>PetscInitialize(&argc, &args, NULL, NULL);<br>
<span class="apple-converted-space"> </span>PetscLogDouble t1, t2; <span class="apple-converted-space">
</span>// Variables for timing<br>
<span class="apple-converted-space"> </span>MPI_Comm_rank(PETSC_COMM_WORLD, &rank);<br>
<br>
<span class="apple-converted-space"> </span>// Create vectors and matrix<br>
<span class="apple-converted-space"> </span>ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDuplicate(x, &b); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Set true solution as all ones<br>
<span class="apple-converted-space"> </span>ierr = VecSet(x_true, 1.0); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Create and assemble matrix A for the 2D Laplacian using 5-point stencil<br>
<span class="apple-converted-space"> </span>ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatSetFromOptions(A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatSetUp(A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>for (Ii = Istart; Ii < Iend; Ii++) {<br>
<span class="apple-converted-space"> </span>i = Ii / n; // Row index<br>
<span class="apple-converted-space"> </span>j = Ii % n; // Column index<br>
<span class="apple-converted-space"> </span>v = -4.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>if (i > 0) { // South<br>
<span class="apple-converted-space"> </span>J = Ii - n;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>if (i < n - 1) { // North<br>
<span class="apple-converted-space"> </span>J = Ii + n;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>if (j > 0) { // West<br>
<span class="apple-converted-space"> </span>J = Ii - 1;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>if (j < n - 1) { // East<br>
<span class="apple-converted-space"> </span>J = Ii + 1;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Compute the RHS corresponding to the true solution<br>
<span class="apple-converted-space"> </span>ierr = MatMult(A, x_true, b); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Set up and solve the linear system<br>
<span class="apple-converted-space"> </span>ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>/* Set up the monitor */<br>
<span class="apple-converted-space"> </span>ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Start timing<br>
<span class="apple-converted-space"> </span>PetscTime(&t1);<br>
<br>
<span class="apple-converted-space"> </span>ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Stop timing<br>
<span class="apple-converted-space"> </span>PetscTime(&t2);<br>
<br>
<span class="apple-converted-space"> </span>// Compute error<br>
<span class="apple-converted-space"> </span>ierr = VecDuplicate(x, &e); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>PetscReal norm_error, norm_true;<br>
<span class="apple-converted-space"> </span>ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>PetscReal relative_error = norm_error / norm_true;<br>
<span class="apple-converted-space"> </span>if (rank == 0) { // Print only from the first MPI process<br>
<span class="apple-converted-space"> </span>PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error);<br>
<span class="apple-converted-space"> </span>}<br>
<br>
<span class="apple-converted-space"> </span>// Output the wall time taken for MatMult<br>
<span class="apple-converted-space"> </span>PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1);<br>
<br>
<span class="apple-converted-space"> </span>// Cleanup<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&x); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&b); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&x_true); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&e); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatDestroy(&A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPDestroy(&ksp); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>PetscFinalize();<br>
<span class="apple-converted-space"> </span>return 0;<br>
}<br>
<br>
Here are some profiling results for GMERS solution.<br>
<br>
OPENBLAS_NUM_THREADS = 1, iteration steps<span class="apple-converted-space"> </span>
= 859, solution time = 16.1<br>
OPENBLAS_NUM_THREADS = 2, iteration steps<span class="apple-converted-space"> </span>
= 859, solution time = 16.3<o:p></o:p></p>
<p class="p3">OPENBLAS_NUM_THREADS = 4, iteration steps<span class="apple-converted-space">
</span>= 859, solution time = 16.7<o:p></o:p></p>
<p class="p3">OPENBLAS_NUM_THREADS = 8, iteration steps<span class="apple-converted-space">
</span>= 859, solution time = 16.8<o:p></o:p></p>
<p class="p3">OPENBLAS_NUM_THREADS = 16, iteration steps<span class="apple-converted-space">
</span>= 859, solution time = 17.8<o:p></o:p></p>
<p class="p3"><br>
<b>I am using one workstation with Intel® Core™ i9-11900K Processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes, such as mpirun/mpiexec, the default number of MPI processes should be 1, correct if I am wrong.</b><o:p></o:p></p>
<p class="p4"><o:p> </o:p></p>
<p class="p3">Thank you in advance!<br>
<br>
Sincerely,<o:p></o:p></p>
<p class="p3">Yongzhong<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;mso-ligatures:none">-----------------------------------------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;mso-ligatures:none">Yongzhong Li</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;mso-ligatures:none"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;mso-ligatures:none">PhD student | Electromagnetics Group<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;mso-ligatures:none">Department of Electrical & Computer Engineering<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;mso-ligatures:none">University of Toronto<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;mso-ligatures:none"><a href="https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!dke3wyns7hkBOVhDpSLeiXntYbY-XnVkGEyNyesUI7XkVcBQe_oSaykAmlN6EZ_B9P4mbm-aP1RKw2FwERcONEC88CYksHvjUZ0$"><span style="color:#0563C1">http://www.modelics.org</span></a><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>