<!-- BaNnErBlUrFlE-BoDy-start -->
<!-- Preheader Text : BEGIN -->
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;">
Hi Barry, Thank you for the information provided! Do you think different BLAS implementation will affect the multithreading performance of some vectors operations in GMERS in PETSc? I am now using OpenBLAS but didn’t see much improvement when
</div>
<!-- Preheader Text : END -->
<!-- Email Banner : BEGIN -->
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;">ZjQcmQRYFpfptBannerStart</div>
<!--[if ((ie)|(mso))]>
<table border="0" cellspacing="0" cellpadding="0" width="100%" style="padding: 16px 0px 16px 0px; direction: ltr" ><tr><td>
<table border="0" cellspacing="0" cellpadding="0" style="padding: 0px 10px 5px 6px; width: 100%; border-radius:4px; border-top:4px solid #90a4ae;background-color:#D0D8DC;"><tr><td valign="top">
<table align="left" border="0" cellspacing="0" cellpadding="0" style="padding: 4px 8px 4px 8px">
<tr><td style="color:#000000; font-family: 'Arial', sans-serif; font-weight:bold; font-size:14px; direction: ltr">
This Message Is From an External Sender
</td></tr>
<tr><td style="color:#000000; font-weight:normal; font-family: 'Arial', sans-serif; font-size:12px; direction: ltr">
This message came from outside your organization.
</td></tr>
</table>
</td></tr></table>
</td></tr></table>
<![endif]-->
<![if !((ie)|(mso))]>
<div dir="ltr" id="pfptBannerpoircn7" style="all: revert !important; display:block !important; text-align: left !important; margin:16px 0px 16px 0px !important; padding:8px 16px 8px 16px !important; border-radius: 4px !important; min-width: 200px !important; background-color: #D0D8DC !important; background-color: #D0D8DC; border-top: 4px solid #90a4ae !important; border-top: 4px solid #90a4ae;">
<div id="pfptBannerpoircn7" style="all: unset !important; float:left !important; display:block !important; margin: 0px 0px 1px 0px !important; max-width: 600px !important;">
<div id="pfptBannerpoircn7" style="all: unset !important; display:block !important; visibility: visible !important; background-color: #D0D8DC !important; color:#000000 !important; color:#000000; font-family: 'Arial', sans-serif !important; font-family: 'Arial', sans-serif; font-weight:bold !important; font-weight:bold; font-size:14px !important; line-height:18px !important; line-height:18px">
This Message Is From an External Sender
</div>
<div id="pfptBannerpoircn7" style="all: unset !important; display:block !important; visibility: visible !important; background-color: #D0D8DC !important; color:#000000 !important; color:#000000; font-weight:normal; font-family: 'Arial', sans-serif !important; font-family: 'Arial', sans-serif; font-size:12px !important; line-height:18px !important; line-height:18px; margin-top:2px !important;">
This message came from outside your organization.
</div>
</div>
<div style="clear: both !important; display: block !important; visibility: hidden !important; line-height: 0 !important; font-size: 0.01px !important; height: 0px"> </div>
</div>
<![endif]>
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;">ZjQcmQRYFpfptBannerEnd</div>
<!-- Email Banner : END -->
<!-- BaNnErBlUrFlE-BoDy-end -->
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head><!-- BaNnErBlUrFlE-HeAdEr-start -->
<style>
#pfptBannerpoircn7 { all: revert !important; display: block !important;
visibility: visible !important; opacity: 1 !important;
background-color: #D0D8DC !important;
max-width: none !important; max-height: none !important }
.pfptPrimaryButtonpoircn7:hover, .pfptPrimaryButtonpoircn7:focus {
background-color: #b4c1c7 !important; }
.pfptPrimaryButtonpoircn7:active {
background-color: #90a4ae !important; }
</style>
<!-- BaNnErBlUrFlE-HeAdEr-end -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:"Segoe UI";
panose-1:2 11 5 2 4 2 4 2 2 3;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"PingFang SC";
panose-1:2 11 4 0 0 0 0 0 0 0;}
@font-face
{font-family:"\@PingFang SC";}
@font-face
{font-family:"Helvetica Neue";
panose-1:2 0 5 3 0 0 0 2 0 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:10.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="en-CN" link="blue" vlink="purple" style="word-wrap:break-word;line-break:after-white-space">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi Barry,<br>
<br>
Thank you for the information provided!<br>
<br>
Do you think different BLAS implementation will affect the multithreading performance of some vectors operations in GMERS in PETSc?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I am now using OpenBLAS but didn’t see much improvement when theb multithreading are enabled, do you think other implementation
</span><span lang="EN-US" style="font-size:11.0pt">such as netlib and intel-mkl will help?<br>
<br>
Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt">Yongzhong<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Barry Smith <bsmith@petsc.dev><br>
<b>Date: </b>Monday, April 22, 2024 at 4:20</span><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"> </span><span style="font-size:12.0pt;color:black">PM<br>
<b>To: </b>Yongzhong Li <yongzhong.li@mail.utoronto.ca><br>
<b>Cc: </b>petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-maint@mcs.anl.gov <petsc-maint@mcs.anl.gov>, Piero Triverio <piero.triverio@utoronto.ca><br>
<b>Subject: </b>Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver<o:p></o:p></span></p>
</div>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" align="left" width="100%" style="width:100.0%;display:table;border-collapse:seperate;float:none">
<tbody>
<tr>
<td style="background:#A6A6A6;padding:5.25pt 1.5pt 5.25pt 1.5pt"></td>
<td width="100%" style="width:100.0%;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 11.25pt">
<div>
<p class="MsoNormal" style="mso-element:frame;mso-element-frame-hspace:2.25pt;mso-element-wrap:around;mso-element-anchor-vertical:paragraph;mso-element-anchor-horizontal:column;mso-height-rule:exactly">
<span lang="ZH-CN" style="font-size:9.0pt;font-family:"PingFang SC",sans-serif;color:#212121">你通常不会收到来自</span><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121"> bsmith@petsc.dev
</span><span lang="ZH-CN" style="font-size:9.0pt;font-family:"PingFang SC",sans-serif;color:#212121">的电子邮件。</span><span style="color:black"><a href="https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!aNwwhaNwa_gbiduyqOK-fh3XqaflSgzf_Epel-lCvCrQrOx_5whj-_fjlBOTZsR-8DKl0ZHsC7L78nIw1YvDUq9dQO1D0UttjKo$"><span lang="ZH-CN" style="font-size:9.0pt;font-family:"PingFang SC",sans-serif">了解这一点为什么很重要</span></a></span><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121"><o:p></o:p></span></p>
</div>
</td>
<td width="75" style="width:56.25pt;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 3.75pt">
</td>
</tr>
</tbody>
</table>
<div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:12.0pt"> PETSc provided solvers do not directly use threads.
<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"> The BLAS used by LAPACK and PETSc may use threads depending on what BLAS is being used and how it was configured. <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"> Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of threaded BLAS can help with these routines, but not significantly
for the solver. <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"> Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially
with PCLU. <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"> If you run with -blas_view PETSc tries to indicate information about the threading of BLAS. You can also use -blas_num_threads <n> to set the number of threads, equivalent to setting the environmental
variable. For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not effect.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt"><br>
<br>
<o:p></o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:12.0pt">On Apr 22, 2024, at 4:06</span><span style="font-size:12.0pt;font-family:"Arial",sans-serif"> </span><span style="font-size:12.0pt">PM, Yongzhong Li <yongzhong.li@mail.utoronto.ca> wrote:<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
<div>
<div id="pfptBannerbeaibqa">
<div id="pfptBannerbeaibqa">
<div id="pfptBannerbeaibqa">
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif">This Message Is From an External Sender<o:p></o:p></span></p>
</div>
<div id="pfptBannerbeaibqa">
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Arial",sans-serif">This message came from outside your organization.<o:p></o:p></span></p>
</div>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">Hello all,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue""> <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">I am writing to ask if PETSc’s KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue""> <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">The questions appeared when I was running a large numerical program based on boundary element method. I used the PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively.
I observed that threads were being utilized, controlled by the<span class="apple-converted-space"> </span><b>OPENBLAS_NUM_THREADS</b><span class="apple-converted-space"> </span>environment variable. However, I noticed no significant performance difference
between running the solver with multiple threads versus a single thread.<br>
<br>
Could you please<span class="apple-converted-space"> </span><b>confirm if GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreadings of the low-level math libraries such as BLAS and LAPACK?</b><span class="apple-converted-space"> </span><b>If
so</b>, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES?<span class="apple-converted-space"> </span><b>If not</b>, why do I observe that threads are being used during the GMERS solutions?<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue""> <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><b><span style="font-family:"Helvetica Neue"">For reference, I am using PETSc version 3.16.0, configured in CMakelists as follows:</span></b><span style="font-family:"Helvetica Neue""><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue""><br>
./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade
--with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp<br>
<br>
To simplify the diagnosis of potential issues, I have also written a small example program using GMRES to solve a sparse matrix system derived from a 2D Poisson problem using the finite difference method. I found similar issues on this piece of codes. The code
is as follows:<br>
<br>
#include <petscksp.h><br>
<br>
/* Monitor function to print iteration number and residual norm */<br>
PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx) {<br>
<span class="apple-converted-space"> </span>PetscErrorCode ierr;<br>
<span class="apple-converted-space"> </span>ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm);<br>
<span class="apple-converted-space"> </span>CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>return 0;<br>
}<br>
<br>
int main(int argc, char **args) {<br>
<span class="apple-converted-space"> </span>Vec x, b, x_true, e;<br>
<span class="apple-converted-space"> </span>Mat A;<br>
<span class="apple-converted-space"> </span>KSP ksp;<br>
<span class="apple-converted-space"> </span>PetscErrorCode ierr;<br>
<span class="apple-converted-space"> </span>PetscInt i, j, Ii, J, n = 500; // Size of the grid n x n<br>
<span class="apple-converted-space"> </span>PetscInt Istart, Iend, ncols;<br>
<span class="apple-converted-space"> </span>PetscScalar v;<br>
<span class="apple-converted-space"> </span>PetscMPIInt rank;<br>
<span class="apple-converted-space"> </span>PetscInitialize(&argc, &args, NULL, NULL);<br>
<span class="apple-converted-space"> </span>PetscLogDouble t1, t2;<span class="apple-converted-space"> </span>// Variables for timing<br>
<span class="apple-converted-space"> </span>MPI_Comm_rank(PETSC_COMM_WORLD, &rank);<br>
<br>
<span class="apple-converted-space"> </span>// Create vectors and matrix<br>
<span class="apple-converted-space"> </span>ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDuplicate(x, &b); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Set true solution as all ones<br>
<span class="apple-converted-space"> </span>ierr = VecSet(x_true, 1.0); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Create and assemble matrix A for the 2D Laplacian using 5-point stencil<br>
<span class="apple-converted-space"> </span>ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatSetFromOptions(A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatSetUp(A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>for (Ii = Istart; Ii < Iend; Ii++) {<br>
<span class="apple-converted-space"> </span>i = Ii / n; // Row index<br>
<span class="apple-converted-space"> </span>j = Ii % n; // Column index<br>
<span class="apple-converted-space"> </span>v = -4.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>if (i > 0) { // South<br>
<span class="apple-converted-space"> </span>J = Ii - n;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>if (i < n - 1) { // North<br>
<span class="apple-converted-space"> </span>J = Ii + n;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>if (j > 0) { // West<br>
<span class="apple-converted-space"> </span>J = Ii - 1;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>if (j < n - 1) { // East<br>
<span class="apple-converted-space"> </span>J = Ii + 1;<br>
<span class="apple-converted-space"> </span>v = 1.0;<br>
<span class="apple-converted-space"> </span>ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>}<br>
<span class="apple-converted-space"> </span>ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Compute the RHS corresponding to the true solution<br>
<span class="apple-converted-space"> </span>ierr = MatMult(A, x_true, b); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Set up and solve the linear system<br>
<span class="apple-converted-space"> </span>ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>/* Set up the monitor */<br>
<span class="apple-converted-space"> </span>ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Start timing<br>
<span class="apple-converted-space"> </span>PetscTime(&t1);<br>
<br>
<span class="apple-converted-space"> </span>ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);<br>
<br>
<span class="apple-converted-space"> </span>// Stop timing<br>
<span class="apple-converted-space"> </span>PetscTime(&t2);<br>
<br>
<span class="apple-converted-space"> </span>// Compute error<br>
<span class="apple-converted-space"> </span>ierr = VecDuplicate(x, &e); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>PetscReal norm_error, norm_true;<br>
<span class="apple-converted-space"> </span>ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>PetscReal relative_error = norm_error / norm_true;<br>
<span class="apple-converted-space"> </span>if (rank == 0) { // Print only from the first MPI process<br>
<span class="apple-converted-space"> </span>PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error);<br>
<span class="apple-converted-space"> </span>}<br>
<br>
<span class="apple-converted-space"> </span>// Output the wall time taken for MatMult<br>
<span class="apple-converted-space"> </span>PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1);<br>
<br>
<span class="apple-converted-space"> </span>// Cleanup<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&x); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&b); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&x_true); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = VecDestroy(&e); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = MatDestroy(&A); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>ierr = KSPDestroy(&ksp); CHKERRQ(ierr);<br>
<span class="apple-converted-space"> </span>PetscFinalize();<br>
<span class="apple-converted-space"> </span>return 0;<br>
}<br>
<br>
Here are some profiling results for GMERS solution.<br>
<br>
OPENBLAS_NUM_THREADS = 1, iteration steps<span class="apple-converted-space"> </span>= 859, solution time = 16.1<br>
OPENBLAS_NUM_THREADS = 2, iteration steps<span class="apple-converted-space"> </span>= 859, solution time = 16.3<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">OPENBLAS_NUM_THREADS = 4, iteration steps<span class="apple-converted-space"> </span>= 859, solution time = 16.7<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">OPENBLAS_NUM_THREADS = 8, iteration steps<span class="apple-converted-space"> </span>= 859, solution time = 16.8<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">OPENBLAS_NUM_THREADS = 16, iteration steps<span class="apple-converted-space"> </span>= 859, solution time = 17.8<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue""><br>
<b>I am using one workstation with Intel® Core™ i9-11900K Processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes, such as mpirun/mpiexec, the default number of MPI processes should be 1, correct if I am wrong.</b><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue""> <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">Thank you in advance!<br>
<br>
Sincerely,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Helvetica Neue"">Yongzhong<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><span style="font-size:12.0pt"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">-----------------------------------------------------------</span><span style="font-size:12.0pt"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yongzhong Li</span></b><span style="font-size:12.0pt"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">PhD student | Electromagnetics Group</span><span style="font-size:12.0pt"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">Department of Electrical & Computer Engineering</span><span style="font-size:12.0pt"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">University of Toronto</span><span style="font-size:12.0pt"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><a href="https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!efVv_hPkRBEhyquXer2c8sFeGrjOtTjEGicYg2niCyfT9swzjLFyf6k4XrhKElaF-cX_Q02y9KSTRNFHPlKhXMtuzaTekCWcXgw$"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#0563C1">http://www.modelics.org</span></a><span style="font-size:12.0pt"><o:p></o:p></span></p>
</div>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><span style="font-size:12.0pt"><o:p> </o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</body>
</html>