<div dir="ltr"><div dir="ltr">On Tue, Apr 23, 2024 at 4:00 PM Yongzhong Li <<a href="mailto:yongzhong.li@mail.utoronto.ca">yongzhong.li@mail.utoronto.ca</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-408956260583332202">

<div style="font-size:1px;color:rgb(255,255,255);line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;display:none">
 Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don’t utilize multithreading? Only basic vector operations such as VecAXPY and VecDot
</div>



<div style="font-size:1px;color:rgb(255,255,255);line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;display:none">ZjQcmQRYFpfptBannerStart</div>



<u></u>
  <div dir="ltr" id="m_-408956260583332202pfptBanner5d9re21" style="display:block;text-align:left;margin:16px 0px;padding:8px 16px;border-radius:4px;min-width:200px;background-color:rgb(208,216,220);border-top:4px solid rgb(144,164,174)">
    <div id="m_-408956260583332202pfptBanner5d9re21" style="float:left;display:block;margin:0px 0px 1px;max-width:600px">
      <div id="m_-408956260583332202pfptBanner5d9re21" style="display:block;background-color:rgb(208,216,220);color:rgb(0,0,0);font-family:Arial,sans-serif;font-weight:bold;font-size:14px;line-height:18px">
        This Message Is From an External Sender
      </div>
      <div id="m_-408956260583332202pfptBanner5d9re21" style="font-weight:normal;display:block;background-color:rgb(208,216,220);color:rgb(0,0,0);font-family:Arial,sans-serif;font-size:12px;line-height:18px;margin-top:2px">
This message came from outside your organization.
      </div>

    </div>

    <div style="height:0px;clear:both;display:block;line-height:0;font-size:0.01px"> </div>
  </div>
<u></u>

<div style="font-size:1px;color:rgb(255,255,255);line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;display:none">ZjQcmQRYFpfptBannerEnd</div>













<div lang="en-CN" style="overflow-wrap: break-word;">
<div class="m_-408956260583332202WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="font-size:11pt">Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don’t utilize multithreading? Only
 basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is it correct?<br></span></p></div></div></div></blockquote><div><br></div><div>I am not sure what your point is above.</div><div><br></div><div>SpMV performance is mainly controlled by memory bandwidth (latency plays very little role). If you think</div><div>the MPI processes are not using the full bandwidth, use more processes. If they are, and you think using</div><div>threads will speed anything up, you are incorrect. In fact, the only difference between a thread and a</div><div>process is the default memory sharing flag, MPI will perform at least as well (and usually better), than</div><div>adding threads to SpMV.</div><div><br></div><div>There are dozens of publications showing this.</div><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-408956260583332202"><div lang="en-CN" style="overflow-wrap: break-word;"><div class="m_-408956260583332202WordSection1"><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt">
Best,
Yongzhong
<div id="m_-408956260583332202mail-editor-reference-message-container">
<div>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(181,196,223);padding:3pt 0cm 0cm">
<p class="MsoNormal" style="margin-bottom:12pt"><b><span style="font-size:12pt;color:black">From:
</span></b><span style="font-size:12pt;color:black">Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>><br>
<b>Date: </b>Tuesday, April 23, 2024 at 3:35</span><span style="font-size:12pt;font-family:Arial,sans-serif;color:black"> </span><span style="font-size:12pt;color:black">PM<br>
<b>To: </b>Yongzhong Li <<a href="mailto:yongzhong.li@mail.utoronto.ca" target="_blank">yongzhong.li@mail.utoronto.ca</a>><br>
<b>Cc: </b><a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>, <a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a> <<a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a>>, Piero Triverio <<a href="mailto:piero.triverio@utoronto.ca" target="_blank">piero.triverio@utoronto.ca</a>><br>
<b>Subject: </b>Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver<u></u><u></u></span></p>
</div>
<table border="0" cellspacing="0" cellpadding="0" align="left" width="100%" style="width:100%;display:table;float:none">
<tbody>
<tr>
<td style="background:rgb(166,166,166);padding:5.25pt 1.5pt"></td>
<td width="100%" style="width:100%;background:rgb(234,234,234);padding:5.25pt 3.75pt 5.25pt 11.25pt">
<div>
<p class="MsoNormal">
<span lang="ZH-CN" style="font-size:9pt;font-family:"PingFang SC",sans-serif;color:rgb(33,33,33)">你通常不会收到来自</span><span style="font-size:9pt;font-family:"Segoe UI",sans-serif;color:rgb(33,33,33)"> <a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>
</span><span lang="ZH-CN" style="font-size:9pt;font-family:"PingFang SC",sans-serif;color:rgb(33,33,33)">的电子邮件。</span><span style="font-size:9pt;font-family:"Segoe UI",sans-serif;color:rgb(33,33,33)"><a href="https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!dNTjUVcAAP0C7NtR5H6sE0meEnl7wTwB9pzM-8m4NDdThhFir6g2N9NoawLr_s-JIN9Vgg8_Wy6a1-23415HryX9RWYd7b5-_Cc$" target="_blank"><span lang="ZH-CN" style="font-family:"PingFang SC",sans-serif">了解这一点为什么很重要</span></a><u></u><u></u></span></p>
</div>
</td>
<td width="75" style="width:56.25pt;background:rgb(234,234,234);padding:5.25pt 3.75pt">
</td>
</tr>
</tbody>
</table>
   Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant, since the vector operations do not dominate the computations.
On Apr 23, 2024, at 3:23 PM, Yongzhong Li <yongzhong.li@mail.utoronto.ca> wrote:

Hi Barry,

Thank you for the information provided!

Do you think a different BLAS implementation will affect the multithreading performance of some vector operations in GMRES in PETSc?
I am now using OpenBLAS but didn't see much improvement when multithreading is enabled. Do you think other implementations such as Netlib or Intel MKL will help?

Best,
Yongzhong
From: Barry Smith <bsmith@petsc.dev>
Date: Monday, April 22, 2024 at 4:20 PM
To: Yongzhong Li <yongzhong.li@mail.utoronto.ca>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-maint@mcs.anl.gov <petsc-maint@mcs.anl.gov>, Piero Triverio <piero.triverio@utoronto.ca>
Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
   PETSc provided solvers do not directly use threads.

   The BLAS used by LAPACK and PETSc may use threads, depending on what BLAS is being used and how it was configured.

   Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of a threaded BLAS can help with these routines, but not significantly for the solver as a whole.

   Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU.

   If you run with -blas_view, PETSc tries to indicate information about the threading of the BLAS. You can also use -blas_num_threads <n> to set the number of threads, equivalent to setting the environment variable. For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not affect.
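As a concrete usage sketch of the options Barry mentions (program name assumed; the exact -blas_view output depends on the PETSc version and the BLAS it was built with):

./poisson -blas_view -blas_num_threads 4 -log_view

Comparing the MatMult, VecMDot, and VecAXPY rows of the -log_view summary across different thread counts shows which operations, if any, the threaded BLAS actually accelerates.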
On Apr 22, 2024, at 4:06 PM, Yongzhong Li <yongzhong.li@mail.utoronto.ca> wrote:
Hello all,
I am writing to ask if PETSc's KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm.
The questions appeared when I was running a large numerical program based on the boundary element method. I used PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread.

Could you please confirm whether GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreading of low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? If not, why do I observe that threads are being used during the GMRES solutions?
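Since the matrix in the application is a shell matrix, note that with MATSHELL the matrix-vector product is user-supplied code: PETSc simply calls the callback, and any threading inside it comes from whatever that code uses (for example OpenBLAS), not from PETSc. A minimal sketch of such a setup, with hypothetical names and a stand-in operator:

#include <petscksp.h>

/* Hypothetical context carried by the shell matrix */
typedef struct {
    Mat A; /* stand-in for the real (e.g. BEM) operator data */
} ShellCtx;

/* Computes y = S*x; MatMult()/KSPSolve() call this. Whether threads are
   used here is decided entirely by this code and the libraries it calls. */
static PetscErrorCode ShellMult(Mat S, Vec x, Vec y) {
    ShellCtx *ctx;
    PetscErrorCode ierr;
    ierr = MatShellGetContext(S, &ctx); CHKERRQ(ierr);
    ierr = MatMult(ctx->A, x, y); CHKERRQ(ierr);
    return 0;
}

/* Wrapping and use, with N the global problem size:
    ShellCtx ctx = {A};
    Mat S;
    ierr = MatCreateShell(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, N, N, &ctx, &S); CHKERRQ(ierr);
    ierr = MatShellSetOperation(S, MATOP_MULT, (void (*)(void))ShellMult); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, S, S); CHKERRQ(ierr);
*/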
For reference, I am using PETSc version 3.16.0, configured in CMakeLists as follows:

./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" FOPTFLAGS="-O3 -march=native" --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp
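(One note on the flags above, consistent with Barry's first point: --with-threadsafety together with --with-openmp makes it safe to call PETSc from multiple threads, but it does not make PETSc's own solvers run threaded.)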
To simplify the diagnosis of potential issues, I have also written a small example program that uses GMRES to solve a sparse matrix system derived from a 2D Poisson problem with the finite difference method. I see similar behavior with this code. The code is as follows:
#include <petscksp.h>
#include <petsctime.h> /* for PetscTime() */

/* Monitor function to print iteration number and residual norm */
PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx) {
    PetscErrorCode ierr;
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm);
    CHKERRQ(ierr);
    return 0;
}

int main(int argc, char **args) {
    Vec x, b, x_true, e;
    Mat A;
    KSP ksp;
    PetscErrorCode ierr;
    PetscInt i, j, Ii, J, n = 500; // Size of the grid n x n
    PetscInt Istart, Iend;
    PetscScalar v;
    PetscMPIInt rank;
    PetscInitialize(&argc, &args, NULL, NULL);
    PetscLogDouble t1, t2;     // Variables for timing
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    // Create vectors and matrix
    ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr);
    ierr = VecDuplicate(x, &b); CHKERRQ(ierr);
    ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr);

    // Set true solution as all ones
    ierr = VecSet(x_true, 1.0); CHKERRQ(ierr);

    // Create and assemble matrix A for the 2D Laplacian using a 5-point stencil
    ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr);
    ierr = MatSetFromOptions(A); CHKERRQ(ierr);
    ierr = MatSetUp(A); CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
    for (Ii = Istart; Ii < Iend; Ii++) {
        i = Ii / n; // Row index
        j = Ii % n; // Column index
        v = -4.0;
        ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr);
        if (i > 0) { // South
            J = Ii - n;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
        if (i < n - 1) { // North
            J = Ii + n;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
        if (j > 0) { // West
            J = Ii - 1;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
        if (j < n - 1) { // East
            J = Ii + 1;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

    // Compute the RHS corresponding to the true solution
    ierr = MatMult(A, x_true, b); CHKERRQ(ierr);

    // Set up and solve the linear system
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);
    ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);

    /* Set up the monitor */
    ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr);

    // Start timing
    PetscTime(&t1);

    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

    // Stop timing
    PetscTime(&t2);

    // Compute error
    ierr = VecDuplicate(x, &e); CHKERRQ(ierr);
    ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr);
    PetscReal norm_error, norm_true;
    ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr);
    ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr);
    PetscReal relative_error = norm_error / norm_true;
    if (rank == 0) { // Print only from the first MPI process
        PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error);
    }

    // Output the wall time taken for KSPSolve
    PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1);

    // Cleanup
    ierr = VecDestroy(&x); CHKERRQ(ierr);
    ierr = VecDestroy(&b); CHKERRQ(ierr);
    ierr = VecDestroy(&x_true); CHKERRQ(ierr);
    ierr = VecDestroy(&e); CHKERRQ(ierr);
    ierr = MatDestroy(&A); CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
    PetscFinalize();
    return 0;
}
Here are some profiling results for the GMRES solution.

OPENBLAS_NUM_THREADS = 1,  iteration steps = 859, solution time = 16.1 s
OPENBLAS_NUM_THREADS = 2,  iteration steps = 859, solution time = 16.3 s
OPENBLAS_NUM_THREADS = 4,  iteration steps = 859, solution time = 16.7 s
OPENBLAS_NUM_THREADS = 8,  iteration steps = 859, solution time = 16.8 s
OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8 s
I am using one workstation with an Intel® Core™ i9-11900K processor (8 cores, 16 threads). Note that I am not using multiple MPI processes (no mpirun/mpiexec), so the default number of MPI processes should be 1; correct me if I am wrong.
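(For scale: assuming dual-channel DDR4-3200 on the i9-11900K, the peak memory bandwidth shared by all 8 cores is on the order of 50 GB/s. Once one or two threads saturate it, additional threads cannot speed up a bandwidth-bound SpMV, which is consistent with the flat timings above and with Matt's comment at the top of the thread.)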
Thank you in advance!

Sincerely,
Yongzhong
-----------------------------------------------------------
Yongzhong Li
PhD student | Electromagnetics Group
Department of Electrical & Computer Engineering
University of Toronto
http://www.modelics.org
-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/