<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
p.msipfooter6d2e06ff, li.msipfooter6d2e06ff, div.msipfooter6d2e06ff
{mso-style-name:msipfooter6d2e06ff;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1136265670;
mso-list-template-ids:623041764;}
@list l1
{mso-list-id:1147236546;
mso-list-template-ids:785545084;}
@list l1:level1
{mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level2
{mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level3
{mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level4
{mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level5
{mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level6
{mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level7
{mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level8
{mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level9
{mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Thank you, Barry. I will dig further into the issue using your suggestions.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="msipfooter6d2e06ff" align="center" style="margin:0in;text-align:center">
<span style="font-size:10.0pt;color:black">Schlumberger-Private</span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Barry Smith &lt;bsmith@petsc.dev&gt; <br>
<b>Sent:</b> Friday, May 20, 2022 12:33 PM<br>
<b>To:</b> Ernesto Prudencio &lt;EPrudencio@slb.com&gt;<br>
<b>Cc:</b> PETSc users list &lt;petsc-users@mcs.anl.gov&gt;<br>
<b>Subject:</b> [Ext] Re: [petsc-users] Very slow VecDot operations<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"> Ernesto,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">  If you ran (or can run) with -log_view, you could see the time "ratio" in the output, which tells how much time the "fastest" rank spent on the dot product versus the "slowest". Based on the different row counts per rank that you report, that ratio
might be around 3. But based on the times you report, it is around 200! <o:p></o:p></p>
</div>
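<div>
<p class="MsoNormal">  As a rough check, here is one reading of where "around 3" and "around 200" come from, using only the figures from the original report quoted further down. It is a back-of-the-envelope sketch in plain Python, not PETSc output; the variable names are made up for illustration, and the per-call averages assume every call of a given kind costs about the same.<o:p></o:p></p>
<pre>
```python
# Figures copied from the original report quoted in this thread.
rows_min, rows_max = 838_529, 2_437_206      # rows per MPI rank
n_sol, t_sol = 236, 1.523                    # VecDotSol calls, total seconds
n_rhs, t_rhs = 226, 326.008                  # VecDotRhs calls, total seconds

# Work imbalance one would expect from the row counts alone.
row_ratio = rows_max / rows_min

# Average cost of one dot product of each kind, and how they compare.
avg_sol = t_sol / n_sol
avg_rhs = t_rhs / n_rhs
time_ratio = avg_rhs / avg_sol

print(f"row imbalance (max/min): {row_ratio:.2f}")
print(f"avg VecDotSol: {avg_sol:.5f} s, avg VecDotRhs: {avg_rhs:.5f} s")
print(f"time ratio: {time_ratio:.0f}")
```
</pre>
<p class="MsoNormal">  The row imbalance comes out a little under 3, while the per-call time ratio is above 200, so the row counts alone cannot account for the gap.<o:p></o:p></p>
</div>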
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">  My guess is that, for VecDotRhs(), some ranks arrive at the dot product long before the others and have to wait there for an extremely long time, which makes the dot product appear very slow. In reality, the large
time credited to the VecDot is due to an imbalance in the time taken by the operation that precedes it.<o:p></o:p></p>
</div>
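<div>
<p class="MsoNormal">  The waiting effect can be illustrated with a toy model. This is only a sketch in plain Python, not PETSc or MPI code: the function name, the four ranks, and all timings are hypothetical, and the reduction is assumed to finish for everyone only after the slowest rank has contributed its local result.<o:p></o:p></p>
<pre>
```python
# Toy model (not PETSc): every rank does some work, then joins a blocking
# reduction that can only finish once the slowest rank has arrived.

def time_charged_to_dot(work_before, local_dot, reduce_cost):
    """Return, per rank, the wall time spent inside the dot product."""
    # Each rank enters the dot product when its preceding work is done.
    arrival = list(work_before)
    # The reduction can complete only after every rank has contributed.
    ready = [a + d for a, d in zip(arrival, local_dot)]
    finish = max(ready) + reduce_cost
    # A timer around the dot product sees everything from arrival to finish.
    return [finish - a for a in arrival]

# Hypothetical timings in seconds for four ranks; rank 3 has three times
# the preceding work of rank 0.
work_before = [1.0, 1.2, 1.5, 3.0]
local_dot = [0.001, 0.001, 0.001, 0.003]
charged = time_charged_to_dot(work_before, local_dot, reduce_cost=0.0005)

for rank, t in enumerate(charged):
    print(f"rank {rank}: {t:.4f} s inside the dot product")
```
</pre>
<p class="MsoNormal">  In this model the rank with the least preceding work is charged about two seconds for a dot product whose own arithmetic takes a millisecond, while the slowest rank is charged almost nothing.<o:p></o:p></p>
</div>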
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"> Barry<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><o:p> </o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On May 20, 2022, at 1:23 PM, Ernesto Prudencio via petsc-users &lt;<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>&gt; wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">I am using LSQR to minimize || L x – b ||_2, where L is a sparse rectangular matrix with 145,253,395 rows, 209,423,775 columns, and around 54 billion nonzeros.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">The numbers reported below are for a run with 27 compute nodes, each compute node with 4 MPI ranks, so a total of 108 ranks.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Throughout the run, I assess the runtime taken by all dot products during the LSQR iterations, and I differentiate between dot products involving vectors of the size of the solution vector “x”, and dot products involving vectors of the
size of the rhs “b”. Here are the numbers I get (we have an implementation of LSQR that performs some extra vector dot products for our needs):<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">236 VecDotSol take 1.523 seconds<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">226 VecDotRhs take 326.008 seconds<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Regarding the partition of rows and columns among the 108 MPI ranks:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Rows: min = 838,529 ; avg = 1.34494e+06 ; max = 2,437,206<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Columns: min = 1,903,500 ; avg = 1.93911e+06 ; max = 1,946,270<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Regarding the partition of rows and columns among the 27 compute nodes:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Rows: min = 3,575,584 ; avg = 5.37976e+06 ; max = 8,788,062<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Columns: min = 7,637,500 ; avg = 7.75644e+06 ; max = 7,785,080<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Questions:<o:p></o:p></p>
</div>
<ol style="margin-top:0in" start="1" type="1">
<li class="MsoListParagraph" style="margin-top:0in;margin-bottom:0in;mso-list:l1 level1 lfo3">
Why are the average run times so different between VecDotSol and VecDotRhs?<o:p></o:p></li><li class="MsoListParagraph" style="margin-top:0in;margin-bottom:0in;mso-list:l1 level1 lfo3">
Could the much larger imbalance in the number of rows per rank (compared with the very well balanced distribution of columns per rank) be the cause?<o:p></o:p></li><li class="MsoListParagraph" style="margin-top:0in;margin-bottom:0in;mso-list:l1 level1 lfo3">
Have you ever observed such a situation?<o:p></o:p></li><li class="MsoListParagraph" style="margin-top:0in;margin-bottom:0in;mso-list:l1 level1 lfo3">
Could it be because of a bad MPI configuration / parametrization with respect to the underlying network?<o:p></o:p></li><li class="MsoListParagraph" style="margin-top:0in;margin-bottom:0in;mso-list:l1 level1 lfo3">
But, if so, why are the VecDotSol dot products so much faster than the VecDotRhs ones?<o:p></o:p></li></ol>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Thank you in advance,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Ernesto.<span class="apple-converted-space"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>