<div dir="ltr">Yes, you're right. I forgot to mention that the multi-threaded system also used a static OpenMP schedule (#pragma omp parallel for schedule(static)).<br><br><div>Rohan</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Dec 14, 2021 at 3:26 PM Victor Eijkhout &lt;<a href="mailto:eijkhout@tacc.utexas.edu">eijkhout@tacc.utexas.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<div style="overflow-wrap: break-word;">

<br>

<div><br>

<blockquote type="cite">

<div>On Dec 11, 2021, at 17:56, Rohan Yadav &lt;<a href="mailto:rohany@alumni.cmu.edu" target="_blank">rohany@alumni.cmu.edu</a>&gt; wrote:</div>

<br>

<div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">40 MPI ranks on a single node should give similar performance to 40 threads. Both PETSc and taco use a row-based parallelism strategy, so the two should line up.</span></div>

</blockquote>

</div>

<br>

<div>An MPI division of rows is static. PETSc divides strictly by number of rows.</div>

<div><br>

</div>

<div>A thread-based system can use something like “schedule(guided)” (OpenMP) and get better load balancing if the rows have widely differing numbers of nonzeros.</div>
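<div><br></div>

<div>To make the difference concrete, here is a minimal sketch in C. It is not PETSc or taco code: the function names, the equal-row block split, and the CSR array layout are all assumptions made for illustration. The first function measures how unbalanced an equal-rows split is in terms of nonzeros; the second is a CSR matrix-vector product whose row loop uses schedule(guided) so that threads pick up chunks of rows as they become free.</div>

```c
#include <stddef.h>

/* Hypothetical helper: split nrows rows into nparts contiguous blocks of
 * (nearly) equal row count and return
 *   (nonzeros in the busiest block) / (average nonzeros per block).
 * 1.0 means perfectly balanced; larger values mean the busiest block
 * holds that many times the average work. */
double static_row_imbalance(const int *row_nnz, size_t nrows, size_t nparts)
{
    long total = 0;
    long max_part = 0;
    for (size_t p = 0; p < nparts; ++p) {
        /* Assumed block boundaries; a real library may round differently. */
        size_t begin = p * nrows / nparts;
        size_t end = (p + 1) * nrows / nparts;
        long part = 0;
        for (size_t i = begin; i < end; ++i)
            part += row_nnz[i];
        total += part;
        if (part > max_part)
            max_part = part;
    }
    if (total == 0)
        return 1.0; /* empty matrix: treat as balanced */
    return (double)max_part * (double)nparts / (double)total;
}

/* Hypothetical CSR sparse matrix-vector product y = A * x.
 * With OpenMP enabled, schedule(guided) hands out chunks of rows
 * dynamically, so a thread that drew cheap rows takes more of them.
 * Without OpenMP the pragma is ignored and the loop runs serially. */
void spmv_csr_guided(long nrows, const long *row_ptr, const int *col_idx,
                     const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(guided)
    for (long i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (long k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

<div>For example, with eight rows holding 1, 1, 1, 1, 1, 1, 1 and 9 nonzeros split into four equal-row blocks, the last block holds 10 of the 16 nonzeros, so the imbalance ratio is 2.5. An equal-rows split leaves the other three blocks waiting on that one, which is the case where a guided or dynamic schedule can help.</div>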

<div><br>

</div>

<div>Victor.</div>

<div><br>

</div>

</div>


</blockquote></div>