<div dir="ltr"><div dir="ltr">On Wed, Mar 25, 2020 at 2:11 PM Amin Sadeghi <<a href="mailto:aminthefresh@gmail.com">aminthefresh@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">Thank you Matt and Mark for the explanation. That makes sense. Please correct me if I'm wrong, I think instead of asking for the whole node with 32 cores, if I ask for more nodes, say 4 or 8, but each with 8 cores, then I should see much better speedups. Is that correct?</div></div></blockquote><div><br></div><div>Yes, exactly</div><div><br></div><div>  Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 25, 2020 at 2:04 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I would guess that you are saturating the memory bandwidth. After you make PETSc (make all) it will suggest that you test it (make test) and suggest that you run streams (make streams).<div><br></div><div>I see Matt answered but let me add that when you make streams you will seed the memory rate for 1,2,3, ... NP processes. If your machine is decent you should see very good speed up at the beginning and then it will start to saturate. You are seeing about 50% of perfect speedup at 16 process. I would expect that you will see something similar with streams. Without knowing your machine, your results look typical.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 25, 2020 at 1:05 PM Amin Sadeghi <<a href="mailto:aminthefresh@gmail.com" target="_blank">aminthefresh@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">Hi,</div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)"><br></div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">I ran KSP example 45 on a single node with 32 cores and 125GB memory using 1, 16 and 32 MPI processes. Here's a comparison of the time spent during KSP.solve:</div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)"><br></div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">- 1 MPI process: ~98 sec, speedup: 1X</div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">- 16 MPI processes: ~12 sec, speedup: ~8X</div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">- 32 MPI processes: ~11 sec, speedup: ~9X<br></div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)"><br></div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">Since the problem size is large enough (8M unknowns), I expected a speedup much closer to 32X, rather than 9X. Is this expected? If yes, how can it be improved?</div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)"><br></div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">I've attached three log files for more details. </div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)"><br></div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">Sincerely,</div><div style="font-family:tahoma,sans-serif;font-size:small;color:rgb(0,0,0)">Amin</div></div>

</blockquote></div>

</blockquote></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>