<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div>  This email exemplifies why parallel computing on non-customized systems (i.e. a bunch of workstations or servers randomly wired together) can never be mainstream and will always be a bang-your-head-against-the-wall experience for newcomers. </div><div><br></div><div>   People need to start with a simple model of whatever they are working with. For parallel computing it is</div><div><br></div><div>    “as you add more resources you get an improvement in time until you get diminishing returns (due to parallel overhead, communication, whatever) and at that point adding more resources no longer gives you an improvement in time.” </div><div><br></div><div>    The next step is to add some quantitative aspect to the model, for example </div><div><br></div><div>    “at first, doubling the resources might roughly double performance, but then it tails off to smaller and smaller relative improvements.”</div><div><br></div><div>   These were the models I had when I started in parallel computing and unless something was “broken” they “pretty much” held true*.</div><div><br></div><div>    Now people have to deal with performance results like those in the attached message, and understandably they become frustrated and aggravated. </div><div><br></div><div>     And it is totally our community's fault!  There are two related problems:</div><div><br></div><div>1) The concept of “resource” that the user sees, i.e. the NP in mpiexec -n NP, is a pretty terrible measure of resource; too often doubling NP means doubling one less important resource and possibly not changing the important resource (memory bandwidth) at all. 
</div><div><br></div><div>2) Defaults for where each additional process is run in mpiexec are often terrible, so important resources do not increase as rapidly as they could with increasing NP.</div><div><br></div><div>   What can we do to decrease the number of such inquiries by satisfying users' needs better (not just saying RTFFAQ)?</div><div><br></div><div>    I’ve pushed to next a new "make streams" to try to help users get some feeling for their system, but it is really limited. Are there any tools out there that analyze a parallel system and summarize its properties that we could utilize? </div><div><br></div><div><br></div><div>  Barry</div><div><br></div><div><br></div><div><br></div><div>* That is, one could certainly cook up circumstances where they were not true, but when starting out they were true in the vast majority of cases.</div><div><br><div>Begin forwarded message:</div><br><blockquote type="cite"><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>Subject: </b></span><span style="font-family:'Helvetica';"><b>Re: [petsc-maint] Parallel efficiency</b><br></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>Date: </b></span><span style="font-family:'Helvetica';">March 18, 2014 at 2:16:28 PM CDT<br></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>Cc: </b></span><span style="font-family:'Helvetica';">Matthew Knepley <<a href="mailto:petsc-maint@mcs.anl.gov">petsc-maint@mcs.anl.gov</a>><br></span></div><br><div><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><div dir="ltr"><div>Hi,<br>I am sending the output files from log_summary for 4, 8, 16, and 32 processors; the number of DOFs used is 198000, so 
this is the dimension of the matrix. It seems that there is a problem in KSPSolve, but since this is only one command, I don't know what's going on inside. (I also tried to measure the time using MPI_Wtime() just before and after the KSPSolve call and came to the same conclusion.) For some reason the KSPSolve time doesn't decrease as the number of processors increases, although the solution is right.<br>

</div>If you can think of any reason why this is happening, please let me know.<br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Mar 18, 2014 at 2:56 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;"><br>

   We need the output from -log_summary for several numbers of processes to see what is scaling, what is not scaling, etc.<br>

<div class="HOEnZb"><div class="h5"><br>

<br><br>

<br>

> Hello guys,<br>

> I keep having the same problem: the time spent in KSPSolve is high regardless of the number of processes used.<br>

> I set up PETSc using --with-debugging=no, so the debugging option is not slowing down my code.<br>

> I tried to fill in the matrix and the rhs vector using blocks (although I don't face a time problem in MatSetValues) but nothing really changed.<br>

> Can you imagine any reason why the KSPSolve time doesn't decrease drastically as the number of processors increases?<br>

> I would be grateful if you had anything to suggest.<br>

> Thank you in advance for your help,<br>

> Nick Kyriazis.<br>

<br>

</div></div></blockquote></div><br></div>

</div></blockquote></div></body></html>