[petsc-dev] Helping users understand ---- Fwd: [petsc-maint] Parallel efficiency

Barry Smith bsmith at mcs.anl.gov
Fri Mar 21 16:35:50 CDT 2014


  This email exemplifies why parallel computing on non-customized systems (i.e. a bunch of workstations or servers randomly wired together) can never be mainstream and will always be a bang-your-head-against-the-wall experience for newcomers.

   People need to start with a simple model of whatever they are working with. For parallel computing it is

    "as you add more resources you get an improvement in time until you get diminishing returns (due to parallel overhead, communication, whatever) and at that point adding more resources no longer gives you an improvement in time.” 

    The next step is to add some quantitative aspect to the model, for example 

    "at first, doubling the resources might roughly double performance, but then it tails off to smaller and smaller relative improvements.”

   These were the models I had when I started in parallel computing, and unless something was "broken" they "pretty much" held true*.

    Now people have to deal with performance results like those in the attached output, and understandably they become frustrated and aggravated.

     And it is totally our community's fault!  There are two related problems:

1) The concept of "resource" that the user sees, i.e. the NP in mpiexec -n NP, is a pretty terrible measure of resource; too often doubling NP doubles some less important resource while possibly not changing the important resource (memory bandwidth) at all.

2) The defaults for where mpiexec places each additional process are often terrible, so the important resources do not increase as rapidly as possible with increasing NP (see the placement example below).
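
   For example, explicitly mapping and binding processes can make each additional process actually bring additional memory bandwidth. This is only a sketch and the exact flags depend on the MPI implementation; with Open MPI's mpiexec (MPICH's hydra has analogous -map-by/-bind-to options) and some application ./myapp it would look roughly like

     # spread the ranks across sockets first, then pin each rank to a core,
     # so that (up to a point) increasing NP also increases the memory bandwidth in use
     mpiexec -n 8 --map-by socket --bind-to core ./myapp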

   What can we do to decrease the number of such inquiries by satisfying users' needs better (not just saying RTFFAQ)?

    I’ve pushed to next a new "make streams" target to try to help users get some feeling for their system, but it is really limited. Are there any tools out there that analyze a parallel system and summarize its properties that we could utilize?
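
    The intent is that a user can run something like the following from PETSC_DIR (the NPMAX argument here is only illustrative, meaning "run the benchmark at up to that many MPI processes"):

     cd $PETSC_DIR
     make streams NPMAX=8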


  Barry



* That is, one could certainly cook up circumstances where they were not true, but when starting out they held true the vast majority of the time.

Begin forwarded message:

> Subject: Re: [petsc-maint] Parallel efficiency
> Date: March 18, 2014 at 2:16:28 PM CDT
> Cc: Matthew Knepley <petsc-maint at mcs.anl.gov>
> 
> Hi,
> I am sending the output files from -log_summary for 4, 8, 16, and 32 processors; the number of DOFs used is 198000, so this is the dimension of the matrix. It seems that there is a problem in KSPSolve, but since this is only one command, I don't know what's going on inside. (I also tried to measure the time using MPI_Wtime() just before and after the KSPSolve and came to the same conclusion.) For some reason the KSPSolve time doesn't decrease while increasing the number of processors, although the solution is right.
> If you can think of any reason why this is happening, please let me know.
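> 
> (A minimal sketch of that MPI_Wtime measurement, assuming the ksp, b, x, and ierr objects already in the code; the barriers just make sure the slowest rank is what gets timed:)
> 
>   double t0, t1;
>   MPI_Barrier(PETSC_COMM_WORLD);                 /* start all ranks together */
>   t0 = MPI_Wtime();
>   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>   MPI_Barrier(PETSC_COMM_WORLD);                 /* time the slowest rank    */
>   t1 = MPI_Wtime();
>   ierr = PetscPrintf(PETSC_COMM_WORLD, "KSPSolve time: %g s\n", t1 - t0);CHKERRQ(ierr);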
> 
> 
> 
> On Tue, Mar 18, 2014 at 2:56 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    We need the output from -log_summary for several numbers of processes to see what is scaling, what is not scaling, etc.
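> 
>    For example (illustrative command line, substituting your own executable), something like
> 
>      mpiexec -n 8 ./myapp -log_summary > output8
> 
>    run once for each process count; send us the resulting files.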
> 
> 
> 
> 
> > Hello guys,
> > I keep having the same problem: the time spent in KSPSolve is high regardless of the number of processes used.
> > I configured PETSc with --with-debugging=no, so the debugging option is not slowing down my code.
> > I tried to fill in the matrix and the rhs vector using blocks (although MatSetValues is not where I see a time problem), but nothing really changed.
> > Can you think of any reason why the KSPSolve time doesn't decrease drastically while increasing the number of processors?
> > I would be grateful if you had anything to suggest.
> > Thank you in advance for your help,
> > Nick Kyriazis.
> 
> 

-------------- next part --------------
[Attachments scrubbed by the list archive: the -log_summary outputs output4, output8, output16, and output32 referenced above.]

