[petsc-users] speedup for TS solver using DMDA

Katy Ghantous katyghantous at gmail.com
Tue Sep 16 10:08:20 CDT 2014


Thank you! This has been extremely useful in figuring out a plan of action.


On Mon, Sep 15, 2014 at 9:08 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   Based on the streams speedups below it looks like a single core can
> utilize roughly 1/2 of the memory bandwidth, leaving all the other cores
> only 1/2 of the bandwidth to utilize, so you can only expect at best a
> speedup of roughly 2 on this machine with traditional PETSc sparse solvers.
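>    (Put another way: a memory-bandwidth-bound kernel can speed up by at
> most total bandwidth / bandwidth already used by one process, roughly
> 1/(1/2) = 2 here, no matter how many of the 32 cores are used.)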
>
>   To add insult to injury, it appears that the MPI processes are not being
> assigned to physical cores very well either.  Under the best circumstances
> on this system one would like to see a speedup of about 2 when running with
> two processes, but it actually delivers only 1.23, and the speedup of 2 only
> occurs with 5 processes. I attribute this to the MPI or OS not assigning
> the second MPI process to the “best” core for memory bandwidth. Likely it
> should assign the second MPI process to the 2nd CPU, but instead it is
> assigning it to the first CPU as well, and only when it gets to the 5th MPI
> process does the second CPU get utilized.
>
>    You can look at the documentation for your MPI’s process affinity to
> see if you can force the 2nd MPI process onto the second CPU.
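>    For example, a recent Open MPI or MPICH launcher can spread ranks
> across sockets and pin them; the exact option names depend on the MPI
> implementation and version (check your mpiexec man page), and the binary
> name below is just a placeholder:
>
>       # Open MPI (1.8-style options): map ranks round-robin over sockets,
>       # pin each rank to a core, and print the resulting bindings
>       mpiexec --map-by socket --bind-to core --report-bindings -n 2 ./myapp
>
>       # MPICH with the Hydra launcher: similar socket-aware placement
>       mpiexec -map-by socket -bind-to core -n 2 ./myapp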
>
>    Barry
>
>
> np  speedup
> 1 1.0
> 2 1.23
> 3 1.3
> 4 1.75
> 5 2.18
> 6 1.22
> 7 2.3
> 8 1.22
> 9 2.01
> 10 1.19
> 11 1.93
> 12 1.93
> 13 1.73
> 14 2.17
> 15 1.99
> 16 2.08
> 17 2.16
> 18 1.47
> 19 1.95
> 20 2.09
> 21 1.9
> 22 1.96
> 23 1.92
> 24 2.02
> 25 1.96
> 26 1.89
> 27 1.93
> 28 1.97
> 29 1.96
> 30 1.93
> 31 2.16
> 32 2.12
>
> On Sep 15, 2014, at 1:42 PM, Katy Ghantous <katyghantous at gmail.com> wrote:
>
> > Matt, thanks! I will look into that and find other ways to make the
> > computation faster.
> >
> > Barry, the benchmark reports a speedup of up to 2, but says 1 node at
> > the end. Either way, I was expecting a higher speedup. Is 2 the limit
> > for two CPUs despite the multiple cores?
> >
> > Please let me know if the attached file is what you are asking for.
> > Thank you!
> >
> >
> > On Mon, Sep 15, 2014 at 8:23 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    Please send the output from running
> >
> >     make streams NPMAX=32
> >
> >     in the PETSc root directory.
> >
> >
> >    Barry
> >
> >   My guess is that it reports “one node” just because it uses the
> > “hostname” to distinguish nodes; though your machine has two CPUs, from
> > the point of view of the OS it has only a single hostname and hence
> > reports just one “node”.
> >
> >
> > On Sep 15, 2014, at 12:45 PM, Katy Ghantous <katyghantous at gmail.com> wrote:
> >
> > > Hi,
> > > I am using DMDA to run TS in parallel to solve a set of N equations. I
> > > am using DMDAGetCorners in the RHSFunction, with the stencil width set
> > > to 2, to solve a set of coupled ODEs on 30 cores.
> > > The machine has 32 cores (2 physical CPUs with 2x8 cores each, at
> > > 3.4 GHz per core).
> > > However, mpiexec with more than one core shows no speedup.
> > > Also, at the configure/test stage for PETSc on that machine, there was
> > > no speedup and it reported only one node.
> > > Is there something wrong with how I configured PETSc, or is the
> > > approach inappropriate for the machine?
> > > I am not sure which files (or sections of the code) you would need to
> > > be able to answer my question.
> > >
> > > Thank you!
> >
> >
> > <scaling.log>
>
>
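
For reference, the setup described in the original question (a 1-D DMDA driven by TS, with DMDAGetCorners used inside the RHS function and a stencil width of 2) typically looks something like the sketch below. The actual code was not posted to the thread, so the problem size, the coupling term, and all names here are purely illustrative; the calls follow the PETSc API of roughly that era (3.5), with a DMSetUp call added for compatibility with newer releases.

#include <petscts.h>
#include <petscdmda.h>

/* Minimal 1-D RHS for u_t = F(u): each entry is coupled to neighbours
   within the stencil width (2), so ghost values are needed. */
static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec U,Vec F,void *ctx)
{
  DM                da;
  Vec               Ulocal;
  PetscInt          i,xs,xm,N;
  const PetscScalar *u;
  PetscScalar       *f;
  PetscErrorCode    ierr;

  PetscFunctionBeginUser;
  ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
  ierr = DMDAGetInfo(da,NULL,&N,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);CHKERRQ(ierr);

  /* Scatter ghost values so entries owned by neighbouring ranks are available */
  ierr = DMGetLocalVector(da,&Ulocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da,U,INSERT_VALUES,Ulocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da,U,INSERT_VALUES,Ulocal);CHKERRQ(ierr);

  ierr = DMDAVecGetArrayRead(da,Ulocal,&u);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(da,F,&f);CHKERRQ(ierr);

  /* Loop only over the locally owned range [xs, xs+xm) */
  ierr = DMDAGetCorners(da,&xs,NULL,NULL,&xm,NULL,NULL);CHKERRQ(ierr);
  for (i=xs; i<xs+xm; i++) {
    f[i] = u[i];                      /* placeholder: the real coupling is problem specific */
    if (i > 1)   f[i] += u[i-2];      /* neighbour two points to the left  */
    if (i < N-2) f[i] -= u[i+2];      /* neighbour two points to the right */
  }

  ierr = DMDAVecRestoreArrayRead(da,Ulocal,&u);CHKERRQ(ierr);
  ierr = DMDAVecRestoreArray(da,F,&f);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da,&Ulocal);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc,char **argv)
{
  TS             ts;
  DM             da;
  Vec            U;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* 1-D distributed array: 1000 unknowns (illustrative), 1 dof per point, stencil width 2 */
  ierr = DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,1000,1,2,NULL,&da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);          /* no-op on 3.5, required on newer PETSc */
  ierr = DMCreateGlobalVector(da,&U);CHKERRQ(ierr);
  ierr = VecSet(U,1.0);CHKERRQ(ierr);

  ierr = TSCreate(PETSC_COMM_WORLD,&ts);CHKERRQ(ierr);
  ierr = TSSetDM(ts,da);CHKERRQ(ierr);
  ierr = TSSetRHSFunction(ts,NULL,RHSFunction,NULL);CHKERRQ(ierr);
  ierr = TSSetType(ts,TSRK);CHKERRQ(ierr);   /* explicit integrator: no Jacobian needed */
  ierr = TSSetFromOptions(ts);CHKERRQ(ierr); /* e.g. -ts_max_steps, -ts_dt on the command line */
  ierr = TSSolve(ts,U);CHKERRQ(ierr);

  ierr = VecDestroy(&U);CHKERRQ(ierr);
  ierr = TSDestroy(&ts);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

DMDAGetCorners restricts the loop to the locally owned entries, and the global-to-local scatter supplies the two ghost values needed on each side of that range; the performance points made above (memory bandwidth and process placement) apply regardless of what the coupling term actually is.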