[petsc-dev] examples/benchmarks for weak and strong scaling exercise

Matthew Knepley knepley at gmail.com
Fri Apr 12 15:00:15 CDT 2013


On Fri, Apr 12, 2013 at 9:18 AM, Chris Kees <cekees at gmail.com> wrote:

> I updated the results for the Bratu problem on our SGI. It has 8 cores
> per node (two 4-core processors), and I ran from 1 to 256
> cores.  The log_summary output is attached for both studies. Question:
>

Strong scaling:

  This looks fine. You see the classic memory bandwidth starvation once
more than 2 cores on a node are in use (although your scaling does not
bottom out completely), and across nodes the scaling is great.

Weak scaling:

   I have to go through the logs, but obviously something is wrong. I am
betting it is a failure to increase the number of GMG levels along with
the problem size.
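
For the weak-scaling runs, something along these lines (untested; it just
reuses the ex5 options Jed quotes further down) keeps the work per core
roughly constant while letting PCMG pick up one extra level with each
refinement. The core counts and refinement levels are only illustrative:

  mpiexec -n 1   ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -da_refine 2 -log_summary
  mpiexec -n 4   ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -da_refine 3 -log_summary
  mpiexec -n 16  ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -da_refine 4 -log_summary
  mpiexec -n 64  ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -da_refine 5 -log_summary
  mpiexec -n 256 ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -da_refine 6 -log_summary

Each refinement quadruples the 2D grid, so quadrupling the core count at
the same time keeps the local problem size fixed.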


> is there anything about the memory usage of that problem that doesn't
> scale? The memory usage looks steady at < 1GB per core based on
> log_summary.  I ask because last night I tried to do one more level of
> refinement for weak scaling on 1024 cores and it crashed. I ran the
> same job on 512 cores this morning and it ran fine, so I'm hoping the
> issue was a temporary system problem.
>

No, the memory usage is scalable.

  Thanks,

     Matt


> Notes:
>
> There is a shift in the strong scaling curve as it fills up the first
> node (i.e., from 1 to 16 cores); after that it looks perfect.  The shift
> seems reasonable given that 4 cores share a cache.
>
> The weak scaling shows the wall-clock time growing from 6.3 seconds to
> 17 seconds.  I'm going to run that again with a larger coarse grid in
> order to increase the runtime to several minutes.
>
> Graphs: https://proteus.usace.army.mil/home/pub/17/
>
> On Thu, Apr 11, 2013 at 12:46 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> > Chris Kees <cekees at gmail.com> writes:
> >
> >> Thanks a lot. I did a little example with the Bratu problem and posted it here:
> >>
> >> https://proteus.usace.army.mil/home/pub/17/
> >>
> >> I used boomeramg instead of geometric multigrid because I was getting
> >> an error with the options above:
> >>
> >> %mpiexec -np 4 ./ex5 -mx 129 -my 129 -Nx 2 -Ny 2 -pc_type mg -pc_mg_levels 2
> >> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> >> [0]PETSC ERROR: Argument out of range!
> >> [0]PETSC ERROR: New nonzero at (66,1) caused a malloc!
> >> [0]PETSC ERROR: ------------------------------------------------------------------------
> >
> > That test hard-codes evil things (presumably for testing purposes,
> > though maybe the functionality has been subsumed).  Please use
> > src/snes/examples/tutorials/ex5.c instead.
> >
> >  mpiexec -n 4 ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -log_summary -da_refine 1
> >
> > Increase '-da_refine 1' to get higher resolution.  (This will increase
> > the number of MG levels used by PCMG.)
> >
> > Switch '-da_refine 1' to '-snes_grid_sequence 1' if you want FMG, but
> > note that it's trickier to profile because proportionately more time is
> > spent in coarse levels (although the total solve time is lower).
> >
> >>
> >> I like the ice paper and will try to get the contractor started on
> >> reproducing those results.
> >>
> >> -Chris
> >>
> >> On Wed, Apr 10, 2013 at 1:13 PM, Nystrom, William D <wdn at lanl.gov> wrote:
> >>> Sorry.  I overlooked that the URL was using git protocol.  My bad.
> >>>
> >>> Dave
> >>>
> >>> ________________________________________
> >>> From: Jed Brown [five9a2 at gmail.com] on behalf of Jed Brown [jedbrown at mcs.anl.gov]
> >>> Sent: Wednesday, April 10, 2013 12:10 PM
> >>> To: Nystrom, William D; For users of the development version of PETSc; Chris Kees
> >>> Subject: Re: [petsc-dev] examples/benchmarks for weak and strong scaling exercise
> >>>
> >>> "Nystrom, William D" <wdn at lanl.gov> writes:
> >>>
> >>>> Jed,
> >>>>
> >>>> I tried cloning your tme-ice git repo as follows and it failed:
> >>>>
> >>>> % git clone --recursive git://github.com/jedbrown/tme-ice.git tme_ice
> >>>> Cloning into 'tme_ice'...
> >>>> fatal: unable to connect to github.com:
> >>>> github.com[0: 204.232.175.90]: errno=Connection timed out
> >>>>
> >>>> I'm doing this from an xterm that allows me to clone petsc just fine.
> >>>
> >>> You're using https or ssh to clone PETSc, but the git:// protocol to
> >>> clone tme-ice.  The LANL network is blocking that port, so just use the
> >>> https or ssh protocol.
>
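
For the record, the two variants Jed describes above differ only in a single
option; an untested sketch with the same ex5 options would be

  mpiexec -n 4 ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -log_summary -da_refine 3
  mpiexec -n 4 ./ex5 -da_grid_x 65 -da_grid_y 65 -pc_type mg -log_summary -snes_grid_sequence 3

where the second form does grid sequencing (FMG): it solves on each coarser
grid first and interpolates up, so proportionately more time shows up in the
coarse levels even though the total solve time is lower.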



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener