[petsc-users] Scaling with number of cores
TAY wee-beng
zonexo at gmail.com
Tue Nov 3 06:58:23 CST 2015
On 3/11/2015 8:52 PM, Matthew Knepley wrote:
> On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I tried and have attached the log.
>
> Ya, my Poisson eqn has Neumann boundary condition. Do I need to
> specify some null space stuff? Like KSPSetNullSpace or
> MatNullSpaceCreate?
>
>
> Yes, you need to attach the constant null space to the matrix.
>
> Thanks,
>
> Matt
OK, so can you point me to a suitable example so that I know which one
to use specifically?
Thanks.
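For reference, a minimal sketch of attaching the constant null space to the
Poisson matrix, assuming the assembled operator is called A (a placeholder
name, not taken from the attached code):

  #include <petscksp.h>

  /* Sketch: tell PETSc that the Neumann Poisson operator A is singular up
     to a constant, so the Krylov solver and GAMG can handle the null space. */
  PetscErrorCode AttachConstantNullSpace(Mat A)
  {
    PetscErrorCode ierr;
    MatNullSpace   nullsp;

    /* PETSC_TRUE: the null space contains the constant vector; no extra vectors */
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
    ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);
    ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
    return 0;
  }

If the right-hand side is not consistent (its mean is not zero), it may also
need to be projected with MatNullSpaceRemove before the solve.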
>
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 3/11/2015 12:45 PM, Barry Smith wrote:
>
> On Nov 2, 2015, at 10:37 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I tried :
>
> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>
> 2. -poisson_pc_type gamg
>
> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason.
>
> Does your Poisson solve have Neumann boundary conditions? Do you have
> any zeros on the diagonal of the matrix (you shouldn't)?
>
> There may be something wrong with your Poisson discretization that was
> also messing up hypre.
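One way to check the question above about zeros on the diagonal is to look at
the smallest diagonal entry in magnitude; a minimal sketch, assuming the
assembled Poisson matrix is again called A (a placeholder name):

  #include <petscksp.h>

  /* Sketch: report the smallest |diagonal entry| of A; a (near) zero here
     would confirm the concern about zeros on the diagonal. */
  PetscErrorCode CheckDiagonal(Mat A)
  {
    PetscErrorCode ierr;
    Vec            d;
    PetscReal      dmin;

    ierr = MatCreateVecs(A, &d, NULL);CHKERRQ(ierr);
    ierr = MatGetDiagonal(A, d);CHKERRQ(ierr);
    ierr = VecAbs(d);CHKERRQ(ierr);
    ierr = VecMin(d, NULL, &dmin);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "min |diag(A)| = %g\n", (double)dmin);CHKERRQ(ierr);
    ierr = VecDestroy(&d);CHKERRQ(ierr);
    return 0;
  }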
>
>
>
> Both options give:
>
> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
> M Diverged but why?, time = 2
> reason = -9
>
> How can I check what's wrong?
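For what it's worth, converged reason -9 is KSP_DIVERGED_NANORINF, i.e. a NaN
or Inf appeared during the solve, which matches the NaN values printed above.
A minimal sketch of querying the reason after the solve, assuming the solver
object is called ksp (a placeholder name):

  KSPConvergedReason reason;
  PetscErrorCode     ierr;

  /* Ask the KSP why it stopped; negative values indicate divergence. */
  ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "KSP converged reason = %d\n", (int)reason);CHKERRQ(ierr);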
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 3/11/2015 3:18 AM, Barry Smith wrote:
>
> hypre is just not scaling well here. I do not know why. Since hypre is
> a black box for us there is no way to determine why it scales poorly.
>
> If you make the same two runs with -pc_type gamg there will be a lot
> more information in the log summary about which routines are scaling
> well or poorly.
>
> Barry
>
>
>
> On Nov 2, 2015, at 3:17 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I have attached the 2 files.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 2/11/2015 2:55 PM, Barry Smith wrote:
>
> Run (158/2)x(266/2)x(150/2) grid on 8
> processes and then (158)x(266)x(150) on 64
> processors and send the two -log_summary results
>
> Barry
>
>
> On Nov 2, 2015, at 12:19 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I have attached the new results.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 2/11/2015 12:27 PM, Barry Smith wrote:
>
> Run without the -momentum_ksp_view
> -poisson_ksp_view and send the new results
>
>
> You can see from the log summary
> that the PCSetUp is taking a much
> smaller percentage of the time meaning
> that it is reusing the preconditioner
> and not rebuilding it each time.
>
> Barry
>
> Something makes no sense with the
> output: it gives
>
> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90 100 66 100 24  90 100 66 100 24   165
>
> 90% of the time is in the solve but
> there is no significant amount of time
> in other events of the code which is
> just not possible. I hope it is due to
> your IO.
>
>
>
> On Nov 1, 2015, at 10:02 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I have attached the new run with
> 100 time steps for 48 and 96 cores.
>
> Only the Poisson eqn's RHS changes; the LHS doesn't. So if I want to
> reuse the preconditioner, what must I do? Or what must I not do?
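If the matrix is genuinely unchanged, PETSc will not rebuild the
preconditioner as long as the operator handed to the KSP is not modified
between solves. If the code reassembles the (identical) matrix every step, the
rebuild can also be suppressed explicitly. A minimal sketch, assuming a solver
object called ksp and vectors b and x (placeholder names):

  PetscErrorCode ierr;
  PetscInt       step, nsteps = 100;   /* placeholder number of time steps */

  /* Keep the preconditioner built in the first step for all later solves. */
  ierr = KSPSetReusePreconditioner(ksp, PETSC_TRUE);CHKERRQ(ierr);
  for (step = 0; step < nsteps; step++) {
    /* ... update only the right-hand side b here; leave the matrix alone ... */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* PCSetUp runs only on the first solve */
  }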
>
> Why does the number of processes
> increase so much? Is there
> something wrong with my coding?
> Seems to be so too for my new run.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 2/11/2015 9:49 AM, Barry Smith wrote:
>
> If you are doing many time steps with the same linear solver then you
> MUST do your weak scaling studies with MANY time steps, since the setup
> time of AMG only takes place in the first timestep. So run both 48 and
> 96 processes with the same large number of time steps.
>
> Barry
>
>
>
> On Nov 1, 2015, at 7:35 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> Sorry, I forgot and used the old a.out. I have attached the new log for
> 48 cores (log48), together with the 96 cores log (log96).
>
> Why does the number of processes increase so much? Is there something
> wrong with my coding?
>
> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to
> reuse the preconditioner, what must I do? Or what must I not do?
>
> Lastly, I only simulated 2 time steps previously. Now I run for 10
> timesteps (log48_10). Is it building the preconditioner at every
> timestep?
>
> Also, what about the momentum eqn? Is it working well?
>
> I will try gamg later too.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 2/11/2015 12:30 AM, Barry Smith wrote:
>
> You used gmres with 48 processes but richardson with 96. You need to be
> careful and make sure you don't change the solvers when you change the
> number of processors, since you can get very different, inconsistent
> results.
>
> Anyways all the time is being spent in the BoomerAMG algebraic
> multigrid setup and it is scaling badly. When you double the problem
> size and number of processes it went from 3.2445e+01 to 4.3599e+02
> seconds.
>
> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62  8  0  0  4  62  8  0  0  5    11
>
> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18  0  0  6  85 18  0  0  6     2
>
> Now is the Poisson problem changing at each timestep, or can you use
> the same preconditioner built with BoomerAMG for all the time steps?
> Algebraic multigrid has a large setup time that often doesn't matter if
> you have many time steps, but may be too large if you have to rebuild
> it each timestep.
>
> You might also try -pc_type gamg and see how PETSc's algebraic
> multigrid scales for your problem/machine.
>
> Barry
>
>
>
> On Nov 1, 2015, at 7:30 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> On 1/11/2015 10:00 AM, Barry Smith wrote:
>
> On Oct 31, 2015, at 8:43 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>
> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I understand that, as mentioned in the FAQ, due to the limitations in
> memory, the scaling is not linear. So I am trying to write a proposal
> to use a supercomputer. Its specs are:
>
> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16 GB of memory per node)
> 8 cores / processor
> Interconnect: Tofu (6-dimensional mesh/torus) interconnect
> Each cabinet contains 96 computing nodes.
>
> One of the requirements is to give the performance of my current code
> with my current set of data, and there is a formula to calculate the
> estimated parallel efficiency when using the new large set of data.
> There are 2 ways to give performance:
>
> 1. Strong scaling, which is defined as how the elapsed time varies with
> the number of processors for a fixed problem.
> 2. Weak scaling, which is defined as how the elapsed time varies with
> the number of processors for a fixed problem size per processor.
>
> I ran my cases with 48 and 96 cores on my current cluster, giving 140
> and 90 mins respectively. This is classified as strong scaling.
>
> Cluster specs:
> CPU: AMD 6234 2.4GHz
> 8 cores / processor (CPU)
> 6 CPU / node
> So 48 cores / node
> Not sure abt the memory / node
>
> The parallel efficiency ‘En’ for a given degree of parallelism ‘n’
> indicates how efficiently the program is accelerated by parallel
> processing. ‘En’ is given by the following formulae. Although their
> derivation processes differ between strong and weak scaling, the
> derived formulae are the same.
>
> From the estimated time, my parallel efficiency using Amdahl's law on
> the current old cluster was 52.7%. So are my results acceptable?
>
> For the large data set, if using 2205 nodes (2205 x 8 cores), my
> expected parallel efficiency is only 0.5%. The proposal recommends a
> value of > 50%.
> The problem with this analysis is that the estimated serial fraction
> from Amdahl's Law changes as a function of problem size, so you cannot
> take the strong scaling from one problem and apply it to another
> without a model of this dependence.
>
> Weak scaling does model changes with problem size, so I would measure
> weak scaling on your current cluster, and extrapolate to the big
> machine. I realize that this does not make sense for many scientific
> applications, but neither does requiring a certain parallel efficiency.
>
> OK, I checked the results for my weak scaling and the expected parallel
> efficiency is even worse. From the formula used, it's obvious it's
> doing some sort of exponential extrapolation decrease. So unless I can
> achieve a speed-up of nearly > 90% when I double the cores and problem
> size for my current 48/96 cores setup, extrapolating from about 96
> nodes to 10,000 nodes will give a much lower expected parallel
> efficiency for the new case.
>
> However, it's mentioned in the FAQ that, due to the memory requirement,
> it's impossible to get > 90% speed-up when I double the cores and
> problem size (i.e. a linear increase in performance), which means that
> I can't get > 90% speed-up when I double the cores and problem size for
> my current 48/96 cores setup. Is that so?
>
> What is the output of -ksp_view -log_summary on the problem and then
> on the problem doubled in size and number of processors?
>
> Barry
>
> Hi,
>
> I have attached the output:
>
> 48 cores: log48
> 96 cores: log96
>
> There are 2 solvers - the momentum linear eqn uses bcgs, while the
> Poisson eqn uses hypre BoomerAMG.
>
> Problem size doubled from 158x266x150 to 158x266x300.
>
> So is it fair to say that the main problem does not lie in my
> programming skills, but rather the way the linear equations are solved?
>
> Thanks.
>
> Thanks,
>
> Matt
>
> Is it possible for this type of scaling in PETSc (> 50%), when using
> 17640 (2205 x 8) cores? Btw, I do not have access to the system.
>
>
>
>
> <log48.txt><log96.txt>
>
> <log48_10.txt><log48.txt><log96.txt>
>
> <log96_100.txt><log48_100.txt>
>
> <log96_100_2.txt><log48_100_2.txt>
>
> <log64_100.txt><log8_100.txt>
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener