[petsc-users] Scaling with number of cores

TAY wee-beng zonexo at gmail.com
Tue Nov 3 09:04:56 CST 2015


On 3/11/2015 9:01 PM, Matthew Knepley wrote:
> On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng <zonexo at gmail.com 
> <mailto:zonexo at gmail.com>> wrote:
>
>
>     On 3/11/2015 8:52 PM, Matthew Knepley wrote:
>>     On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng <zonexo at gmail.com
>>     <mailto:zonexo at gmail.com>> wrote:
>>
>>         Hi,
>>
>>         I tried and have attached the log.
>>
>>         Ya, my Poisson eqn has Neumann boundary condition. Do I need
>>         to specify some null space stuff?  Like KSPSetNullSpace or
>>         MatNullSpaceCreate?
>>
>>
>>     Yes, you need to attach the constant null space to the matrix.
>>
>>       Thanks,
>>
>>          Matt
>     Ok so can you point me to a suitable example so that I know which
>     one to use specifically?
>
>
> https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761
>
>   Matt
Hi,

Actually, I realised that for my Poisson eqn I have both Neumann and
Dirichlet BCs. The Dirichlet BC is at the output grids, where I specify
pressure = 0. So do I still need the null space?

My Poisson eqn's LHS is fixed but the RHS changes at every timestep.

If I need to use the null space, how do I know whether the null space
contains the constant vector and what the no. of vectors is? I followed
the example given and added:

     call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL_OBJECT,nullsp,ierr)

     call MatSetNullSpace(A,nullsp,ierr)

     call MatNullSpaceDestroy(nullsp,ierr)

Is that all?
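Just to check my understanding, this is roughly how I intend to hook it
up inside my Poisson subroutine. This is only a sketch (assuming the
usual PETSc Fortran includes are already in place; ksp, b_rhs and xx are
placeholders for the actual KSP and vectors in my code):

     Mat            A
     MatNullSpace   nullsp
     KSP            ksp
     Vec            b_rhs, xx
     PetscErrorCode ierr

!    Constant null space: PETSC_TRUE means "include the constant vector",
!    0 and PETSC_NULL_OBJECT mean no additional user-supplied vectors.
     call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL_OBJECT,nullsp,ierr)
     call MatSetNullSpace(A,nullsp,ierr)
     call MatNullSpaceDestroy(nullsp,ierr)

!    The KSP should pick the null space up from the matrix when solving.
     call KSPSetOperators(ksp,A,A,ierr)
     call KSPSolve(ksp,b_rhs,xx,ierr)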

Before this, I was using the HYPRE geometric solver, and the matrix /
vector in that subroutine were written based on HYPRE. It worked pretty
well and fast.

However, it's a black box and it's hard to diagnose problems.

I have always had a PETSc subroutine to solve my Poisson eqn as well,
but there I used KSPBCGS or KSPGMRES with HYPRE's BoomerAMG as the PC.
It worked but was slow.

Matt: Thanks, I will see how it goes using the null space and may try
"-mg_coarse_pc_type svd" later.
>
>     Thanks.
>>
>>
>>         Thank you
>>
>>         Yours sincerely,
>>
>>         TAY wee-beng
>>
>>         On 3/11/2015 12:45 PM, Barry Smith wrote:
>>
>>                 On Nov 2, 2015, at 10:37 PM, TAY
>>                 wee-beng<zonexo at gmail.com <mailto:zonexo at gmail.com>>
>>                 wrote:
>>
>>                 Hi,
>>
>>                 I tried :
>>
>>                 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>>
>>                 2. -poisson_pc_type gamg
>>
>>                 Run with -poisson_ksp_monitor_true_residual
>>             -poisson_ksp_monitor_converged_reason
>>             Does your poisson have Neumann boundary conditions? Do
>>             you have any zeros on the diagonal for the matrix (you
>>             shouldn't).
>>
>>                There may be something wrong with your poisson
>>             discretization that was also messing up hypre
>>
>>
>>
>>                 Both options give:
>>
>>                     1      0.00150000      0.00000000     0.00000000
>>                 1.00000000  NaN             NaN             NaN
>>                 M Diverged but why?, time = 2
>>                 reason =           -9
>>
>>                 How can I check what's wrong?
>>
>>                 Thank you
>>
>>                 Yours sincerely,
>>
>>                 TAY wee-beng
>>
>>                 On 3/11/2015 3:18 AM, Barry Smith wrote:
>>
>>                         hypre is just not scaling well here. I do not
>>                     know why. Since hypre is a black box for us there
>>                     is no way to determine why the scaling is poor.
>>
>>                         If you make the same two runs with -pc_type
>>                     gamg there will be a lot more information in the
>>                     log summary about in what routines it is scaling
>>                     well or poorly.
>>
>>                        Barry
>>
>>
>>
>>                         On Nov 2, 2015, at 3:17 AM, TAY
>>                         wee-beng<zonexo at gmail.com
>>                         <mailto:zonexo at gmail.com>> wrote:
>>
>>                         Hi,
>>
>>                         I have attached the 2 files.
>>
>>                         Thank you
>>
>>                         Yours sincerely,
>>
>>                         TAY wee-beng
>>
>>                         On 2/11/2015 2:55 PM, Barry Smith wrote:
>>
>>                                Run (158/2)x(266/2)x(150/2) grid on 8
>>                             processes  and then (158)x(266)x(150) on
>>                             64 processors  and send the two
>>                             -log_summary results
>>
>>                                Barry
>>
>>
>>                                 On Nov 2, 2015, at 12:19 AM, TAY
>>                                 wee-beng<zonexo at gmail.com
>>                                 <mailto:zonexo at gmail.com>> wrote:
>>
>>                                 Hi,
>>
>>                                 I have attached the new results.
>>
>>                                 Thank you
>>
>>                                 Yours sincerely,
>>
>>                                 TAY wee-beng
>>
>>                                 On 2/11/2015 12:27 PM, Barry Smith wrote:
>>
>>                                     Run without the -momentum_ksp_view -poisson_ksp_view
>>                                     and send the new results
>>
>>                                     You can see from the log summary that the PCSetUp is
>>                                     taking a much smaller percentage of the time meaning
>>                                     that it is reusing the preconditioner and not
>>                                     rebuilding it each time.
>>
>>                                        Barry
>>
>>                                     Something makes no sense with the output: it gives
>>
>>                                     KSPSolve             199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24  90100 66100 24   165
>>
>>                                     90% of the time is in the solve but there is no
>>                                     significant amount of time in other events of the
>>                                     code which is just not possible. I hope it is due to
>>                                     your IO.
>>
>>
>>
>>                                       On Nov 1, 2015, at 10:02 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>                                         Hi,
>>
>>                                         I have attached the new run with 100 time steps
>>                                         for 48 and 96 cores.
>>
>>                                         Only the Poisson eqn's RHS changes, the LHS
>>                                         doesn't. So if I want to reuse the preconditioner,
>>                                         what must I do? Or what must I not do?
>>
>>                                         Why does the number of processes increase so much?
>>                                         Is there something wrong with my coding? Seems to
>>                                         be so too for my new run.
>>
>>                                         Thank you
>>
>>                                         Yours sincerely,
>>
>>                                         TAY wee-beng
>>
>>                                         On 2/11/2015 9:49 AM, Barry Smith wrote:
>>
>>                                           If you are doing many time steps with the same
>>                                           linear solver then you MUST do your weak scaling
>>                                           studies with MANY time steps since the setup time
>>                                           of AMG only takes place in the first timestep. So
>>                                           run both 48 and 96 processes with the same large
>>                                           number of time steps.
>>
>>                                              Barry
>>
>>
>>
>>                                            On Nov 1, 2015, at 7:35 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>                                              Hi,
>>
>>                                              Sorry, I forgot and used the old a.out. I have
>>                                              attached the new log for 48cores (log48),
>>                                              together with the 96cores log (log96).
>>
>>                                              Why does the number of processes increase so
>>                                              much? Is there something wrong with my coding?
>>
>>                                              Only the Poisson eqn's RHS changes, the LHS
>>                                              doesn't. So if I want to reuse the
>>                                              preconditioner, what must I do? Or what must I
>>                                              not do?
>>
>>                                              Lastly, I only simulated 2 time steps
>>                                              previously. Now I run for 10 timesteps
>>                                              (log48_10). Is it building the preconditioner
>>                                              at every timestep?
>>
>>                                              Also, what about momentum eqn? Is it working
>>                                              well?
>>
>>                                              I will try the gamg later too.
>>
>>                                              Thank you
>>
>>                                              Yours sincerely,
>>
>>                                              TAY wee-beng
>>
>>                                              On 2/11/2015 12:30 AM, Barry Smith wrote:
>>
>>                                                 You used gmres with 48 processes but
>>                                                 richardson with 96. You need to be careful
>>                                                 and make sure you don't change the solvers
>>                                                 when you change the number of processors
>>                                                 since you can get very different
>>                                                 inconsistent results.
>>
>>                                                 Anyways all the time is being spent in the
>>                                                 BoomerAMG algebraic multigrid setup and it
>>                                                 is scaling badly. When you double the
>>                                                 problem size and number of processes it went
>>                                                 from 3.2445e+01 to 4.3599e+02 seconds.
>>
>>                                                 PCSetUp   3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62  8  0  0  4  62  8  0  0  5    11
>>
>>                                                 PCSetUp   3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18  0  0  6  85 18  0  0  6     2
>>
>>                                                 Now is the Poisson problem changing at each
>>                                                 timestep or can you use the same
>>                                                 preconditioner built with BoomerAMG for all
>>                                                 the time steps? Algebraic multigrid has a
>>                                                 large set up time that often doesn't matter
>>                                                 if you have many time steps, but if you have
>>                                                 to rebuild it each timestep it is too large.
>>
>>                                                 You might also try -pc_type gamg and see how
>>                                                 PETSc's algebraic multigrid scales for your
>>                                                 problem/machine.
>>
>>                                                    Barry
>>
>>
>>
>>                                                 On Nov 1, 2015, at 7:30 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>                                                   On 1/11/2015 10:00 AM, Barry Smith wrote:
>>
>>                                                     On Oct 31, 2015, at 8:43 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>                                                       On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>
>>                                                       On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>                                                         Hi,
>>
>>                                                         I understand that as mentioned in the faq, due to
>>                                                         the limitations in memory, the scaling is not
>>                                                         linear. So, I am trying to write a proposal to
>>                                                         use a supercomputer.
>>                                                         Its specs are:
>>                                                         Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB
>>                                                         of memory per node)
>>
>>                                                         8 cores / processor
>>                                                         Interconnect: Tofu (6-dimensional mesh/torus)
>>                                                         Interconnect
>>                                                         Each cabinet contains 96 computing nodes,
>>                                                         One of the requirement is to give the performance
>>                                                         of my current code with my current set of data,
>>                                                         and there is a formula to calculate the estimated
>>                                                         parallel efficiency when using the new large set
>>                                                         of data
>>                                                         There are 2 ways to give performance:
>>                                                         1. Strong scaling, which is defined as how the
>>                                                         elapsed time varies with the number of processors
>>                                                         for a fixed problem.
>>                                                         2. Weak scaling, which is defined as how the
>>                                                         elapsed time varies with the number of processors
>>                                                         for a fixed problem size per processor.
>>                                                         I ran my cases with 48 and 96 cores with my
>>                                                         current cluster, giving 140 and 90 mins
>>                                                         respectively. This is classified as strong
>>                                                         scaling.
>>                                                         Cluster specs:
>>                                                         CPU: AMD 6234 2.4GHz
>>                                                         8 cores / processor (CPU)
>>                                                         6 CPU / node
>>                                                         So 48 Cores / CPU
>>                                                         Not sure abt the memory / node
>>
>>                                                         The parallel efficiency ‘En’ for a given degree
>>                                                         of parallelism ‘n’ indicates how much the program
>>                                                         is efficiently accelerated by parallel
>>                                                         processing. ‘En’ is given by the following
>>                                                         formulae. Although their derivation processes are
>>                                                         different depending on strong and weak scaling,
>>                                                         derived formulae are the same.
>>                                                         From the estimated time, my parallel efficiency
>>                                                         using Amdahl's law on the current old cluster was
>>                                                         52.7%.
>>                                                         So is my results acceptable?
>>                                                         For the large data set, if using 2205 nodes
>>                                                         (2205X8cores), my expected parallel efficiency is
>>                                                         only 0.5%. The proposal recommends value of > 50%.
>>
>>                                                       The problem with this analysis is that the
>>                                                       estimated serial fraction from Amdahl's Law changes
>>                                                       as a function of problem size, so you cannot take
>>                                                       the strong scaling from one problem and apply it to
>>                                                       another without a model of this dependence.
>>
>>                                                       Weak scaling does model changes with problem size,
>>                                                       so I would measure weak scaling on your current
>>                                                       cluster, and extrapolate to the big machine. I
>>                                                       realize that this does not make sense for many
>>                                                       scientific applications, but neither does requiring
>>                                                       a certain parallel efficiency.
>>
>>                                                     Ok I check the results for my weak scaling it is even
>>                                                     worse for the expected parallel efficiency. From the
>>                                                     formula used, it's obvious it's doing some sort of
>>                                                     exponential extrapolation decrease. So unless I can
>>                                                     achieve a near > 90% speed up when I double the cores
>>                                                     and problem size for my current 48/96 cores setup,
>>                                                     extrapolating from about 96 nodes to 10,000 nodes
>>                                                     will give a much lower expected parallel efficiency
>>                                                     for the new case.
>>
>>                                                     However, it's mentioned in the FAQ that due to memory
>>                                                     requirement, it's impossible to get >90% speed when I
>>                                                     double the cores and problem size (ie linear increase
>>                                                     in performance), which means that I can't get >90%
>>                                                     speed up when I double the cores and problem size for
>>                                                     my current 48/96 cores setup. Is that so?
>>
>>                                                   What is the output of -ksp_view -log_summary on the
>>                                                   problem and then on the problem doubled in size and
>>                                                   number of processors?
>>
>>                                                      Barry
>>
>>                                                   Hi,
>>
>>                                                   I have attached the output
>>
>>                                                   48 cores: log48
>>                                                   96 cores: log96
>>
>>                                                   There are 2 solvers - The momentum linear eqn uses
>>                                                   bcgs, while the Poisson eqn uses hypre BoomerAMG.
>>
>>                                                   Problem size doubled from 158x266x150 to 158x266x300.
>>
>>                                                     So is it fair to say that the main problem does not
>>                                                     lie in my programming skills, but rather the way the
>>                                                     linear equations are solved?
>>
>>                                                     Thanks.
>>
>>                                                       Thanks,
>>
>>                                                          Matt
>>
>>                                                         Is it possible for this type of scaling in PETSc
>>                                                         (>50%), when using 17640 (2205X8) cores?
>>                                                         Btw, I do not have access to the system.
>>
>>                                                         Sent using CloudMagic Email
>>
>>                                                       --
>>                                                       What most experimenters take for granted before
>>                                                       they begin their experiments is infinitely more
>>                                                       interesting than any results to which their
>>                                                       experiments lead.
>>                                                       -- Norbert Wiener
>>
>>                                                         <log48.txt><log96.txt>
>>
>>                                                 <log48_10.txt><log48.txt><log96.txt>
>>
>>                                         <log96_100.txt><log48_100.txt>
>>
>>                                 <log96_100_2.txt><log48_100_2.txt>
>>
>>                         <log64_100.txt><log8_100.txt>
>>
>>
>>
>>
>>
>>     -- 
>>     What most experimenters take for granted before they begin their
>>     experiments is infinitely more interesting than any results to
>>     which their experiments lead.
>>     -- Norbert Wiener
>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
