A 3D example of KSPSolve?
Barry Smith
bsmith at mcs.anl.gov
Fri Feb 9 23:07:11 CST 2007
Shi,
The lack of good scaling is coming from two important sources (plus a lesser third):
1) The MPI on this system is terrible
Average time to get PetscTime(): 1.71661e-06
Average time for MPI_Barrier(): 0.008253
Average time for zero size MPI_Send(): 0.000279441
You want to see numbers like 1.e-5 to 1.e-6 instead of 1.e-3 to 1.e-4.
2) The number of iterations for the linear systems grows too rapidly
with more processes. For example, in stage 8 it goes from 1782 iterations on 1
process to 3267 on 16 processes.
3) A lesser effect comes from a slight imbalance in work between processes;
for example, in stage 8 the slowest MatSolve takes 1.3 times as long as the fastest.
Initial suggestions:
0) Get rid of the MatGetRow() calls.
1) It appears your matrices are symmetric? If so, you can use MATMPISBAIJ
instead of AIJ; then you can use (incomplete) Cholesky on the blocks (see the
sketch after this list).
2) Try using ASM instead of block Jacobi as the preconditioner. Use
-pc_type asm -pc_asm_type basic -sub_pc_type icc
This will decrease the number of iterations in parallel at the cost of more
expensive iterations, so it may or may not help.
3) Try using hypre's BoomerAMG for some (Poisson?) (all?) of the solves.
Configure PETSc with config/configure.py --download-hypre and run with
-pc_type hypre -pc_hypre_type boomeramg (if you run this with -help
it will show a large number of tunable options that can really speed things
up.)
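
Suggestions 1) through 3) can all be exercised from the command line if the
code sets the matrix type and calls KSPSetFromOptions(). A minimal sketch,
assuming a recent PETSc (two-argument KSPSetOperators); the variable names are
hypothetical, and error checking and assembly are omitted:

    Mat A;
    KSP ksp;
    Vec x, b;                        /* assumed created and filled elsewhere */

    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);   /* n: global size */
    MatSetType(A, MATSBAIJ);         /* symmetric format: set only the     */
                                     /* upper triangle with MatSetValues;  */
                                     /* enables (incomplete) Cholesky      */
    MatSetFromOptions(A);            /* -mat_type on the command line      */
                                     /* can still override this            */
    /* ... MatSetValues() assembly, then MatAssemblyBegin/End ... */

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);          /* picks up -ksp_type cg, -pc_type asm,   */
                                     /* -sub_pc_type icc, -pc_type hypre, etc. */
    KSPSolve(ksp, b, x);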
Final note: I would not expect to EVER see a speedup of more than,
say, 10 to 12 on this machine, no matter how good the linear solver, due to the
slowness of the network. But on a really good network you "might" be able to
get 13 or 14 with hypre BoomerAMG.
Barry
On Fri, 9 Feb 2007, Shi Jin wrote:
> Sorry, that was not informative.
> So I decided to attach the 5 files, for NP=1,2,4,8,16,
> for the 400,000 finite element case.
>
> Please note that the simulation runs over 100 steps.
> The 1st step is a first-order update, logged as stage 1.
> The remaining 99 steps are second-order updates. Within
> those, stages 2-9 are created for the 8 stages of a
> second-order update. We should concentrate on the
> second-order updates, so the four important calls to
> KSPSolve in the log file are in stages 4, 5, 6, and 8
> respectively.
> Please let me know if you need any other information
> or explanation.
> Thank you very much.
>
> Shi
> --- Matthew Knepley <knepley at gmail.com> wrote:
>
> > You really have to give us the log summary output.
> > None of the relevant numbers are in your summary.
> >
> > Thanks,
> >
> > Matt
> >
> > On 2/9/07, Shi Jin <jinzishuai at yahoo.com> wrote:
> > >
> > > Dear Barry,
> > >
> > > Thank you.
> > > I have actually done the staging already.
> > > I summarized the timing of the runs in Google online
> > > spreadsheets. I have two runs:
> > > 1. with 400,000 finite elements:
> > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA
> > > 2. with 1,600,000 finite elements:
> > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ
> > >
> > > If you can take a look at them and give me some
> > > advice, I will be deeply grateful.
> > >
> > > Shi
> > > --- Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > > >
> > > > NO, NO, don't spend time stripping your code!
> > > > Unproductive.
> > > >
> > > > See the manual pages for PetscLogStageRegister(),
> > > > PetscLogStagePush() and PetscLogStagePop(). All you
> > > > need to do is maintain a separate stage for each of
> > > > your KSPSolves; in your case you'll create 3 stages.
> > > >
> > > > Barry
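
A minimal sketch of that staging, assuming a recent PETSc (where
PetscLogStageRegister() takes the stage name first); the stage and solver
names are hypothetical and error checking is omitted:

    /* One logging stage per linear system, so that -log_summary */
    /* reports each solve separately.                            */
    PetscLogStage stageVel, stagePres, stageTemp;
    PetscLogStageRegister("Velocity solve",    &stageVel);
    PetscLogStageRegister("Pressure solve",    &stagePres);
    PetscLogStageRegister("Temperature solve", &stageTemp);

    /* inside the time loop */
    PetscLogStagePush(stageVel);
    KSPSolve(kspVel, bVel, xVel);    /* events logged under stageVel */
    PetscLogStagePop();

    PetscLogStagePush(stagePres);
    KSPSolve(kspPres, bPres, xPres);
    PetscLogStagePop();

    PetscLogStagePush(stageTemp);
    KSPSolve(kspTemp, bTemp, xTemp);
    PetscLogStagePop();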
> > > >
> > > > On Fri, 9 Feb 2007, Shi Jin wrote:
> > > >
> > > > > Thank you.
> > > > > But my code has 10 calls to KSPSolve, on three
> > > > > different linear systems, at each time update.
> > > > > Should I strip it down to a single KSPSolve so that
> > > > > it is easier to analyze? I could have the code dump
> > > > > the matrix and vector and write another code to read
> > > > > them in and call KSPSolve. I don't know whether this
> > > > > is worth doing, or whether I should just send in the
> > > > > messy log file of the whole run.
> > > > > Thanks for any advice.
> > > > >
> > > > > Shi
> > > > >
> > > > > --- Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > >
> > > > > >
> > > > > > Shi,
> > > > > >
> > > > > > There is never a better test problem than your
> > > > > > actual problem. Send the results from running on
> > > > > > 1, 4, and 8 processes with the options
> > > > > > -log_summary -ksp_view (use the optimized version
> > > > > > of PETSc, i.e. one built with config/configure.py
> > > > > > --with-debugging=0).
> > > > > >
> > > > > > Barry
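
For example, such a run might look like this (the executable name is
hypothetical; mpiexec may be mpirun on some systems):

    mpiexec -n 4 ./mycfd -log_summary -ksp_view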
> > > > > >
> > > > > >
> > > > > > On Fri, 9 Feb 2007, Shi Jin wrote:
> > > > > >
> > > > > > > Hi there,
> > > > > > >
> > > > > > > I am tuning our 3D FEM CFD code written with PETSc.
> > > > > > > The code doesn't scale very well. For example, with 8
> > > > > > > processes on a Linux cluster, the speedup we achieve
> > > > > > > with a fairly large problem size (millions of elements)
> > > > > > > is only 3 to 4 using the Conjugate Gradient solver. We
> > > > > > > can achieve a speedup of 6.5 using a GMRES solver, but
> > > > > > > the wall clock time of GMRES is longer than that of CG,
> > > > > > > which indicates that CG is the faster solver but does
> > > > > > > not scale as well as GMRES. Is this generally true?
> > > > > > >
> > > > > > > I then went to the examples and found a 2D example of
> > > > > > > KSPSolve (ex2.c). I ran the code with a 1000x1000 mesh
> > > > > > > and got linear scaling of the CG solver and superlinear
> > > > > > > scaling of GMRES. These are both much better than our
> > > > > > > code. However, I think the 2D nature of the sample
> > > > > > > problem might help the scaling of the code. So I would
> > > > > > > like to try some 3D example using KSPSolve.
> > > > > > > Unfortunately, I couldn't find such an example either
> > > > > > > in the src/ksp/ksp/examples/tutorials directory or by
> > > > > > > Google search. There are a couple of 3D examples in the
> > > > > > > tutorials, but they are about SNES, not KSPSolve. If
> > > > > > > anyone can provide me with such an example, I would
> > > > > > > really appreciate it.
> > > > > > > Thanks a lot.
> > > > > > >
> > > > > > > Shi
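
Since no such 3D tutorial turned up in the thread, here is a minimal sketch of
the kind of test being asked for: a 7-point Laplacian on a structured 3D grid,
assembled through a DMDA and solved with a runtime-selectable KSP. This is not
a PETSc-distributed example; it assumes a recent PETSc, keeps the operator
symmetric (so CG applies) by decoupling the Dirichlet boundary unknowns, and
omits error checking. The default grid size is arbitrary.

    /* Sketch: 3D KSPSolve test, 7-point Laplacian on a structured grid. */
    #include <petscdmda.h>
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      DM          da;
      Mat         A;
      Vec         x, b;
      KSP         ksp;
      MatStencil  row, col[7];
      PetscScalar v[7];
      PetscInt    i, j, k, n, xs, ys, zs, xm, ym, zm, M, N, P;

      PetscInitialize(&argc, &argv, NULL, NULL);
      DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                   DM_BOUNDARY_NONE, DMDA_STENCIL_STAR, 33, 33, 33,
                   PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1,
                   NULL, NULL, NULL, &da);
      DMSetFromOptions(da);                 /* honors -da_grid_x etc. */
      DMSetUp(da);
      DMCreateMatrix(da, &A);
      DMCreateGlobalVector(da, &b);
      VecDuplicate(b, &x);

      DMDAGetInfo(da, NULL, &M, &N, &P, NULL, NULL, NULL, NULL, NULL,
                  NULL, NULL, NULL, NULL);
      DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);
      for (k = zs; k < zs + zm; k++)
        for (j = ys; j < ys + ym; j++)
          for (i = xs; i < xs + xm; i++) {
            row.i = i; row.j = j; row.k = k;
            if (i == 0 || j == 0 || k == 0 ||
                i == M - 1 || j == N - 1 || k == P - 1) {
              v[0] = 1.0;                   /* Dirichlet node: identity row */
              MatSetValuesStencil(A, 1, &row, 1, &row, v, INSERT_VALUES);
            } else {
              /* couple only to interior neighbors so A stays symmetric */
              n = 0;
              col[n] = row;                                   v[n++] =  6.0;
              if (i > 1)     { col[n] = row; col[n].i = i - 1; v[n++] = -1.0; }
              if (i < M - 2) { col[n] = row; col[n].i = i + 1; v[n++] = -1.0; }
              if (j > 1)     { col[n] = row; col[n].j = j - 1; v[n++] = -1.0; }
              if (j < N - 2) { col[n] = row; col[n].j = j + 1; v[n++] = -1.0; }
              if (k > 1)     { col[n] = row; col[n].k = k - 1; v[n++] = -1.0; }
              if (k < P - 2) { col[n] = row; col[n].k = k + 1; v[n++] = -1.0; }
              MatSetValuesStencil(A, 1, &row, n, col, v, INSERT_VALUES);
            }
          }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
      VecSet(b, 1.0);

      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A);
      KSPSetFromOptions(ksp);               /* -ksp_type cg, -pc_type ... */
      KSPSolve(ksp, b, x);

      KSPDestroy(&ksp); VecDestroy(&x); VecDestroy(&b);
      MatDestroy(&A); DMDestroy(&da);
      PetscFinalize();
      return 0;
    }

Run it with, e.g., mpiexec -n 8 ./ex3d -da_grid_x 100 -da_grid_y 100
-da_grid_z 100 -ksp_type cg -log_summary (called -log_view in newer PETSc;
the binary name is hypothetical) to mimic the scaling study above.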
> === message truncated ===