[petsc-users] Number of levels of multigrid : 2-3 is sufficient ??
Matthew Knepley
knepley at gmail.com
Wed Oct 14 07:22:59 CDT 2015
On Tue, Oct 13, 2015 at 9:23 PM, Timothée Nicolas <
timothee.nicolas at gmail.com> wrote:
> Dear all,
>
> I have been playing around with multigrid recently, namely with
> /ksp/ksp/examples/tutorials/ex42.c, with /snes/examples/tutorials/ex5.c and
> with my own implementation of a Laplacian-type problem. In all cases, I
> have noted no improvement whatsoever in the performance, whether in CPU
> time or KSP iterations, by varying the number of levels of the multigrid
> solver. As an example, I have attached the log_summary for ex5.c with
> nlevels = 2 to 7, launched by
>
> mpiexec -n 1 ./ex5 -da_grid_x 21 -da_grid_y 21 -ksp_rtol 1.0e-9 -da_refine
> 6 -pc_type mg -pc_mg_levels # -snes_monitor -ksp_monitor -log_summary
>
> where -pc_mg_levels is set to a number between 2 and 7.
>
> So there is a noticeable CPU time improvement from 2 levels to 3 levels
> (30%), and then no improvement whatsoever. I am surprised because, with 6
> levels of refinement of the DMDA, the fine grid has more than 1200 points
> in each direction, so with 3 levels the coarse grid still has more than 300
> points in each direction, which is still pretty large (I assume the ratio
> between grids is 2). I am wondering how the coarse solver efficiently
> solves the problem on the coarse grid with such a large number of points.
> Given the principle of multigrid, which is to erase the smooth part of the
> error with relaxation methods, which are usually efficient only for high
> frequencies, I would expect optimal performance when the coarse grid is
> basically just a few points in each direction. Does anyone know why the
> performance saturates at a low number of levels? Basically, what happens
> internally seems to be quite different from what I would expect...
>
A performance model that counts only flops is not sophisticated enough to
explain this effect. Unfortunately, nearly all MG books/papers use this
model. What we need is a model that incorporates memory bandwidth (for
pulling down the values), and maybe also memory latency. For instance, your
relaxation pulls down all the values and makes a little progress. It does
few flops, but lots of memory access. An LU solve does a little memory
access and many more flops, but makes a lot more progress. If memory access
is more expensive than flops, then we have a tradeoff, and can understand
using a coarse grid which is not just a few points.
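
To make the difference between the two models concrete, here is a minimal
sketch, not a measurement. Everything in it is an assumption chosen for
illustration: the machine rates PEAK_FLOPS and BANDWIDTH, the CSR-like
storage (8-byte value plus 4-byte column index per nonzero, about three
8-byte vector entries per row), the 5-point stencil, and the 1281 x 1281
fine grid (which follows from -da_grid_x 21 -da_refine 6 only if the
refinement ratio is 2). The helper names are hypothetical and are not part
of PETSc.

```python
# All machine and kernel numbers below are illustrative assumptions,
# not measurements from ex5 or from any real machine.

PEAK_FLOPS = 1.0e10  # assumed peak rate, flops per second
BANDWIDTH = 1.0e10   # assumed sustained memory bandwidth, bytes per second


def flop_only_time(flops):
    """Classical estimate: time is proportional to the flop count alone."""
    return flops / PEAK_FLOPS


def bandwidth_aware_time(flops, bytes_moved):
    """Roofline-style estimate: the slower of compute and memory traffic wins."""
    return max(flops / PEAK_FLOPS, bytes_moved / BANDWIDTH)


def sweep_cost(n, nnz_per_row=5):
    """Rough flop and byte counts for one relaxation sweep over n unknowns.

    Assumes one multiply and one add per nonzero, 12 bytes of matrix data
    per nonzero, and 24 bytes of vector data per row.
    """
    flops = 2 * nnz_per_row * n
    bytes_moved = (12 * nnz_per_row + 24) * n
    return flops, bytes_moved


if __name__ == "__main__":
    n = 1281 * 1281  # assumed fine-grid size (refinement ratio 2)
    flops, bytes_moved = sweep_cost(n)
    print(f"arithmetic intensity: {flops / bytes_moved:.3f} flops/byte")
    print(f"flop-only estimate:   {flop_only_time(flops):.2e} s")
    print(f"bandwidth-aware:      {bandwidth_aware_time(flops, bytes_moved):.2e} s")
```

Under these assumptions a sweep moves 84 bytes for every 10 flops per row,
so the bandwidth-aware estimate is set entirely by memory traffic and the
flop count alone underestimates the cost of relaxation.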
Thanks,
Matt
> Best
>
> Timothee
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener