[petsc-users] Number of levels of multigrid : 2-3 is sufficient ??

Timothée Nicolas timothee.nicolas at gmail.com
Wed Oct 14 07:34:04 CDT 2015


OK, I see. Does it mean that the coarse grid solver is by default set up
with the options -ksp_type preonly -pc_type lu? What about the
multiprocessor case?

Thx

Timothee

2015-10-14 21:22 GMT+09:00 Matthew Knepley <knepley at gmail.com>:

> On Tue, Oct 13, 2015 at 9:23 PM, Timothée Nicolas <
> timothee.nicolas at gmail.com> wrote:
>
>> Dear all,
>>
>> I have been playing around with multigrid recently, namely with
>> /ksp/ksp/examples/tutorials/ex42.c, with /snes/examples/tutorials/ex5.c and
>> with my own implementation of a Laplacian-type problem. In all cases, I
>> have noted no improvement whatsoever in the performance, whether in CPU
>> time or KSP iterations, by varying the number of levels of the multigrid
>> solver. As an example, I have attached the log_summary for ex5.c with
>> nlevels = 2 to 7, launched by
>>
>> mpiexec -n 1 ./ex5 -da_grid_x 21 -da_grid_y 21 -ksp_rtol 1.0e-9
>> -da_refine 6 -pc_type mg -pc_mg_levels # -snes_monitor -ksp_monitor
>> -log_summary
>>
>> where -pc_mg_levels is set to a number between 2 and 7.
>>
>> So there is a noticeable CPU time improvement from 2 levels to 3 levels
>> (30%), and then no improvement whatsoever. I am surprised because with 6
>> levels of refinement of the DMDA the fine grid has more than 1200 points
>> per direction, so with 3 levels the coarse grid still has more than 300
>> points per direction, which is still pretty large (I assume the ratio
>> between grids is 2). I am wondering how the coarse solver efficiently
>> solves the problem on the coarse grid with such a large number of points?
>> Given the principle of multigrid, which is to eliminate the smooth part of
>> the error by applying relaxation methods (usually efficient only for
>> high-frequency error) on coarser grids, I would expect optimal
>> performance when the coarse grid is basically just a few points in each
>> direction. Does anyone know why the performance saturates at a low number
>> of levels? Basically what happens internally seems to be quite different
>> from what I would expect...
>>
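The grid sizes mentioned in the question can be checked with a little arithmetic. The sketch below is only an illustration: it assumes each DMDA refinement maps n points per direction to 2n - 1 (factor-2 refinement, non-periodic boundaries) and that each multigrid level coarsens by the inverse map. The helper names refine and level_sizes are made up for this example and are not PETSc functions.

```python
def refine(n, times):
    """Points per direction after `times` refinements (assumes n -> 2n - 1)."""
    for _ in range(times):
        n = 2 * n - 1
    return n


def level_sizes(fine_n, nlevels):
    """Points per direction on each multigrid level, finest first."""
    sizes = [fine_n]
    for _ in range(nlevels - 1):
        # Inverse of the assumed refinement map n -> 2n - 1
        sizes.append((sizes[-1] - 1) // 2 + 1)
    return sizes


fine = refine(21, 6)  # corresponds to -da_grid_x 21 -da_refine 6
for nlevels in range(2, 8):  # the -pc_mg_levels values tried in the question
    coarse = level_sizes(fine, nlevels)[-1]
    print(nlevels, "levels: coarse grid", coarse, "x", coarse,
          "=", coarse * coarse, "points")
```

Under these assumptions the fine grid has 1281 points per direction, 3 levels leave a 321 x 321 coarse grid, and only 7 levels bring the coarse grid back down to the original 21 x 21.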
>
> A performance model that counts only flops is not sophisticated enough to
> understand this effect. Unfortunately, nearly all MG
> books/papers use this model. What we need is a model that incorporates
> memory bandwidth (for pulling down the values), and
> also maybe memory latency. For instance, your relaxation pulls down all
> the values and makes a little progress. It does few flops,
> but lots of memory access. An LU solve does a little memory access, many
> more flops, but makes a lot more progress. If memory
> access is more expensive, then we have a tradeoff, and can understand
> using a coarse grid which is not just a few points.
>
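One simple way to see why counting flops alone can mislead is a roofline-style estimate, where a kernel takes at least as long as the slower of its arithmetic and its memory traffic. This is a swapped-in illustration, not necessarily the model Matt has in mind. Every number below (machine peak, bandwidth, flops and bytes per unknown) is a hypothetical placeholder chosen for the example, not a measurement.

```python
def flop_only_time(flops, peak_flops):
    """Time estimate if only arithmetic is counted."""
    return flops / peak_flops


def roofline_time(flops, bytes_moved, peak_flops, bandwidth):
    """Lower bound on time: the slower of compute and memory traffic."""
    return max(flops / peak_flops, bytes_moved / bandwidth)


# Hypothetical machine parameters (assumptions, not measurements)
PEAK_FLOPS = 1.0e10  # flops per second
BANDWIDTH = 1.0e10   # bytes per second

# Hypothetical cost of one relaxation sweep per unknown (assumptions)
FLOPS_PER_UNKNOWN = 10.0
BYTES_PER_UNKNOWN = 80.0

n = 321 * 321  # unknowns on a 321 x 321 grid
flops = FLOPS_PER_UNKNOWN * n
traffic = BYTES_PER_UNKNOWN * n

t_flops = flop_only_time(flops, PEAK_FLOPS)
t_roof = roofline_time(flops, traffic, PEAK_FLOPS, BANDWIDTH)
print("flop-only estimate:", t_flops, "s")
print("roofline estimate: ", t_roof, "s")
print("ratio:", t_roof / t_flops)
```

With these placeholder values the sweep is limited by memory traffic, so the flop-only estimate is too optimistic. A kernel that performs more flops per byte moved would be penalized less, which is the tradeoff described above.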
>   Thanks,
>
>      Matt
>
>
>> Best
>>
>> Timothee
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>