[petsc-users] Number of levels of multigrid: 2-3 is sufficient?

Dave May dave.mayhem23 at gmail.com
Wed Oct 14 10:01:33 CDT 2015


On 14 October 2015 at 16:50, Matthew Knepley <knepley at gmail.com> wrote:

> On Wed, Oct 14, 2015 at 7:34 AM, Timothée Nicolas <
> timothee.nicolas at gmail.com> wrote:
>
>> OK, I see. Does it mean that the coarse grid solver is by default set up
>> with the options -ksp_type preonly -pc_type lu? What about the
>> multiprocessor case?
>>
>
> Small scale: We use redundant LU
>
> Large Scale: We use GAMG
>
>
Is your answer what "you" recommend, or what PETSc does by default?

Your answer gives the impression that PETSc makes a decision between
redundant LU and GAMG based on something, e.g. the size of the matrix, the
number of cores, or some combination of the two.
Is that really what is happening inside PCMG?
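
One way to check, rather than guess, is to run with -ksp_view, which prints
the KSP and PC actually configured on every multigrid level, including the
coarse one. If you want to force a particular coarse solver yourself, the
coarse level takes the mg_coarse_ option prefix; as a sketch (the exact
defaults may differ between PETSc versions), something like

mpiexec -n 4 ./ex5 -pc_type mg -pc_mg_levels 3 -ksp_view \
    -mg_coarse_ksp_type preonly \
    -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu

for a small parallel run, or -mg_coarse_pc_type gamg for a large one.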

>    Matt
>
>
>> Thx
>>
>> Timothee
>>
>> 2015-10-14 21:22 GMT+09:00 Matthew Knepley <knepley at gmail.com>:
>>
>>> On Tue, Oct 13, 2015 at 9:23 PM, Timothée Nicolas <
>>> timothee.nicolas at gmail.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have been playing around with multigrid recently, namely with
>>>> /ksp/ksp/examples/tutorials/ex42.c, with /snes/examples/tutorials/ex5.c and
>>>> with my own implementation of a Laplacian-type problem. In all cases, I
>>>> have noticed no improvement whatsoever in performance, whether in CPU
>>>> time or KSP iterations, when varying the number of levels of the multigrid
>>>> solver. As an example, I have attached the log_summary for ex5.c with
>>>> nlevels = 2 to 7, launched by
>>>>
>>>> mpiexec -n 1 ./ex5 -da_grid_x 21 -da_grid_y 21 -ksp_rtol 1.0e-9
>>>> -da_refine 6 -pc_type mg -pc_mg_levels # -snes_monitor -ksp_monitor
>>>> -log_summary
>>>>
>>>> where -pc_mg_levels is set to a number between 2 and 7.
>>>>
>>>> There is a noticeable CPU time improvement (30%) from 2 levels to 3 levels,
>>>> and then no improvement whatsoever. I am surprised because, with 6 levels
>>>> of refinement of the DMDA, the fine grid has more than 1200 points in each
>>>> direction, so with 3 levels the coarse grid still has more than 300 points
>>>> in each direction, which is still pretty large (I assume the coarsening
>>>> ratio between grids is 2). I am wondering how the coarse solver can
>>>> efficiently solve the problem on a coarse grid with that many points.
>>>> Since relaxation methods are usually efficient only at high frequencies,
>>>> and the principle of multigrid is to leave the smooth part of the error to
>>>> the coarser grids, I would expect optimal performance when the coarse grid
>>>> has only a few points in each direction. Does anyone know why the
>>>> performance saturates at such a low number of levels? What happens
>>>> internally seems to be quite different from what I would expect...
>>>>
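
For reference, assuming the default DMDA refinement factor of 2 (each
refinement takes an m-point direction to 2m-1 points) and coarsening by the
same factor inside PCMG, the run above would use grids of roughly

    21 -> 41 -> 81 -> 161 -> 321 -> 641 -> 1281   points per direction
                                                   (after -da_refine 6),

so with -pc_mg_levels 3 the coarse grid is about 321 x 321, and only with 7
levels does it shrink back to the original 21 x 21.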
>>>
>>> A performance model that counts only flops is not sophisticated enough
>>> to understand this effect. Unfortunately, nearly all MG
>>> books/papers use this model. What we need is a model that incorporates
>>> memory bandwidth (for pulling down the values), and
>>> also maybe memory latency. For instance, your relaxation pulls down all
>>> the values and makes a little progress. It does few flops,
>>> but lots of memory access. An LU solve does a little memory access and many
>>> more flops, but makes a lot more progress. If memory
>>> access is more expensive, then we have a tradeoff, and we can understand
>>> using a coarse grid that is not just a few points.
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
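
As a very rough illustration of that kind of model, here is a small sketch
in plain C (not PETSc code; the bandwidth and flop-rate numbers are made up)
that compares the time a single 5-point-stencil relaxation sweep spends on
memory traffic against the time it spends on flops:

#include <stdio.h>

/* Back-of-the-envelope roofline estimate for one relaxation sweep with a
 * 5-point stencil stored in AIJ format.  The bandwidth and peak flop rate
 * below are made-up, single-core-ish numbers; the only point is that the
 * sweep moves far more bytes than it performs flops. */
int main(void)
{
  const double bw   = 10.0e9;   /* assumed memory bandwidth [bytes/s] */
  const double peak = 10.0e9;   /* assumed peak flop rate   [flops/s] */

  for (int n = 80; n <= 1280; n *= 2) {
    double N = (double)n * n;   /* unknowns on an n x n grid */
    /* 5 nonzeros per row (8-byte value + 4-byte column index), plus
       streaming x, b and the updated solution at 8 bytes per entry */
    double bytes = (5.0 * 12.0 + 3.0 * 8.0) * N;
    double flops = 10.0 * N;    /* roughly 2 flops per stored nonzero */

    printf("n = %4d: %.1e s from memory traffic, %.1e s from flops\n",
           n, bytes / bw, flops / peak);
  }
  return 0;
}

With these assumed rates the sweep is limited by memory traffic by almost an
order of magnitude, which is why a flop-only count says little about where
the level hierarchy should stop.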
>>>
>>>> Best
>>>>
>>>> Timothee
>>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>