[petsc-users] Strange GAMG performance for mixed FE formulation
Justin Chang
jychang48 at gmail.com
Fri Mar 4 01:05:50 CST 2016
Mark,
Using "-pc_gamg_square_graph 10" didn't change anything. I used values of
1, 10, 100, and 1000 and the performance seemed unaffected.
Changing the threshold of -pc_gamg_threshold to 0.8 did decrease wall-clock
time but it required more iterations.
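For reference, the runs were along these lines (the executable name is just
a placeholder here, and the rest of my options are unchanged):

  ./myapp <usual options> -pc_gamg_square_graph 10    (also tried 1, 100, 1000)
  ./myapp <usual options> -pc_gamg_threshold 0.8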
I am not really sure how to go about tuning GAMG or even ML. Do you have a
manual/reference/paper/etc. that describes what's going on within GAMG?
Thanks,
Justin
On Thursday, March 3, 2016, Mark Adams <mfadams at lbl.gov> wrote:
> You have a very sparse 3D problem, with 9 non-zeros per row. It is
> coarsening very slowly and creating huge coarse grids, which are expensive
> to construct. The superlinear speedup is most likely from cache effects.
> First try with:
>
> -pc_gamg_square_graph 10
>
> ML must have some AI in there to do this automatically, because GAMG and
> ML are pretty similar algorithmically. There is a threshold parameter that
> is important (-pc_gamg_threshold <0.0>), and I think ML has the same
> default. ML is doing OK, but I would guess that if you use something like
> 0.02 for ML's threshold you would see some improvement.
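>
> In options form that would be something like this (I believe the ML
> interface spells its threshold option -pc_ml_Threshold, but check the
> -help output to be sure):
>
>   -pc_gamg_threshold 0.02    (GAMG)
>   -pc_ml_Threshold 0.02      (ML)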
>
> Hypre is doing pretty badly also. I suspect it is getting confused as
> well. I know less about how to deal with hypre.
>
> If you use -info and grep on GAMG you will see about 20 lines that will
> tell you the number of equations on each level and the average number of
> non-zeros per row. In 3D the reduction per level should be -- very
> approximately -- 30x, and the number of non-zeros per row should not
> explode, but getting up to several hundred is OK.
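>
> For example, something along these lines (the executable name and the
> other options are placeholders for whatever you normally run):
>
>   ./myapp <your usual options> -info | grep GAMG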
>
> If you care to test this we should be able to get ML and GAMG to agree
> pretty well. ML is a nice solver, but our core numerics should be about
> the same. I tested this on a 3D elasticity problem a few years ago. That
> said, I think your ML solve is pretty good.
>
> Mark
>
> On Thu, Mar 3, 2016 at 4:36 AM, Lawrence Mitchell <
> lawrence.mitchell at imperial.ac.uk> wrote:
>
>> On 02/03/16 22:28, Justin Chang wrote:
>> ...
>>
>>
>> > Down solver (pre-smoother) on level 3
>> > KSP Object: (solver_fieldsplit_1_mg_levels_3_)
>> > linear system matrix = precond matrix:
>> ...
>> > Mat Object: 1 MPI processes
>> > type: seqaij
>> > rows=52147, cols=52147
>> > total: nonzeros=38604909, allocated nonzeros=38604909
>> > total number of mallocs used during MatSetValues calls =2
>> > not using I-node routines
>> > Down solver (pre-smoother) on level 4
>> > KSP Object: (solver_fieldsplit_1_mg_levels_4_)
>> > linear system matrix followed by preconditioner matrix:
>> > Mat Object: (solver_fieldsplit_1_)
>> ...
>> > Mat Object: 1 MPI processes
>> > type: seqaij
>> > rows=384000, cols=384000
>> > total: nonzeros=3416452, allocated nonzeros=3416452
>>
>> This looks pretty suspicious to me. The original matrix on the finest
>> level has 3.8e5 rows and ~3.4e6 nonzeros. One level up, the coarsening
>> produces 5.2e4 rows but ~3.9e7 nonzeros.
>>
>> FWIW, although Justin's PETSc is from Oct 2015, I get the same
>> behaviour with:
>>
>> ad5697c (Master as of 1st March).
>>
>> If I compare with the coarse operators that ML produces on the same
>> problem:
>>
>> The original matrix has, again:
>>
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=384000, cols=384000
>> total: nonzeros=3416452, allocated nonzeros=3416452
>> total number of mallocs used during MatSetValues calls=0
>> not using I-node routines
>>
>> While the next level up has:
>>
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=65258, cols=65258
>> total: nonzeros=1318400, allocated nonzeros=1318400
>> total number of mallocs used during MatSetValues calls=0
>> not using I-node routines
>>
>> So we have 6.5e4 rows and 1.3e6 nonzeros, which seems more plausible.
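>>
>> A quick back-of-the-envelope check on the per-row density from the
>> numbers above:
>>
>>   fine grid:        3416452 / 384000 ~   9 nonzeros per row
>>   GAMG level 3:    38604909 /  52147 ~ 740 nonzeros per row
>>   ML coarse level:  1318400 /  65258 ~  20 nonzeros per row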
>>
>> Cheers,
>>
>> Lawrence
>>
>>
>