[petsc-users] Strange GAMG performance for mixed FE formulation

Justin Chang jychang48 at gmail.com
Fri Mar 4 09:24:05 CST 2016


So with -pc_gamg_square_graph 10 I get the following:

[0] PCSetUp_GAMG(): level 0) N=48000, n data rows=1, n data cols=1, nnz/row (ave)=9, np=1
[0] PCGAMGFilterGraph(): 55.7114% nnz after filtering, with threshold 0., 8.79533 nnz ave. (N=48000)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 6672 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.954700e+00 min=1.040410e-02 PC=jacobi
[0] PCSetUp_GAMG(): 1) N=6672, n data cols=1, nnz/row (ave)=623, 1 active pes
[0] PCGAMGFilterGraph(): 3.40099% nnz after filtering, with threshold 0., 623.135 nnz ave. (N=6672)
[0] PCGAMGProlongator_AGG(): New grid 724 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.313339e+00 min=2.474586e-02 PC=jacobi
[0] PCSetUp_GAMG(): 2) N=724, n data cols=1, nnz/row (ave)=724, 1 active pes
[0] PCGAMGFilterGraph(): 9.82914% nnz after filtering, with threshold 0., 724. nnz ave. (N=724)
[0] PCGAMGProlongator_AGG(): New grid 37 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.011784e+00 min=2.759552e-01 PC=jacobi
[0] PCSetUp_GAMG(): 3) N=37, n data cols=1, nnz/row (ave)=37, 1 active pes
[0] PCSetUp_GAMG(): 4 levels, grid complexity = 12.0928

[0] PCSetUp_GAMG(): level 0) N=162000, n data rows=1, n data cols=1, nnz/row (ave)=9, np=1
[0] PCGAMGFilterGraph(): 55.6621% nnz after filtering, with threshold 0., 8.863 nnz ave. (N=162000)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 22085 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.955376e+00 min=8.260696e-03 PC=jacobi
[0] PCSetUp_GAMG(): 1) N=22085, n data cols=1, nnz/row (ave)=704, 1 active pes
[0] PCGAMGFilterGraph(): 3.1314% nnz after filtering, with threshold 0., 704.128 nnz ave. (N=22085)
[0] PCGAMGProlongator_AGG(): New grid 2283 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.311291e+00 min=1.484874e-02 PC=jacobi
[0] PCSetUp_GAMG(): 2) N=2283, n data cols=1, nnz/row (ave)=2283, 1 active pes
[0] PCGAMGFilterGraph(): 3.64497% nnz after filtering, with threshold 0., 2283. nnz ave. (N=2283)
[0] PCGAMGProlongator_AGG(): New grid 97 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.043254e+00 min=1.321528e-01 PC=jacobi
[0] PCSetUp_GAMG(): 3) N=97, n data cols=1, nnz/row (ave)=97, 1 active pes
[0] PCGAMGFilterGraph(): 66.8403% nnz after filtering, with threshold 0., 97. nnz ave. (N=97)
[0] PCGAMGProlongator_AGG(): New grid 5 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.653762e+00 min=4.460582e-01 PC=jacobi
[0] PCSetUp_GAMG(): 4) N=5, n data cols=1, nnz/row (ave)=5, 1 active pes
[0] PCSetUp_GAMG(): 5 levels, grid complexity = 15.4673

BTW, I ran a smaller problem: a unit cube of 30x30x30, not 40x40x40.

I used the smaller threshold values you mentioned, but nothing really
changed.

Thanks,
Justin

On Fri, Mar 4, 2016 at 7:40 AM, Mark Adams <mfadams at lbl.gov> wrote:

>
>
> On Fri, Mar 4, 2016 at 2:05 AM, Justin Chang <jychang48 at gmail.com> wrote:
>
>> Mark,
>>
>> Using "-pc_gamg_square_graph 10" didn't change anything. I used values of
>> 1, 10, 100, and 1000 and the performance seemed unaffected.
>>
>
> Humm. Please run with -info and grep on GAMG. This will be about 20 lines,
> but -info is very noisy.  Perhaps you could do the same thing for ML (I'm
> not sure if the ML interface supports verbose output like this); that would
> be useful.
>
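
[For concreteness, such a run might look like the sketch below; "./app" and
the log file name are placeholders, not taken from the thread.]

```shell
# Keep only the GAMG lines from PETSc's (very noisy) -info output.
# "./app" is a placeholder for the actual application binary.
./app -pc_type gamg -info 2>&1 | grep GAMG | tee gamg_info.log
```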
> BTW, -pc_gamg_square_graph N is the number of levels on which to square the
> graph.  You will see in the verbose output that the number of non-zeros per
> row in your problem starts at 9 and goes up to ~740, then ~5370, and then
> 207 on the coarse grid, where there are 207 columns.
>
>
>>
>> Changing -pc_gamg_threshold to 0.8 did decrease the wall-clock time, but
>> it required more iterations.
>>
>
> That is very large.  A more reasonable scan would be: 0, 0.01, 0.04, 0.08.
>
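
[The suggested scan can be scripted along these lines; a sketch only, with
"./app" standing in for the actual executable.]

```shell
# Run once per suggested threshold, keeping a GAMG log for each value.
# "./app" is a placeholder for the application binary.
for t in 0 0.01 0.04 0.08; do
  ./app -pc_type gamg -pc_gamg_threshold "$t" -info 2>&1 \
    | grep GAMG > "gamg_threshold_${t}.log"
done
```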
>
>>
>> I am not really sure how I go about tinkering around with GAMG or even
>> ML. Do you have a manual/reference/paper/etc that describes what's going on
>> within gamg?
>>
>
> There is a section in the manual.  It goes through some of these
> troubleshooting techniques.
>
>
>>
>> Thanks,
>> Justin
>>
>>
>> On Thursday, March 3, 2016, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> You have a very sparse 3D problem, with 9 non-zeros per row.  It is
>>> coarsening very slowly and creating huge coarse grids, which are expensive
>>> to construct.  The superlinear speedup is most likely from cache effects.
>>> First try with:
>>>
>>> -pc_gamg_square_graph 10
>>>
>>> ML must have some AI in there to do this automatically, because GAMG and
>>> ML are pretty similar algorithmically.  There is a threshold parameter
>>> that is important (-pc_gamg_threshold <0.0>), and I think ML has the same
>>> default.  ML is doing OK, but I would guess that if you used something
>>> like 0.02 for ML's threshold you would see some improvement.
>>>
>>> Hypre is doing pretty badly also.  I suspect that it is getting confused
>>> as well.  I know less about how to deal with hypre.
>>>
>>> If you use -info and grep on GAMG you will see about 20 lines that will
>>> tell you the number of equations on each level and the average number of
>>> non-zeros per row.  In 3D the reduction per level should be -- very
>>> approximately -- 30x, and the number of non-zeros per row should not
>>> explode, but getting up to several hundred is OK.
>>>
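
[That per-level reduction can be read off a -info log mechanically.  The
sketch below (not a PETSc tool) extracts the N=... fields from the
48000-equation log earlier in the thread and prints the ratio between
consecutive levels.]

```shell
# Extract level sizes (the N=... fields) from PCSetUp_GAMG lines and
# print the coarsening ratio between consecutive levels.
log='[0] PCSetUp_GAMG(): level 0) N=48000, n data rows=1, n data cols=1
[0] PCSetUp_GAMG(): 1) N=6672, n data cols=1, nnz/row (ave)=623
[0] PCSetUp_GAMG(): 2) N=724, n data cols=1, nnz/row (ave)=724
[0] PCSetUp_GAMG(): 3) N=37, n data cols=1, nnz/row (ave)=37'

echo "$log" | grep -o 'N=[0-9]*' | cut -d= -f2 \
  | awk 'NR > 1 { printf "%.1fx\n", prev / $1 } { prev = $1 }'
# Prints 7.2x, 9.2x, 19.6x -- well below the ~30x per-level reduction
# expected in 3D, i.e. the coarsening really is slow.
```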
>>> If you care to test this we should be able to get ML and GAMG to agree
>>> pretty well.  ML is a nice solver, but our core numerics should be about
>>> the same.  I tested this on a 3D elasticity problem a few years ago.  That
>>> said, I think your ML solve is pretty good.
>>>
>>> Mark
>>>
>>>
>>>
>>>
>>> On Thu, Mar 3, 2016 at 4:36 AM, Lawrence Mitchell <
>>> lawrence.mitchell at imperial.ac.uk> wrote:
>>>
>>>> On 02/03/16 22:28, Justin Chang wrote:
>>>> ...
>>>>
>>>>
>>>> >         Down solver (pre-smoother) on level 3
>>>> >
>>>> >           KSP Object:          (solver_fieldsplit_1_mg_levels_3_)
>>>> >             linear system matrix = precond matrix:
>>>> ...
>>>> >             Mat Object:             1 MPI processes
>>>> >
>>>> >               type: seqaij
>>>> >
>>>> >               rows=52147, cols=52147
>>>> >
>>>> >               total: nonzeros=38604909, allocated nonzeros=38604909
>>>> >
>>>> >               total number of mallocs used during MatSetValues calls
>>>> =2
>>>> >
>>>> >                 not using I-node routines
>>>> >
>>>> >         Down solver (pre-smoother) on level 4
>>>> >
>>>> >           KSP Object:          (solver_fieldsplit_1_mg_levels_4_)
>>>> >             linear system matrix followed by preconditioner matrix:
>>>> >
>>>> >             Mat Object:            (solver_fieldsplit_1_)
>>>>
>>>> ...
>>>> >
>>>> >             Mat Object:             1 MPI processes
>>>> >
>>>> >               type: seqaij
>>>> >
>>>> >               rows=384000, cols=384000
>>>> >
>>>> >               total: nonzeros=3416452, allocated nonzeros=3416452
>>>>
>>>>
>>>> This looks pretty suspicious to me.  The original matrix on the finest
>>>> level has 3.8e5 rows and ~3.4e6 nonzeros.  The next level up, the
>>>> coarsening produces 5.2e4 rows, but 38e6 nonzeros.
>>>>
>>>> FWIW, although Justin's PETSc is from Oct 2015, I get the same
>>>> behaviour with:
>>>>
>>>> ad5697c (Master as of 1st March).
>>>>
>>>> If I compare with the coarse operators that ML produces on the same
>>>> problem:
>>>>
>>>> The original matrix has, again:
>>>>
>>>>         Mat Object:         1 MPI processes
>>>>           type: seqaij
>>>>           rows=384000, cols=384000
>>>>           total: nonzeros=3416452, allocated nonzeros=3416452
>>>>           total number of mallocs used during MatSetValues calls=0
>>>>             not using I-node routines
>>>>
>>>> While the next finest level has:
>>>>
>>>>             Mat Object:             1 MPI processes
>>>>               type: seqaij
>>>>               rows=65258, cols=65258
>>>>               total: nonzeros=1318400, allocated nonzeros=1318400
>>>>               total number of mallocs used during MatSetValues calls=0
>>>>                 not using I-node routines
>>>>
>>>> So we have 6.5e4 rows and 1.3e6 nonzeros, which seems more plausible.
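
[Working those counts out as per-row averages makes the comparison explicit;
plain shell arithmetic over the numbers quoted above, nothing PETSc-specific.]

```shell
# Average nonzeros per row on the first coarse level, from the counts above.
echo "GAMG: $((38604909 / 52147)) nnz/row (avg)"   # prints 740
echo "ML:   $((1318400 / 65258)) nnz/row (avg)"    # prints 20
```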
>>>>
>>>> Cheers,
>>>>
>>>> Lawrence
>>>>
>>>>
>>>
>