[petsc-users] negative grid complexity in GAMG

Mark Lohry mlohry at gmail.com
Thu Oct 17 10:20:15 CDT 2019


Ah, I'll probably have to do it in code then as we can't know the number of
levels until runtime unless I enforce it. Though for this fixed case I
guess I already know the number of levels created since I've run it...

I'll try it and hope for the best.
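
Since the level count is only known at runtime, one way to avoid hand-writing each -mg_levels_<k>_... option is to generate them. The sketch below is only an illustration, not something from PETSc: `append_level_options` is a hypothetical helper, and the values it writes (smoother iteration count, ILU fill level) stand in for the choices discussed in this thread. It assumes the PCMG numbering that appears in the -ksp_view output quoted below: mg_coarse_ for level 0, mg_levels_<k>_ for k >= 1, with the finest level last.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write per-level options for every smoothed level
 * except the finest into buf. ASSUMPTION: level 0 is the coarse solve
 * (mg_coarse_), levels 1..nlevels-1 use the mg_levels_<k>_ prefix, and
 * level nlevels-1 is the finest.
 * Returns the number of levels written, or -1 if buf is too small.
 */
static int append_level_options(int nlevels, int max_it, int ilu_fill,
                                char *buf, size_t buflen)
{
  size_t used = 0;
  int    k, count = 0;

  if (buflen == 0) return -1;
  buf[0] = '\0';
  for (k = 1; k < nlevels - 1; k++) {
    int n = snprintf(buf + used, buflen - used,
                     "-mg_levels_%d_ksp_max_it %d "
                     "-mg_levels_%d_sub_pc_factor_levels %d ",
                     k, max_it, k, ilu_fill);
    /* snprintf reports the length it wanted; treat truncation as failure */
    if (n < 0 || (size_t)n >= buflen - used) return -1;
    used += (size_t)n;
    count++;
  }
  return count;
}
```

For a 6-level hierarchy this writes options for levels 1 through 4. How nlevels is obtained (from an earlier run of the same case, or as an upper bound) is left open here.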

Thanks,
other Mark


On Thu, Oct 17, 2019 at 10:55 AM Matthew Knepley <knepley at gmail.com> wrote:

> On Thu, Oct 17, 2019 at 10:35 AM Mark Lohry <mlohry at gmail.com> wrote:
>
>> Sounds reasonable to me, I'll give it a shot and report back.
>>
>> Is there a command line way to do this? If I recall correctly,
>> -mg_coarse_... sets the options only on the coarsest grid. Is there a
>> -mg_every_level_but_the_finest_... type option, or do I need to manually
>> set each non-finest level programmatically?
>>
>
> We do not have that, so you are stuck with -mg_levels_1_... We can put that
> in if this is too cumbersome. What I have found is that I only really do
> it for the next coarsest level instead of all but the finest.
>
>   Thanks,
>
>     Matt
>
>
>> On Thu, Oct 17, 2019 at 9:03 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Thu, Oct 17, 2019 at 8:07 AM Mark Lohry <mlohry at gmail.com> wrote:
>>>
>>>>> So with many fewer levels, are you saying
>>>>>
>>>>>   a) It takes more iterates?
>>>>>
>>>>>   b) It takes the same wall clock time?
>>>>>
>>>>
>>>> Slightly more iterates but at roughly the same wall clock time. Only
>>>> did a short test but the runtime difference looks like it was in the noise.
>>>>
>>>>
>>>>
>>>>> I think you might want to switch to beefier smoothers on those lower
>>>>> levels if you see
>>>>> more iterates.
>>>>>
>>>>
>>>> I was thinking the same. I just did a quick run with 2 smoother
>>>> iterates per level instead of 1 and got maybe 20% performance benefit, so
>>>> I'll play with that a bit more. I figure ILU(0) is already a pretty beefy
>>>> smoother here especially because of the very large blocks; ILU(1) is out
>>>> because of memory consumption, unless I only do it on the coarsened levels.
>>>>
>>>
>>> I mean exactly this, only make the smoother stronger on coarse levels.
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>
>>>> On much stiffer problems I saw considerable benefit from doing
>>>> gmres+ILU(0) for 5 iterations per level, so I'll give that a shot.
>>>>
>>>> On Thu, Oct 17, 2019 at 6:48 AM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Oct 17, 2019 at 6:22 AM Mark Lohry via petsc-users <
>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>>> I assume these are advection problems and smoothed aggregation does
>>>>>>> not work well.
>>>>>>>
>>>>>>
>>>>>> Correct, it stagnates immediately with smoothed aggregation.
>>>>>>
>>>>>>
>>>>>>> I think '-pc_gamg_square_graph 20' should reduce the number of levels
>>>>>>> and work better for you.
>>>>>>>
>>>>>>
>>>>>> On the big problem it's producing 20 levels without
>>>>>> -pc_gamg_square_graph 20; with that on it produces 6 levels. It certainly
>>>>>> has fewer of the near-identical-size coarse levels, but overall convergence
>>>>>> time is roughly the same. Any suggestion of where to go from here?
>>>>>>
>>>>>
>>>>> So with many fewer levels, are you saying
>>>>>
>>>>>   a) It takes more iterates?
>>>>>
>>>>>   b) It takes the same wall clock time?
>>>>>
>>>>> I think you might want to switch to beefier smoothers on those lower
>>>>> levels if you see
>>>>> more iterates. Mark?
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>     Matt
>>>>>
>>>>>
>>>>>> Original setup without pc_gamg_square_graph 20:
>>>>>>
>>>>>> [0] PCSetUp_GAMG(): level 0) N=347149550, n data rows=5, n data
>>>>>> cols=5, nnz/row (ave)=250, np=1920
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 50. nnz ave. (N=69429910)
>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 894786 nodes
>>>>>> [0] PCSetUp_GAMG(): 1) N=4473930, n data cols=5, nnz/row (ave)=51,
>>>>>> 1920 active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 10.1761 nnz ave. (N=894786)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 184262 nodes
>>>>>> [0] PCSetUp_GAMG(): 2) N=921310, n data cols=5, nnz/row (ave)=68,
>>>>>> 1920 active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 13.0556 nnz ave. (N=184262)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 41002 nodes
>>>>>> [0] PCSetUp_GAMG(): 3) N=205010, n data cols=5, nnz/row (ave)=72,
>>>>>> 1920 active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 10.0909 nnz ave. (N=41002)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 12587 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 20 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 4) N=62935, n data cols=5, nnz/row (ave)=62, 960
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 5.33333 nnz ave. (N=12587)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 5811 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 40 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 5) N=29055, n data cols=5, nnz/row (ave)=50, 640
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.8 nnz ave. (N=5811)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 3442 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 110 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 6) N=17210, n data cols=5, nnz/row (ave)=40, 320
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 4.66176 nnz ave. (N=3442)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 2365 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 275 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 7) N=11825, n data cols=5, nnz/row (ave)=34, 240
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 4.961 nnz ave. (N=2365)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1792 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 1125 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 8) N=8960, n data cols=5, nnz/row (ave)=28, 192
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 5.79911 nnz ave. (N=1792)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1479 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 7395 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 9) N=7395, n data cols=5, nnz/row (ave)=24, 160
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 4.86883 nnz ave. (N=1479)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1378 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6890 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 10) N=6890, n data cols=5, nnz/row (ave)=22, 128
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 4.44702 nnz ave. (N=1378)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1210 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6050 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 11) N=6050, n data cols=5, nnz/row (ave)=18, 120
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.64298 nnz ave. (N=1210)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1185 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop:
>>>>>> new_size=120, neq(loc)=5925
>>>>>> [0] PCSetUp_GAMG(): 12) N=5925, n data cols=5, nnz/row (ave)=17, 120
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.54177 nnz ave. (N=1185)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1165 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop:
>>>>>> new_size=120, neq(loc)=5825
>>>>>> [0] PCSetUp_GAMG(): 13) N=5825, n data cols=5, nnz/row (ave)=17, 120
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.5133 nnz ave. (N=1165)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1137 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop:
>>>>>> new_size=120, neq(loc)=5685
>>>>>> [0] PCSetUp_GAMG(): 14) N=5685, n data cols=5, nnz/row (ave)=17, 120
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.48021 nnz ave. (N=1137)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1097 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop:
>>>>>> new_size=120, neq(loc)=5485
>>>>>> [0] PCSetUp_GAMG(): 15) N=5485, n data cols=5, nnz/row (ave)=16, 120
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.3938 nnz ave. (N=1097)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1088 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop:
>>>>>> new_size=120, neq(loc)=5440
>>>>>> [0] PCSetUp_GAMG(): 16) N=5440, n data cols=5, nnz/row (ave)=16, 120
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.34375 nnz ave. (N=1088)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 852 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 4260 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 17) N=4260, n data cols=5, nnz/row (ave)=15, 80
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.06103 nnz ave. (N=852)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 848 nodes
>>>>>> [0] PCSetUp_GAMG(): 18) N=4240, n data cols=5, nnz/row (ave)=15, 80
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 3.0566 nnz ave. (N=848)
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 3 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 15 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 19) N=15, n data cols=5, nnz/row (ave)=11, 1
>>>>>> active pes
>>>>>> [0] PCSetUp_GAMG(): 20 levels, grid complexity = 1.00367
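
The near-identical coarse level sizes in this log can be quantified by looking at the ratio of N between consecutive levels. The sketch below is a standalone illustration: `count_stalled_levels` is a hypothetical helper, and the cutoff of 2.0 used in the note after it is an arbitrary choice of mine, not a PETSc parameter.

```c
#include <stddef.h>

/* Hypothetical helper: count consecutive level pairs whose coarsening
 * ratio rows[i] / rows[i+1] falls below min_ratio.
 * rows[0] is the finest level, as in the -info output. */
static size_t count_stalled_levels(const double *rows, size_t nlevels,
                                   double min_ratio)
{
  size_t i, stalled = 0;

  for (i = 0; i + 1 < nlevels; i++) {
    if (rows[i + 1] > 0.0 && rows[i] / rows[i + 1] < min_ratio) stalled++;
  }
  return stalled;
}

/* N per level, copied from the 20-level log above. */
static const double rows20[] = {
  347149550, 4473930, 921310, 205010, 62935, 29055, 17210, 11825, 8960, 7395,
  6890, 6050, 5925, 5825, 5685, 5485, 5440, 4260, 4240, 15
};
```

With a cutoff of 2.0, count_stalled_levels(rows20, 20, 2.0) flags 13 of the 19 transitions, all of them between levels 5 and 18.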
>>>>>>
>>>>>> With pc_gamg_square_graph 20:
>>>>>>
>>>>>>
>>>>>> [0] PCSetUp_GAMG(): level 0) N=347149550, n data rows=5, n data
>>>>>> cols=5, nnz/row (ave)=250, np=1920
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 50. nnz ave. (N=69429910)
>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 20 to square
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 894786 nodes
>>>>>> [0] PCSetUp_GAMG(): 1) N=4473930, n data cols=5, nnz/row (ave)=51,
>>>>>> 1920 active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 10.1761 nnz ave. (N=894786)
>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 2 of 20 to square
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 49106 nodes
>>>>>> [0] PCSetUp_GAMG(): 2) N=245530, n data cols=5, nnz/row (ave)=80,
>>>>>> 1920 active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 14.8 nnz ave. (N=49106)
>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 3 of 20 to square
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1646 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple
>>>>>> aggregation
>>>>>> [0] PCSetUp_GAMG(): 3) N=8230, n data cols=5, nnz/row (ave)=86, 160
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 11.5 nnz ave. (N=1646)
>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 4 of 20 to square
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 56 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple
>>>>>> aggregation
>>>>>> [0] PCSetUp_GAMG(): 4) N=280, n data cols=5, nnz/row (ave)=62, 6
>>>>>> active pes
>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>> 0., 12.5714 nnz ave. (N=56)
>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 5 of 20 to square
>>>>>> [0] PCGAMGProlongator_AGG(): New grid 4 nodes
>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 20 with
>>>>>> simple aggregation
>>>>>> [0] PCSetUp_GAMG(): 5) N=20, n data cols=5, nnz/row (ave)=17, 1
>>>>>> active pes
>>>>>> [0] PCSetUp_GAMG(): 6 levels, grid complexity = 1.00291
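
As a rough cross-check, the reported complexity can be approximated from the log itself. The sketch below follows the definition in the PCMGGetGridComplexity source quoted further down (sum of nonzeros over all levels divided by the nonzeros on the finest level). It estimates each level's nonzeros as N times the rounded nnz/row average, so it can only be expected to agree with the printed value to a few digits. `estimate_complexity` is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: approximate sum(nnz over levels) / nnz(finest),
 * with nnz on each level estimated as rows * average nnz per row.
 * Index 0 is taken to be the finest level, as in the -info output. */
static double estimate_complexity(const double *rows,
                                  const double *nnz_per_row, size_t nlevels)
{
  double total = 0.0;
  size_t i;

  if (nlevels == 0 || rows[0] * nnz_per_row[0] == 0.0) return 0.0;
  for (i = 0; i < nlevels; i++) total += rows[i] * nnz_per_row[i];
  return total / (rows[0] * nnz_per_row[0]);
}

/* N and nnz/row (ave) per level, copied from the 6-level log above. */
static const double rows6[] = {347149550, 4473930, 245530, 8230, 280, 20};
static const double nnz6[]  = {250, 51, 80, 86, 62, 17};
```

Calling estimate_complexity(rows6, nnz6, 6) gives roughly 1.0029, in line with the logged 1.00291.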
>>>>>>
>>>>>> On Wed, Oct 16, 2019 at 9:46 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>
>>>>>>> The block size refers to the number of dofs/vertex, so you want 5.
>>>>>>> (I have no idea what is going on with block size set to 20).
>>>>>>>
>>>>>>> This is better but also smaller. 10 levels is a lot of levels.
>>>>>>>
>>>>>>> This is unsmoothed aggregation. I assume these are advection
>>>>>>> problems and smoothed aggregation does not work well. This is not in my
>>>>>>> wheelhouse. I think '-pc_gamg_square_graph 20' should reduce the number of
>>>>>>> levels and work better for you.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mark
>>>>>>>
>>>>>>> On Wed, Oct 16, 2019 at 8:59 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Mark, are you referring to how on the coarser levels the
>>>>>>>> coarsening rate seems to nearly flatline? i.e. level 2 has 4,260 rows while
>>>>>>>> level 1 has 4,240 rows? I was curious about that too...
>>>>>>>>
>>>>>>>> Not sure if this is the cause, but I have gone back and forth on
>>>>>>>> what blocksize I set; I'm doing high order elements with 5 coupled
>>>>>>>> equations, so the true block size in that case is 50x50. For that I had
>>>>>>>> played with setting block size to either 5 (number of equations) or 50
>>>>>>>> (actual block size) and seemed to have seen a meager 20% improvement with
>>>>>>>> the block size at 5, so I kind of left it there.
>>>>>>>>
>>>>>>>> Running a much smaller variant of the same problem at lower order
>>>>>>>> (block size 20 instead of 50), the -info grep you asked for is below. I'll
>>>>>>>> get -info for the much larger case but it'll take a couple days.
>>>>>>>>
>>>>>>>> For options I'm running
>>>>>>>> -snes_lag_jacobian 10000 -ksp_gmres_restart 100
>>>>>>>> -pc_gamg_agg_nsmooths 0 -mg_levels_ksp_type richardson -mg_levels_pc_type
>>>>>>>> asm -mg_levels_ksp_max_it 1
>>>>>>>> -pc_mg_cycle_type v -snes_linesearch_type bt -snes_linesearch_order
>>>>>>>> 3
>>>>>>>> -snes_linesearch_monitor -mg_levels_sub_pc_factor_in_place true
>>>>>>>> -info
>>>>>>>>
>>>>>>>>
>>>>>>>> block size 5 :
>>>>>>>>
>>>>>>>> [0] PCSetUp_GAMG(): level 0) N=2006480, n data rows=5, n data
>>>>>>>> cols=5, nnz/row (ave)=100, np=16
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 20. nnz ave. (N=401296)
>>>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 12947 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 1) N=64735, n data cols=5, nnz/row (ave)=51, 16
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 10.3351 nnz ave. (N=12947)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 2671 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 2) N=13355, n data cols=5, nnz/row (ave)=66, 16
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 12.5524 nnz ave. (N=2671)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 598 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 3) N=2990, n data cols=5, nnz/row (ave)=65, 16
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 12.7727 nnz ave. (N=598)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 178 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 4) N=890, n data cols=5, nnz/row (ave)=52, 16
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 8.28571 nnz ave. (N=178)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 80 nodes
>>>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 30 with
>>>>>>>> simple aggregation
>>>>>>>> [0] PCSetUp_GAMG(): 5) N=400, n data cols=5, nnz/row (ave)=34, 8
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 5.77778 nnz ave. (N=80)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 50 nodes
>>>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 30 with
>>>>>>>> simple aggregation
>>>>>>>> [0] PCSetUp_GAMG(): 6) N=250, n data cols=5, nnz/row (ave)=25, 4
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 4.76923 nnz ave. (N=50)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 36 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 7) N=180, n data cols=5, nnz/row (ave)=18, 4
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 3.75 nnz ave. (N=36)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 33 nodes
>>>>>>>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop:
>>>>>>>> new_size=4, neq(loc)=90
>>>>>>>> [0] PCSetUp_GAMG(): 8) N=165, n data cols=5, nnz/row (ave)=18, 4
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 3.72222 nnz ave. (N=33)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 8 nodes
>>>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 35 with
>>>>>>>> simple aggregation
>>>>>>>> [0] PCSetUp_GAMG(): 9) N=40, n data cols=5, nnz/row (ave)=15, 1
>>>>>>>> active pes
>>>>>>>> [0] PCSetUp_GAMG(): 10 levels, grid complexity = 1.02237
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> block size 20:
>>>>>>>>
>>>>>>>> [0] PCSetUp_GAMG(): level 0) N=2006480, n data rows=20, n data
>>>>>>>> cols=20, nnz/row (ave)=100, np=16
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 5. nnz ave. (N=100324)
>>>>>>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 12948 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 1) N=258960, n data cols=20, nnz/row (ave)=205,
>>>>>>>> 16 active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 10.2857 nnz ave. (N=12948)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 2671 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 2) N=53420, n data cols=20, nnz/row (ave)=266,
>>>>>>>> 16 active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 12.5548 nnz ave. (N=2671)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 593 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 3) N=11860, n data cols=20, nnz/row (ave)=264,
>>>>>>>> 16 active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 10.8519 nnz ave. (N=593)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 181 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 4) N=3620, n data cols=20, nnz/row (ave)=214,
>>>>>>>> 16 active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 8.375 nnz ave. (N=181)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 79 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 5) N=1580, n data cols=20, nnz/row (ave)=164,
>>>>>>>> 16 active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 8. nnz ave. (N=79)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 43 nodes
>>>>>>>> [0] PCSetUp_GAMG(): 6) N=860, n data cols=20, nnz/row (ave)=100, 16
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 5. nnz ave. (N=43)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 15 nodes
>>>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 20 with
>>>>>>>> simple aggregation
>>>>>>>> [0] PCSetUp_GAMG(): 7) N=300, n data cols=20, nnz/row (ave)=81, 8
>>>>>>>> active pes
>>>>>>>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>>>>>>>> 0., 2.66667 nnz ave. (N=15)
>>>>>>>> [0] PCGAMGProlongator_AGG(): New grid 1 nodes
>>>>>>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with
>>>>>>>> simple aggregation
>>>>>>>> [0] PCSetUp_GAMG(): 8) N=20, n data cols=20, nnz/row (ave)=20, 1
>>>>>>>> active pes
>>>>>>>> [0] PCSetUp_GAMG(): HARD stop of coarsening on level 7.  Grid too
>>>>>>>> small: 1 block nodes
>>>>>>>> [0] PCSetUp_GAMG(): 9 levels, grid complexity = 1.35745
>>>>>>>>
>>>>>>>> On Wed, Oct 16, 2019 at 5:12 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>
>>>>>>>>> Thanks Barry,
>>>>>>>>> Sorry I missed this.
>>>>>>>>> Mark: this problem is going crazy. The (default) coarsening
>>>>>>>>> parameters are terrible for you. Can you run with -info, grep for GAMG,
>>>>>>>>> and send that? And please send me the gamg parameters that you are using.
>>>>>>>>> Thanks,
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>> On Wed, Oct 16, 2019 at 9:01 AM Smith, Barry F. via petsc-users <
>>>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> barry/2019-10-15/bug-gamg-complexity/maint
>>>>>>>>>> https://gitlab.com/petsc/petsc/merge_requests/2179
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> > On Oct 16, 2019, at 5:29 AM, Mark Lohry <mlohry at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Well that was a quick late night bug fix. Thanks Barry, I'll
>>>>>>>>>> try it out.
>>>>>>>>>> >
>>>>>>>>>> > Just to confirm: You are running with default double
>>>>>>>>>> precision numbers and have used the configure option --with-64-bit-indices ?
>>>>>>>>>> >
>>>>>>>>>> > Double precision floats, but 32 bit indices. I realize I'm
>>>>>>>>>> playing with fire here, but I'm bumping very close to available memory
>>>>>>>>>> limits at this scale and 64 bit indices tips me over. I figure integer
>>>>>>>>>> index overflows would probably show a catastrophic failure, but all output
>>>>>>>>>> looks sane.
>>>>>>>>>> >
>>>>>>>>>> > I see you are using MATMFFD as the operator and MPIAIJ as the
>>>>>>>>>> matrix from which to build the preconditioner? This is not supposed to cause
>>>>>>>>>> any difficulties since the complexity computation code uses the second
>>>>>>>>>> matrix, that is, the MPIAIJ matrix, to get the complexity information.
>>>>>>>>>> >
>>>>>>>>>> > Right, I'm using MATMFFD for the operator, and using a
>>>>>>>>>> snes_lag_jacobian with SNESComputeJacobianDefaultColor for the matrix used
>>>>>>>>>> to build the preconditioner. The actual behavior is exactly what I'd expect
>>>>>>>>>> from smaller runs and the results look good, so it sounds like what you
>>>>>>>>>> describe.
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Oct 16, 2019 at 12:17 AM Smith, Barry F. <
>>>>>>>>>> bsmith at mcs.anl.gov> wrote:
>>>>>>>>>> >
>>>>>>>>>> >    I think I now see the bug: the code uses PetscInt lev,
>>>>>>>>>> nnz0 = -1; which will overflow. It should be using PetscLogDouble for nnz0.
>>>>>>>>>> >
>>>>>>>>>> >   You can try changing that one place in the code and see that
>>>>>>>>>> it now prints a reasonable value for complexity.
>>>>>>>>>> >
>>>>>>>>>> >   I will prepare a MR for maint to fix the bug permanently.
>>>>>>>>>> >
>>>>>>>>>> >   Barry
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > static PetscErrorCode PCMGGetGridComplexity(PC pc, PetscReal *gc)
>>>>>>>>>> > {
>>>>>>>>>> >   PetscErrorCode ierr;
>>>>>>>>>> >   PC_MG          *mg      = (PC_MG*)pc->data;
>>>>>>>>>> >   PC_MG_Levels   **mglevels = mg->levels;
>>>>>>>>>> >   PetscInt       lev, nnz0 = -1;
>>>>>>>>>> >   MatInfo        info;
>>>>>>>>>> >   PetscFunctionBegin;
>>>>>>>>>> >   if (!mg->nlevels) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_PLIB,"MG has no levels");
>>>>>>>>>> >   for (lev=0, *gc=0; lev<mg->nlevels; lev++) {
>>>>>>>>>> >     Mat dB;
>>>>>>>>>> >     ierr = KSPGetOperators(mglevels[lev]->smoothd,NULL,&dB);CHKERRQ(ierr);
>>>>>>>>>> >     ierr = MatGetInfo(dB,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); /* global reduction */
>>>>>>>>>> >     *gc += (PetscReal)info.nz_used;
>>>>>>>>>> >     if (lev==mg->nlevels-1) nnz0 = info.nz_used;
>>>>>>>>>> >   }
>>>>>>>>>> >   if (nnz0) *gc /= (PetscReal)nnz0;
>>>>>>>>>> >   else *gc = 0;
>>>>>>>>>> >   PetscFunctionReturn(0);
>>>>>>>>>> > }
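
To see why a 32-bit nnz0 is a problem at this size, here is a small standalone sketch (not PETSc code). It uses the finest-level figures from the -info output above (N=347149550, about 250 nonzeros per row) and the logged complexity of 1.00367. One assumption is mine and not from the thread: that the out-of-range double-to-int conversion produced INT32_MIN, which is what x86 hardware typically yields but which the C standard leaves undefined. The function names are hypothetical.

```c
#include <stdint.h>

/* Nonzero count on the finest level, estimated from the -info output:
 * rows times the (rounded) average nonzeros per row. */
static double fine_nnz_estimate(void) { return 347149550.0 * 250.0; }

/* True if v is representable as a signed 32-bit integer. */
static int fits_int32(double v)
{
  return v >= (double)INT32_MIN && v <= (double)INT32_MAX;
}

/* Complexity as the quoted function would compute it if nnz0 had ended
 * up as INT32_MIN. ASSUMPTION: that is the result of the out-of-range
 * conversion (typical on x86, undefined behaviour in standard C). */
static double complexity_with_int32_min(double total_nnz)
{
  return total_nnz / (double)INT32_MIN;
}
```

The estimate of about 8.68e10 nonzeros does not fit in 32 bits, and with total_nnz = 1.00367 * fine_nnz_estimate() the ratio comes out at about -40.56. The small gap to the reported -40.5483 is consistent with the nnz/row average being rounded in the log.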
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > > On Oct 15, 2019, at 11:11 PM, Smith, Barry F. <
>>>>>>>>>> bsmith at mcs.anl.gov> wrote:
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >   Mark,
>>>>>>>>>> > >
>>>>>>>>>> > >   It may be caused by some overflow in the calculations
>>>>>>>>>> somewhere due to your very large sizes and nonzeros but I could not see
>>>>>>>>>> anything based on a quick inspection of the code. We seem to use double to
>>>>>>>>>> store the counts which normally would be more than sufficient to hold the
>>>>>>>>>> results without overflow. Unless somewhere there is a mistaken use of int
>>>>>>>>>> that causes a problem.
>>>>>>>>>> > >
>>>>>>>>>> > >   Just to confirm: You are running with default double
>>>>>>>>>> precision numbers and have used the configure option --with-64-bit-indices
>>>>>>>>>> ?
>>>>>>>>>> > >
>>>>>>>>>> > >   I see you are using MATMFFD as the operator and MPIAIJ as
>>>>>>>>>> the matrix from which to build the preconditioner? This is not supposed to
>>>>>>>>>> cause any difficulties since the complexity computation code uses the
>>>>>>>>>> second matrix, that is, the MPIAIJ matrix, to get the complexity information.
>>>>>>>>>> > >
>>>>>>>>>> > >   There is definitely a bug but I am hard pressed to suggest
>>>>>>>>>> how to find it since it seems only to be expressed in your giant runs.
>>>>>>>>>> > >
>>>>>>>>>> > >  Barry
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >> On Oct 15, 2019, at 9:16 PM, Mark Lohry via petsc-users <
>>>>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>> > >>
>>>>>>>>>> > >> I'm running some larger unsteady problems and trying to eke
>>>>>>>>>> out some better GAMG performance. As is, at very small time steps, ASM
>>>>>>>>>> preconditioner with ILU(0) is maybe 20% more efficient than my naive GAMG
>>>>>>>>>> setup, which gives me hope that some tuning of GAMG can give some
>>>>>>>>>> advantage. Convergence overall seems quite good, and light years better
>>>>>>>>>> than ASM/ILU at larger time steps.
>>>>>>>>>> > >>
>>>>>>>>>> > >> So I looked through the manual and saw a note that "grid
>>>>>>>>>> complexity should be well under 2.0 and preferably around 1.3 or lower". I
>>>>>>>>>> checked -ksp_view and see:
>>>>>>>>>> > >> Complexity:    grid = -40.5483
>>>>>>>>>> > >>
>>>>>>>>>> > >> Is something funny happening here?
>>>>>>>>>> > >>
>>>>>>>>>> > >> Pasting whole -ksp_view below:
>>>>>>>>>> > >>
>>>>>>>>>> > >> KSP Object: 1920 MPI processes
>>>>>>>>>> > >>  type: fgmres
>>>>>>>>>> > >>    restart=100, using Classical (unmodified) Gram-Schmidt
>>>>>>>>>> Orthogonalization with no iterative refinement
>>>>>>>>>> > >>    happy breakdown tolerance 1e-30
>>>>>>>>>> > >>  maximum iterations=30, initial guess is zero
>>>>>>>>>> > >>  tolerances:  relative=0.0001, absolute=1e-06, divergence=10.
>>>>>>>>>> > >>  right preconditioning
>>>>>>>>>> > >>  using UNPRECONDITIONED norm type for convergence test
>>>>>>>>>> > >> PC Object: 1920 MPI processes
>>>>>>>>>> > >>  type: gamg
>>>>>>>>>> > >>    type is MULTIPLICATIVE, levels=20 cycles=v
>>>>>>>>>> > >>      Cycles per PCApply=1
>>>>>>>>>> > >>      Using externally compute Galerkin coarse grid matrices
>>>>>>>>>> > >>      GAMG specific options
>>>>>>>>>> > >>        Threshold for dropping small values in graph on each
>>>>>>>>>> level =   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
>>>>>>>>>>  0.   0.   0.   0.   0.
>>>>>>>>>> > >>        Threshold scaling factor for each level not specified
>>>>>>>>>> = 1.
>>>>>>>>>> > >>        AGG specific options
>>>>>>>>>> > >>          Symmetric graph false
>>>>>>>>>> > >>          Number of levels to square graph 1
>>>>>>>>>> > >>          Number smoothing steps 0
>>>>>>>>>> > >>        Complexity:    grid = -40.5483
>>>>>>>>>> > >>  Coarse grid solver -- level -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_coarse_) 1920 MPI processes
>>>>>>>>>> > >>      type: preonly
>>>>>>>>>> > >>      maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_coarse_) 1920 MPI processes
>>>>>>>>>> > >>      type: bjacobi
>>>>>>>>>> > >>        number of blocks = 1920
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_coarse_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=1, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_coarse_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: lu
>>>>>>>>>> > >>          out-of-place factorization
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          using diagonal shift on blocks to prevent zero
>>>>>>>>>> pivot [INBLOCKS]
>>>>>>>>>> > >>          matrix ordering: nd
>>>>>>>>>> > >>          factor fill ratio given 5., needed 1.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=15, cols=15, bs=5
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=175, allocated nonzeros=175
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 3 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=15, cols=15, bs=5
>>>>>>>>>> > >>          total: nonzeros=175, allocated nonzeros=175
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 3 nodes, limit used
>>>>>>>>>> is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=15, cols=15, bs=5
>>>>>>>>>> > >>        total: nonzeros=175, allocated nonzeros=175
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 3
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 1
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_1_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_1_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_1_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_1_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=4240, cols=4240
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=64800, allocated
>>>>>>>>>> nonzeros=64800
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 848 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=4240, cols=4240
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=64800, allocated nonzeros=64800
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 848 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=4240, cols=4240, bs=5
>>>>>>>>>> > >>        total: nonzeros=64800, allocated nonzeros=64800
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using nonscalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 848
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 2
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_2_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_2_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_2_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_2_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=4260, cols=4260
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=65200, allocated
>>>>>>>>>> nonzeros=65200
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 852 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=4260, cols=4260
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=65200, allocated nonzeros=65200
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 852 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=4260, cols=4260, bs=5
>>>>>>>>>> > >>        total: nonzeros=65200, allocated nonzeros=65200
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 852
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 3
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_3_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_3_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_3_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_3_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=5440, cols=5440
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=90950, allocated
>>>>>>>>>> nonzeros=90950
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1088 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=5440, cols=5440
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=90950, allocated nonzeros=90950
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1088 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=5440, cols=5440, bs=5
>>>>>>>>>> > >>        total: nonzeros=90950, allocated nonzeros=90950
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using nonscalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1088
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 4
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_4_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_4_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_4_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_4_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=5485, cols=5485
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=93075, allocated
>>>>>>>>>> nonzeros=93075
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1097 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=5485, cols=5485
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=93075, allocated nonzeros=93075
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1097 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=5485, cols=5485, bs=5
>>>>>>>>>> > >>        total: nonzeros=93075, allocated nonzeros=93075
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using nonscalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1097
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 5
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_5_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_5_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_5_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_5_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=5685, cols=5685
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=98925, allocated
>>>>>>>>>> nonzeros=98925
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1137 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=5685, cols=5685
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=98925, allocated nonzeros=98925
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1137 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=5685, cols=5685, bs=5
>>>>>>>>>> > >>        total: nonzeros=98925, allocated nonzeros=98925
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using nonscalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1137
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 6
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_6_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_6_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_6_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_6_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=5825, cols=5825
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=102325, allocated
>>>>>>>>>> nonzeros=102325
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1165 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=5825, cols=5825
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=102325, allocated nonzeros=102325
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1165 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=5825, cols=5825, bs=5
>>>>>>>>>> > >>        total: nonzeros=102325, allocated nonzeros=102325
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using nonscalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1165
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 7
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_7_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_7_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_7_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_7_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=5925, cols=5925
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=104925, allocated
>>>>>>>>>> nonzeros=104925
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1185 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=5925, cols=5925
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=104925, allocated nonzeros=104925
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1185 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=5925, cols=5925, bs=5
>>>>>>>>>> > >>        total: nonzeros=104925, allocated nonzeros=104925
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using nonscalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1185
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 8
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_8_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_8_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_8_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_8_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=6050, cols=6050
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=110200, allocated
>>>>>>>>>> nonzeros=110200
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1210 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=6050, cols=6050
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=110200, allocated nonzeros=110200
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1210 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=6050, cols=6050, bs=5
>>>>>>>>>> > >>        total: nonzeros=110200, allocated nonzeros=110200
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1210
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 9
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_9_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_9_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_9_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_9_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=6890, cols=6890
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=153200, allocated
>>>>>>>>>> nonzeros=153200
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1378 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=6890, cols=6890
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=153200, allocated nonzeros=153200
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1378 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=6890, cols=6890, bs=5
>>>>>>>>>> > >>        total: nonzeros=153200, allocated nonzeros=153200
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1378
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 10
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_10_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_10_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_10_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_10_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=7395, cols=7395
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=180025, allocated
>>>>>>>>>> nonzeros=180025
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1479 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=7395, cols=7395
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=180025, allocated nonzeros=180025
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1479 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=7395, cols=7395, bs=5
>>>>>>>>>> > >>        total: nonzeros=180025, allocated nonzeros=180025
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1479
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 11
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_11_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_11_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_11_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_11_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=8960, cols=8960
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=259800, allocated
>>>>>>>>>> nonzeros=259800
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 1792 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=8960, cols=8960
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=259800, allocated nonzeros=259800
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 1792 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=8960, cols=8960, bs=5
>>>>>>>>>> > >>        total: nonzeros=259800, allocated nonzeros=259800
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 1792
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 12
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_12_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_12_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_12_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_12_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=1795, cols=1795
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=33275, allocated
>>>>>>>>>> nonzeros=33275
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 359 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=1795, cols=1795
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=33275, allocated nonzeros=33275
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 359 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=11825, cols=11825, bs=5
>>>>>>>>>> > >>        total: nonzeros=403125, allocated nonzeros=403125
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 359
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 13
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_13_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_13_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_13_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_13_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=340, cols=340
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=3500, allocated nonzeros=3500
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 68 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=340, cols=340
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=3500, allocated nonzeros=3500
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 68 nodes, limit used
>>>>>>>>>> is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=17210, cols=17210, bs=5
>>>>>>>>>> > >>        total: nonzeros=696850, allocated nonzeros=696850
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 68
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 14
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_14_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_14_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_14_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_14_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=125, cols=125
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=625, allocated nonzeros=625
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 25 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=125, cols=125
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=625, allocated nonzeros=625
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 25 nodes, limit used
>>>>>>>>>> is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=29055, cols=29055, bs=5
>>>>>>>>>> > >>        total: nonzeros=1475675, allocated nonzeros=1475675
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 25
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 15
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_15_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_15_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_15_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_15_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=45, cols=45
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=225, allocated nonzeros=225
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 9 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=45, cols=45
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=225, allocated nonzeros=225
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 9 nodes, limit used
>>>>>>>>>> is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=62935, cols=62935, bs=5
>>>>>>>>>> > >>        total: nonzeros=3939025, allocated nonzeros=3939025
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 9
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 16
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_16_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_16_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_16_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_16_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=55, cols=55
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=725, allocated nonzeros=725
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 11 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=55, cols=55
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=725, allocated nonzeros=725
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 11 nodes, limit used
>>>>>>>>>> is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=205010, cols=205010, bs=5
>>>>>>>>>> > >>        total: nonzeros=14780300, allocated nonzeros=14780300
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using scalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 11
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 17
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_17_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_17_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_17_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_17_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=360, cols=360
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=14350, allocated
>>>>>>>>>> nonzeros=14350
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 72 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=360, cols=360
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=14350, allocated nonzeros=14350
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 72 nodes, limit used
>>>>>>>>>> is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=921310, cols=921310, bs=5
>>>>>>>>>> > >>        total: nonzeros=63203300, allocated nonzeros=63203300
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using scalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 72
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 18
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_18_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_18_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_18_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_18_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=2130, cols=2130
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=87950, allocated
>>>>>>>>>> nonzeros=87950
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 426 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=2130, cols=2130
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=87950, allocated nonzeros=87950
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 426 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix = precond matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=4473930, cols=4473930, bs=5
>>>>>>>>>> > >>        total: nonzeros=232427300, allocated
>>>>>>>>>> nonzeros=232427300
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using nonscalable MatPtAP() implementation
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 426
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  Down solver (pre-smoother) on level 19
>>>>>>>>>> -------------------------------
>>>>>>>>>> > >>    KSP Object: (mg_levels_19_) 1920 MPI processes
>>>>>>>>>> > >>      type: richardson
>>>>>>>>>> > >>        damping factor=1.
>>>>>>>>>> > >>      maximum iterations=1, nonzero initial guess
>>>>>>>>>> > >>      tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>      left preconditioning
>>>>>>>>>> > >>      using NONE norm type for convergence test
>>>>>>>>>> > >>    PC Object: (mg_levels_19_) 1920 MPI processes
>>>>>>>>>> > >>      type: asm
>>>>>>>>>> > >>        total subdomain blocks = 1920, amount of overlap = 0
>>>>>>>>>> > >>        restriction/interpolation type - RESTRICT
>>>>>>>>>> > >>        Local solve is same for all blocks, in the following
>>>>>>>>>> KSP and PC objects:
>>>>>>>>>> > >>      KSP Object: (mg_levels_19_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: preonly
>>>>>>>>>> > >>        maximum iterations=10000, initial guess is zero
>>>>>>>>>> > >>        tolerances:  relative=1e-05, absolute=1e-50,
>>>>>>>>>> divergence=10000.
>>>>>>>>>> > >>        left preconditioning
>>>>>>>>>> > >>        using NONE norm type for convergence test
>>>>>>>>>> > >>      PC Object: (mg_levels_19_sub_) 1 MPI processes
>>>>>>>>>> > >>        type: ilu
>>>>>>>>>> > >>          in-place factorization
>>>>>>>>>> > >>          0 levels of fill
>>>>>>>>>> > >>          tolerance for zero pivot 2.22045e-14
>>>>>>>>>> > >>          matrix ordering: natural
>>>>>>>>>> > >>          factor fill ratio given 0., needed 0.
>>>>>>>>>> > >>            Factored matrix follows:
>>>>>>>>>> > >>              Mat Object: 1 MPI processes
>>>>>>>>>> > >>                type: seqaij
>>>>>>>>>> > >>                rows=179050, cols=179050
>>>>>>>>>> > >>                package used to perform factorization: petsc
>>>>>>>>>> > >>                total: nonzeros=42562500, allocated
>>>>>>>>>> nonzeros=42562500
>>>>>>>>>> > >>                total number of mallocs used during
>>>>>>>>>> MatSetValues calls =0
>>>>>>>>>> > >>                  using I-node routines: found 35810 nodes,
>>>>>>>>>> limit used is 5
>>>>>>>>>> > >>        linear system matrix = precond matrix:
>>>>>>>>>> > >>        Mat Object: 1 MPI processes
>>>>>>>>>> > >>          type: seqaij
>>>>>>>>>> > >>          rows=179050, cols=179050
>>>>>>>>>> > >>          package used to perform factorization: petsc
>>>>>>>>>> > >>          total: nonzeros=42562500, allocated
>>>>>>>>>> nonzeros=42562500
>>>>>>>>>> > >>          total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>            using I-node routines: found 35810 nodes, limit
>>>>>>>>>> used is 5
>>>>>>>>>> > >>      linear system matrix followed by preconditioner matrix:
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mffd
>>>>>>>>>> > >>        rows=347149550, cols=347149550
>>>>>>>>>> > >>          Matrix-free approximation:
>>>>>>>>>> > >>            err=1.49012e-08 (relative error in function
>>>>>>>>>> evaluation)
>>>>>>>>>> > >>            Using wp compute h routine
>>>>>>>>>> > >>                Does not compute normU
>>>>>>>>>> > >>      Mat Object: 1920 MPI processes
>>>>>>>>>> > >>        type: mpiaij
>>>>>>>>>> > >>        rows=347149550, cols=347149550, bs=5
>>>>>>>>>> > >>        total: nonzeros=86758607500, allocated
>>>>>>>>>> nonzeros=86758607500
>>>>>>>>>> > >>        total number of mallocs used during MatSetValues
>>>>>>>>>> calls =0
>>>>>>>>>> > >>          using I-node (on process 0) routines: found 35810
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>  Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>>>> > >>  linear system matrix followed by preconditioner matrix:
>>>>>>>>>> > >>  Mat Object: 1920 MPI processes
>>>>>>>>>> > >>    type: mffd
>>>>>>>>>> > >>    rows=347149550, cols=347149550
>>>>>>>>>> > >>      Matrix-free approximation:
>>>>>>>>>> > >>        err=1.49012e-08 (relative error in function
>>>>>>>>>> evaluation)
>>>>>>>>>> > >>        Using wp compute h routine
>>>>>>>>>> > >>            Does not compute normU
>>>>>>>>>> > >>  Mat Object: 1920 MPI processes
>>>>>>>>>> > >>    type: mpiaij
>>>>>>>>>> > >>    rows=347149550, cols=347149550, bs=5
>>>>>>>>>> > >>    total: nonzeros=86758607500, allocated
>>>>>>>>>> nonzeros=86758607500
>>>>>>>>>> > >>    total number of mallocs used during MatSetValues calls =0
>>>>>>>>>> > >>      using I-node (on process 0) routines: found 35810
>>>>>>>>>> nodes, limit used is 5
>>>>>>>>>> > >>        Line search: Using full step: fnorm
>>>>>>>>>> 2.025875581923e+03 gnorm 2.801672254495e+00
>>>>>>>>>> > >>    1 SNES Function norm 2.801672254495e+00
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>
>>>>
>>>
>>
>
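For reference, the level sizes in the quoted -ksp_view output can be checked by hand. The sketch below is only illustrative: it takes the (rows, nonzeros) pairs of the parallel mpiaij matrices printed for levels 11 through 19, and computes nnz and row ratios relative to the finest level. The coarser levels appear earlier in the output and are left out here, so the ratios are slight underestimates. The names `levels`, `operator_complexity` and `grid_complexity` are made up for this sketch and are not PETSc identifiers, and this is not necessarily the formula GAMG uses for the number it reports. The last check just notes that the fine-level nonzero count does not fit in a signed 32-bit integer; whether that is related to the negative value in the subject line is an assumption, not something this excerpt confirms.

```python
# (rows, nonzeros) of the 1920-process mpiaij matrix on each level,
# copied from the quoted -ksp_view output (levels 11..19 only).
levels = {
    11: (8960, 259800),
    12: (11825, 403125),
    13: (17210, 696850),
    14: (29055, 1475675),
    15: (62935, 3939025),
    16: (205010, 14780300),
    17: (921310, 63203300),
    18: (4473930, 232427300),
    19: (347149550, 86758607500),  # finest level
}

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

fine_rows, fine_nnz = levels[max(levels)]
total_rows = sum(rows for rows, _ in levels.values())
total_nnz = sum(nnz for _, nnz in levels.values())

# Ratios relative to the finest level (hypothetical names, see note above).
operator_complexity = total_nnz / fine_nnz
grid_complexity = total_rows / fine_rows

# Does the fine-level nonzero count fit in 32 bits?
fits_int32 = fine_nnz <= INT32_MAX

print(f"operator complexity (levels 11-19): {operator_complexity:.4f}")
print(f"grid complexity     (levels 11-19): {grid_complexity:.4f}")
print(f"fine-level nnz fits in int32: {fits_int32}")
```

Both ratios come out very close to 1 because level 19 dominates everything below it, which matches the aggressive coarsening visible between levels 19 and 18.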