[petsc-dev] Bad scaling of GAMG in FieldSplit
Pierre Jolivet
pierre.jolivet at enseeiht.fr
Thu Jul 26 15:08:28 CDT 2018
Slightly better results with a PCTELESCOPE, but still not scalable, cf. below. Maybe I’ll increase the telescope_reduction_factor.
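For the record, the telescope setup I’m experimenting with looks roughly like this (option prefixes match the pressure-block solver in the attached logs; the values are illustrative, not necessarily what I ran):

```shell
# Sketch of the pressure-block options (prefixes as in the logs; values illustrative).
# PCTELESCOPE repartitions the subsolve onto comm_size/reduction_factor ranks,
# so the GAMG levels see fewer, larger messages.
-st_fieldsplit_pressure_sub_1_pc_type telescope
-st_fieldsplit_pressure_sub_1_pc_telescope_reduction_factor 4
-st_fieldsplit_pressure_sub_1_telescope_pc_type gamg
```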
I don’t have much of a problem with the current operator complexity on 512 or 2048 processes, but I do mind MatMultAdd and MatMultTranspose being inefficient when the problem is distributed on the same communicator as the original Mat.
I tried playing around with the -pc_gamg_threshold option, but I don’t know if it’s due to the fact that the shift is complex (which, by the by, rules out BoomerAMG): for values which have the same effect as -pc_gamg_threshold 0 (i.e., the same coarsening as without the option set), I have a perfectly fine solver:
Linear st_fieldsplit_pressure_sub_1_telescope_ solve converged due to CONVERGED_RTOL iterations 8
and so on… whereas for larger values (i.e., different coarsening), the solver goes wild:
Linear st_fieldsplit_pressure_sub_1_telescope_ solve did not converge due to DIVERGED_ITS iterations 10000
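For reference, the threshold sweep itself looks like this (option names from the GAMG manual pages; the values are just examples):

```shell
# Hypothetical sweep of the aggregation threshold on the telescoped GAMG
# (prefix as in the logs above; values illustrative).
# 0 keeps every graph entry; larger values drop weak couplings before aggregation.
-st_fieldsplit_pressure_sub_1_telescope_pc_gamg_threshold 0.02
# Optionally rescale the threshold on each successively coarser level:
-st_fieldsplit_pressure_sub_1_telescope_pc_gamg_threshold_scale 0.5
```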
Thanks for your help,
Pierre
Timings:
MatMultAdd 191310 0.0 1.9475e+03 0.0 6.78e+09 0.0 1.5e+09 2.0e+02 0.0e+00 6 0 4 0 0 6 0 4 0 0 3320
MatMultTranspose 191310 0.0 1.3959e+03 0.0 6.78e+09 0.0 1.5e+09 2.0e+02 0.0e+00 0 0 4 0 0 0 0 4 0 0 4632
[..]
KSPSolve_FS_3 6559 1.0 2.3480e+03 1.0 3.79e+11 61.0 2.3e+10 1.1e+03 1.7e+05 16 11 60 18 21 16 11 60 18 22 153414
(Just as a reminder, here are the original timings:
MatMultAdd 222360 1.0 2.5904e+03 48.0 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00 14 0 4 0 0 14 0 4 0 0 2872
MatMultTranspose 222360 1.0 1.8736e+03 421.8 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00 0 0 4 0 0 0 0 4 0 0 3970
[..]
KSPSolve_FS_3 7412 1.0 2.8939e+03 1.0 2.66e+11 2.1 3.5e+10 6.1e+02 2.7e+05 17 11 67 14 28 17 11 67 14 28 148175
)
> On 26 Jul 2018, at 8:52 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Matthew Knepley <knepley at gmail.com> writes:
>
>> On Thu, Jul 26, 2018 at 2:43 PM Jed Brown <jed at jedbrown.org> wrote:
>>
>>> Matthew Knepley <knepley at gmail.com> writes:
>>>
>>>> On Thu, Jul 26, 2018 at 12:56 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 26, 2018 at 10:35 AM, Junchao Zhang <jczhang at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Jul 26, 2018 at 11:15 AM, Fande Kong <fdkong.jd at gmail.com>
>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 26, 2018 at 9:51 AM, Junchao Zhang <jczhang at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi, Pierre,
>>>>>>>> From your log_view files, I see you did strong scaling. You used 4X
>>>>>>>> more cores, but the execution time only dropped from 3.9143e+04
>>>>>>>> to 1.6910e+04.
>>>>>>>> From my previous analysis of a GAMG weak scaling test, it looks like
>>>>>>>> communication is one of the reasons for the poor scaling. In
>>> your
>>>>>>>> case, VecScatterEnd time doubled from 1.5575e+03 to 3.2413e+03.
>>> Its
>>>>>>>> share of the total time jumped from 1% to 17%. This time contributes to the
>>> big
>>>>>>>> time ratio in MatMultAdd and MatMultTranspose, misleading you into
>>> thinking
>>>>>>>> there was load imbalance computation-wise.
>>>>>>>> The reason is that I found that in the interpolation and restriction
>>>>>>>> phases of GAMG, the communication pattern is very bad: a few processes
>>>>>>>> communicate with hundreds of neighbors with message sizes of a few
>>> bytes.
>>>>>>>>
>>>>>>>
>>>>>>> We may need to truncate the interpolation/restriction operators, and also
>>>>>>> do some aggressive coarsening. Unfortunately, GAMG currently does not
>>> support either.
>>>>>>>
>>>>>>
>>>>>> Are these gamg options the truncation you thought?
>>>>>>
>>>>>
>>>>>> -pc_gamg_threshold[] <thresh,default=0> - Before aggregating the graph
>>>>>> GAMG will remove small values from the graph on each level
>>>>>> -pc_gamg_threshold_scale <scale,default=1> - Scaling of threshold on
>>> each
>>>>>> coarser grid if not specified
>>>>>>
>>>>>
>>>>> Nope. Totally different things.
>>>>>
>>>>
>>>> Well, you could use _threshold to do more aggressive coarsening, but not
>>>> for thinning out
>>>> the interpolation.
>>>
>>> Increasing the threshold results in slower coarsening.
>>>
>>
>> Hmm, I think we have to change the webpage then:
>>
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCGAMGSetThreshold.html
>>
>> I read it the opposite way.
>
> More coarse points is "better" (stronger), but higher complexity.