[petsc-dev] Bad scaling of GAMG in FieldSplit
Pierre Jolivet
pierre.jolivet at enseeiht.fr
Thu Jul 26 15:08:28 CDT 2018
Slightly better results with a PCTELESCOPE, but still not scalable, cf. below. Maybe I’ll increase the telescope_reduction_factor.
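For the record, the telescope setup I’m experimenting with looks roughly like this (option prefixes match the pressure-block solver in the attached logs; the values are illustrative, not necessarily what I ran):

```shell
# Sketch of the pressure-block options (prefixes as in the logs; values illustrative).
# PCTELESCOPE repartitions the subsolve onto comm_size/reduction_factor ranks,
# so the GAMG levels see fewer, larger messages.
-st_fieldsplit_pressure_sub_1_pc_type telescope
-st_fieldsplit_pressure_sub_1_pc_telescope_reduction_factor 4
-st_fieldsplit_pressure_sub_1_telescope_pc_type gamg
```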
I don’t have much of a problem with the current operator complexity on 512 or 2048 processes, but I do mind MatMultAdd and MatMultTranspose being inefficient when the problem is distributed on the same communicator as the original Mat.
I tried playing around with the -pc_gamg_threshold option, but I don’t know if it’s due to the fact that the shift is complex (which, by the by, rules out BoomerAMG): for values which have the same effect as -pc_gamg_threshold 0 (i.e., the same coarsening as without the option set), I have a perfectly fine solver:
Linear st_fieldsplit_pressure_sub_1_telescope_ solve converged due to CONVERGED_RTOL iterations 8
and so on… whereas for larger values (i.e., different coarsening), the solver goes wild:
Linear st_fieldsplit_pressure_sub_1_telescope_ solve did not converge due to DIVERGED_ITS iterations 10000
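For reference, the threshold sweep itself looks like this (option names from the GAMG manual pages; the values are just examples):

```shell
# Hypothetical sweep of the aggregation threshold on the telescoped GAMG
# (prefix as in the logs above; values illustrative).
# 0 keeps every graph entry; larger values drop weak couplings before aggregation.
-st_fieldsplit_pressure_sub_1_telescope_pc_gamg_threshold 0.02
# Optionally rescale the threshold on each successively coarser level:
-st_fieldsplit_pressure_sub_1_telescope_pc_gamg_threshold_scale 0.5
```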
Thanks for your help,
Pierre
Timings:
MatMultAdd 191310 0.0 1.9475e+03 0.0 6.78e+09 0.0 1.5e+09 2.0e+02 0.0e+00 6 0 4 0 0 6 0 4 0 0 3320
MatMultTranspose 191310 0.0 1.3959e+03 0.0 6.78e+09 0.0 1.5e+09 2.0e+02 0.0e+00 0 0 4 0 0 0 0 4 0 0 4632
[..]
KSPSolve_FS_3 6559 1.0 2.3480e+03 1.0 3.79e+11 61.0 2.3e+10 1.1e+03 1.7e+05 16 11 60 18 21 16 11 60 18 22 153414
(Just as a reminder, here are the original timings:
MatMultAdd 222360 1.0 2.5904e+03 48.0 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00 14 0 4 0 0 14 0 4 0 0 2872
MatMultTranspose 222360 1.0 1.8736e+03 421.8 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00 0 0 4 0 0 0 0 4 0 0 3970
[..]
KSPSolve_FS_3 7412 1.0 2.8939e+03 1.0 2.66e+11 2.1 3.5e+10 6.1e+02 2.7e+05 17 11 67 14 28 17 11 67 14 28 148175
)
> On 26 Jul 2018, at 8:52 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Matthew Knepley <knepley at gmail.com> writes:
>
>> On Thu, Jul 26, 2018 at 2:43 PM Jed Brown <jed at jedbrown.org> wrote:
>>
>>> Matthew Knepley <knepley at gmail.com> writes:
>>>
>>>> On Thu, Jul 26, 2018 at 12:56 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 26, 2018 at 10:35 AM, Junchao Zhang <jczhang at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Jul 26, 2018 at 11:15 AM, Fande Kong <fdkong.jd at gmail.com>
>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 26, 2018 at 9:51 AM, Junchao Zhang <jczhang at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi, Pierre,
>>>>>>>> From your log_view files, I see you did strong scaling. You used 4X
>>>>>>>> more cores, but the execution time only dropped from 3.9143e+04
>>>>>>>> to 1.6910e+04.
>>>>>>>> From my previous analysis of a GAMG weak scaling test, it looks like
>>>>>>>> communication is one of the reasons for the poor scaling. In
>>> your
>>>>>>>> case, VecScatterEnd time doubled from 1.5575e+03 to 3.2413e+03.
>>> Its
>>>>>>>> share of the total time jumped from 1% to 17%. This time contributes to the
>>> big
>>>>>>>> time ratio in MatMultAdd and MatMultTranspose, misleading you into
>>> thinking
>>>>>>>> there was load imbalance computation-wise.
>>>>>>>> The reason is that I found that in the interpolation and restriction
>>>>>>>> phases of GAMG, the communication pattern is very bad: a few processes
>>>>>>>> communicate with hundreds of neighbors with message sizes of a few
>>> bytes.
>>>>>>>>
>>>>>>>
>>>>>>> We may need to truncate the interpolation/restriction operators, and also
>>>>>>> do some aggressive coarsening. Unfortunately, GAMG currently does not
>>> support either.
>>>>>>>
>>>>>>
>>>>>> Are these gamg options the truncation you thought?
>>>>>>
>>>>>
>>>>>> -pc_gamg_threshold[] <thresh,default=0> - Before aggregating the graph
>>>>>> GAMG will remove small values from the graph on each level
>>>>>> -pc_gamg_threshold_scale <scale,default=1> - Scaling of threshold on
>>> each
>>>>>> coarser grid if not specified
>>>>>>
>>>>>
>>>>> Nope. Totally different things.
>>>>>
>>>>
>>>> Well, you could use _threshold to do more aggressive coarsening, but not
>>>> for thinning out
>>>> the interpolation.
>>>
>>> Increasing the threshold results in slower coarsening.
>>>
>>
>> Hmm, I think we have to change the webpage then:
>>
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCGAMGSetThreshold.html
>>
>> I read it the opposite way.
>
> More coarse points is "better" (stronger), but higher complexity.