[petsc-users] Poor multigrid convergence in parallel

Lawrence Mitchell lawrence.mitchell at imperial.ac.uk
Tue Jul 22 06:43:56 CDT 2014


On 21 Jul 2014, at 18:29, Jed Brown <jed at jedbrown.org> wrote:

> Lawrence Mitchell <lawrence.mitchell at imperial.ac.uk> writes:
> 
>> Below I show output from a run on 1 process and then two (along with ksp_view) for the following options:
>> 
>> -pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -mg_levels_pc_type sor -ksp_monitor
>> 
>> On 1 process:
>>  0 KSP Residual norm 5.865090856053e+02 
>>  1 KSP Residual norm 1.293159126247e+01 
>>  2 KSP Residual norm 5.181199296299e-01 
>>  3 KSP Residual norm 1.268870802643e-02 
>>  4 KSP Residual norm 5.116058930806e-04 
>>  5 KSP Residual norm 3.735036960550e-05 
>>  6 KSP Residual norm 1.755288530515e-06 
>> KSP Object: 1 MPI processes
>>  type: gmres
>>    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>    GMRES: happy breakdown tolerance 1e-30
>>  maximum iterations=6, initial guess is zero
>>  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>>  type: mg
>>    MG: type is MULTIPLICATIVE, levels=2 cycles=v
>>      Cycles per PCApply=1
>>      Not using Galerkin computed coarse grid matrices
> 
> How are you sure the rediscretized matrices are correct in parallel?

I computed the leading few (10 or so) largest and smallest eigenvalues of the operators on each level, which agree in serial and parallel, so I'm reasonably happy that I'm solving the same problem.

> I would stick with the redundant coarse solve and use
> 
>  -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -ksp_monitor_true_residual

Chebyshev + Jacobi does not appear to be an effective smoother at all, either in serial or in parallel.  For example, for a two-level cycle:

-pc_type mg  -ksp_rtol 1e-10 -ksp_max_it 2 -pc_mg_levels 2 -ksp_monitor_true_residual  -mg_levels_ksp_type chebyshev  -mg_levels_pc_type jacobi -mg_levels_ksp_max_it 3 -mg_levels_ksp_monitor_true_residual

    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 1.362115349221e+02 true resid norm 1.602648950381e+02 ||r(i)||/||b|| 5.718551585231e-01
    1 KSP none resid norm 3.635392745636e+01 true resid norm 8.483271949491e+01 ||r(i)||/||b|| 3.026990298979e-01
    2 KSP none resid norm 2.480718743635e+01 true resid norm 5.297234693113e+01 ||r(i)||/||b|| 1.890152540545e-01
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 8.050829954680e+00 true resid norm 2.187812818821e+01 ||r(i)||/||b|| 7.806525852269e-02
    1 KSP none resid norm 9.600041408511e+00 true resid norm 3.264655957783e+01 ||r(i)||/||b|| 1.164890383398e-01
    2 KSP none resid norm 2.246360204969e+01 true resid norm 7.338512212979e+01 ||r(i)||/||b|| 2.618518586918e-01
  0 KSP preconditioned resid norm 5.699274467568e+02 true resid norm 2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 3.268903240705e-01 true resid norm 7.956741597771e-01 ||r(i)||/||b|| 1.322008721235e+00
    1 KSP none resid norm 6.842996420984e-01 true resid norm 2.304432657016e+00 ||r(i)||/||b|| 3.828803578247e+00
    2 KSP none resid norm 1.941611552825e+00 true resid norm 6.611568865604e+00 ||r(i)||/||b|| 1.098508930317e+01
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 1.063939051808e+01 true resid norm 3.664397107500e+01 ||r(i)||/||b|| 6.088377855000e+01
    1 KSP none resid norm 3.311047978681e+01 true resid norm 1.157817230412e+02 ||r(i)||/||b|| 1.923707660218e+02
    2 KSP none resid norm 9.601250167498e+01 true resid norm 3.374117934071e+02 ||r(i)||/||b|| 5.606080429414e+02
  1 KSP preconditioned resid norm 5.693881869205e+02 true resid norm 2.803424438322e+02 ||r(i)||/||b|| 1.000314339708e+00
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 4.296484723989e+00 true resid norm 1.520508908495e+01 ||r(i)||/||b|| 2.232562919535e+00
    1 KSP none resid norm 1.365259124519e+01 true resid norm 4.847930361817e+01 ||r(i)||/||b|| 7.118215159284e+00
    2 KSP none resid norm 4.000954569964e+01 true resid norm 1.424890347852e+02 ||r(i)||/||b|| 2.092166206488e+01
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 2.315970399809e+02 true resid norm 8.307550866976e+02 ||r(i)||/||b|| 1.219797523982e+02
    1 KSP none resid norm 7.371848722680e+02 true resid norm 2.669017652749e+03 ||r(i)||/||b|| 3.918918073954e+02
    2 KSP none resid norm 2.174699749867e+03 true resid norm 7.893909006175e+03 ||r(i)||/||b|| 1.159062498016e+03
  2 KSP preconditioned resid norm 5.682205494231e+02 true resid norm 2.798247056626e+02 ||r(i)||/||b|| 9.984669529615e-01
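
For reference, growth in the smoother residuals like that shown above is what a Chebyshev iteration produces when the eigenvalue interval it targets misses part of the spectrum. Whether that is what happens in this run is not established in this message. The following is only a minimal pure-Python sketch on a stand-in problem (a 1D Laplacian with Jacobi scaling, not the operator from this thread), with hypothetical eigenvalue bounds picked by hand:

```python
import math


def apply_A(x):
    """Apply the 1D Laplacian stencil (-1, 2, -1) with zero Dirichlet ends."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def chebyshev_jacobi(b, lmin, lmax, its):
    """Chebyshev iteration on D^{-1} A, targeting the interval [lmin, lmax].

    Starts from x = 0 and returns the true residual norms ||b - A x||,
    one entry for the initial guess and one per iteration.
    """
    diag = 2.0  # diagonal of A, so Jacobi is a scaling by 1/2
    theta = 0.5 * (lmax + lmin)
    delta = 0.5 * (lmax - lmin)
    sigma = theta / delta
    rho = 1.0 / sigma
    x = [0.0] * len(b)
    r = list(b)
    d = [ri / diag / theta for ri in r]
    history = [norm(r)]
    for _ in range(its):
        x = [xi + di for xi, di in zip(x, d)]
        r = [bi - ai for bi, ai in zip(b, apply_A(x))]
        history.append(norm(r))
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * (ri / diag)
             for di, ri in zip(d, r)]
        rho = rho_new
    return history


n = 31
# Highest-frequency eigenvector of A; its eigenvalue under D^{-1} A is close to 2.
b = [math.sin(n * math.pi * (i + 1) / (n + 1)) for i in range(n)]

good = chebyshev_jacobi(b, lmin=0.2, lmax=2.0, its=3)  # interval covers the mode
bad = chebyshev_jacobi(b, lmin=0.1, lmax=1.0, its=3)   # largest eigenvalue underestimated

print("bounds cover the spectrum:   ", ["%.3e" % v for v in good])
print("largest eigenvalue too small:", ["%.3e" % v for v in bad])
```

With the interval covering the mode, the residual shrinks over the three iterations; with the upper bound set too low, the same mode is amplified at every step.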


> Use of Jacobi here is to make the smoother the same in parallel as
> serial.

If I run the above in parallel I get the same behaviour (as expected, I guess).

>  (Usually SOR is a bit stronger, though I think the Cheby/SOR
> combination is somewhat peculiar and usually overkill.)

So I have noticed that I only see this problem of poor convergence on some meshes/decompositions.  It also goes away if I apply more than one SOR sweep at each level.

For example:

-pc_type mg -ksp_rtol 1e-10 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor_true_residual -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -mg_levels_pc_sor_omega 1 -mg_levels_pc_sor_its 2

produces on 1 process:
  0 KSP preconditioned resid norm 5.883693224294e+02 true resid norm 2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.309534073571e+01 true resid norm 6.584338590739e+00 ||r(i)||/||b|| 2.349415314990e-02
  2 KSP preconditioned resid norm 1.910365687382e-01 true resid norm 1.577734674543e-01 ||r(i)||/||b|| 5.629652783310e-04
  3 KSP preconditioned resid norm 3.277350963687e-03 true resid norm 4.094543394403e-03 ||r(i)||/||b|| 1.461009762200e-05
  4 KSP preconditioned resid norm 7.348080207899e-05 true resid norm 1.069159047866e-04 ||r(i)||/||b|| 3.814959705670e-07
  5 KSP preconditioned resid norm 2.830825689894e-06 true resid norm 4.561049005693e-06 ||r(i)||/||b|| 1.627467700623e-08
  6 KSP preconditioned resid norm 4.363171978244e-08 true resid norm 7.242601519032e-08 ||r(i)||/||b|| 2.584295855185e-10

and on 2:
  0 KSP preconditioned resid norm 5.836547633061e+02 true resid norm 2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.096256903154e+01 true resid norm 6.241073205209e+00 ||r(i)||/||b|| 2.226931797055e-02
  2 KSP preconditioned resid norm 1.417636317296e-01 true resid norm 2.236535290250e-01 ||r(i)||/||b|| 7.980376754651e-04
  3 KSP preconditioned resid norm 7.600523523911e-03 true resid norm 9.340981660469e-03 ||r(i)||/||b|| 3.333037186303e-05
  4 KSP preconditioned resid norm 2.109594208660e-04 true resid norm 3.284856206888e-04 ||r(i)||/||b|| 1.172098210571e-06
  5 KSP preconditioned resid norm 4.640884789807e-06 true resid norm 1.425912597007e-05 ||r(i)||/||b|| 5.087923178735e-08
  6 KSP preconditioned resid norm 2.250186110144e-07 true resid norm 5.621190025947e-07 ||r(i)||/||b|| 2.005745870056e-09
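
To put the two runs above on a common scale, the final relative residual can be turned into an average reduction factor per iteration, (||r_6||/||b||)^(1/6). A minimal sketch using the values quoted above; the helper name is made up for illustration:

```python
# Final relative true residuals ||r||/||b|| after 6 outer iterations,
# copied from the two-SOR-sweep runs quoted above.
final_relative_residual = {
    "1 process": 2.584295855185e-10,
    "2 processes": 2.005745870056e-09,
}
iterations = 6


def average_convergence_factor(relative_residual, its):
    """Geometric-mean residual reduction per iteration."""
    return relative_residual ** (1.0 / its)


factors = {name: average_convergence_factor(rel, iterations)
           for name, rel in final_relative_residual.items()}

for name, factor in factors.items():
    print("%-12s average reduction per iteration: %.4f" % (name, factor))
```

Both factors come out at a few hundredths, so with two sweeps the serial and two-process runs converge at comparable rates.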


If I drop the number of SOR iterations to 1, the residual is reduced by a factor of 10^8 in serial but only 10^3 on two processes.  This only happens on two processes: on three or more I see reductions of around 10^8 as well.

> Compare convergence with and without -pc_mg_galerkin.

This was a little tricky, since I only have the action of the interpolation and restriction matrices.  I coded up a "default" MatMatMatMult using a shell that can compute the appropriate matrix-vector multiply.  Since I then don't have an explicit operator on each level, I restricted to a two-level method (where the fine grid operator is assembled).
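
The idea of such a shell can be sketched as follows: the coarse operator is never assembled, and its action is just restriction applied to the fine operator applied to the interpolated vector. This is a pure-Python illustration on a 1D stand-in problem, not the code used for the runs here and not PETSc's shell-matrix API; the class and function names are hypothetical.

```python
class GalerkinShell:
    """Matrix-free coarse operator: y = R (A (P x))."""

    def __init__(self, restrict, apply_fine, prolong):
        self.restrict = restrict
        self.apply_fine = apply_fine
        self.prolong = prolong

    def mult(self, x_coarse):
        return self.restrict(self.apply_fine(self.prolong(x_coarse)))


def prolong(xc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    xf = [0.0] * (2 * len(xc) + 1)
    for j, v in enumerate(xc):
        xf[2 * j] += 0.5 * v
        xf[2 * j + 1] += v
        xf[2 * j + 2] += 0.5 * v
    return xf


def restrict(xf):
    """Transpose of prolong (no extra scaling)."""
    nc = (len(xf) - 1) // 2
    return [0.5 * xf[2 * j] + xf[2 * j + 1] + 0.5 * xf[2 * j + 2]
            for j in range(nc)]


def apply_fine(x):
    """Fine-grid operator: 1D Laplacian stencil (-1, 2, -1), zero Dirichlet ends."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


coarse = GalerkinShell(restrict, apply_fine, prolong)
# Applying the shell to a unit vector gives one column of R A P.
column = coarse.mult([0.0, 1.0, 0.0])
print(column)
```

Because only the matrix-vector product is available, the coarse problem has to be solved with a Krylov method that needs nothing more than that product.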

I then run with:

-pc_type mg  -ksp_rtol 1e-10 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor_true_residual  -mg_levels_ksp_type chebyshev  -mg_levels_pc_type sor  -mg_levels_pc_sor_omega 1 -mg_levels_pc_sor_its 1 -pc_mg_galerkin -mg_coarse_ksp_type cg -mg_coarse_pc_type none -mg_coarse_ksp_max_it 20

On one process I then have:

  0 KSP preconditioned resid norm 5.658166234044e+02 true resid norm 2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 4.380224616829e+00 true resid norm 5.942695957168e+00 ||r(i)||/||b|| 2.120465207202e-02
  2 KSP preconditioned resid norm 1.076966810659e-01 true resid norm 2.620990648687e-01 ||r(i)||/||b|| 9.352185471038e-04
  3 KSP preconditioned resid norm 1.067766492637e-03 true resid norm 6.007393822966e-03 ||r(i)||/||b|| 2.143550617324e-05
  4 KSP preconditioned resid norm 3.721153649133e-05 true resid norm 4.831899242645e-03 ||r(i)||/||b|| 1.724112137417e-05
  5 KSP preconditioned resid norm 9.011620703273e-07 true resid norm 4.821878295079e-03 ||r(i)||/||b|| 1.720536475662e-05
  6 KSP preconditioned resid norm 2.069209381617e-08 true resid norm 4.821642641207e-03 ||r(i)||/||b|| 1.720452389947e-05


On two:

  0 KSP preconditioned resid norm 5.662903120832e+02 true resid norm 2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 7.643743888525e+00 true resid norm 1.378172146897e+01 ||r(i)||/||b|| 4.917576312322e-02
  2 KSP preconditioned resid norm 6.976625171734e+00 true resid norm 1.228350334724e+01 ||r(i)||/||b|| 4.382984029151e-02
  3 KSP preconditioned resid norm 6.830161637428e+00 true resid norm 1.239530026533e+01 ||r(i)||/||b|| 4.422875263161e-02
  4 KSP preconditioned resid norm 1.043365955762e+00 true resid norm 7.550284766088e+00 ||r(i)||/||b|| 2.694082999761e-02
  5 KSP preconditioned resid norm 7.461488225071e-01 true resid norm 5.937639295074e+00 ||r(i)||/||b|| 2.118660895470e-02
  6 KSP preconditioned resid norm 7.370072731475e-01 true resid norm 5.668303344690e+00 ||r(i)||/||b|| 2.022556784482e-02


while on three I go back to decent convergence again:

  0 KSP preconditioned resid norm 5.692627905911e+02 true resid norm 2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 5.768424422846e+00 true resid norm 8.666039768542e+00 ||r(i)||/||b|| 3.092205279534e-02
  2 KSP preconditioned resid norm 1.308480914038e-01 true resid norm 1.498332207355e-01 ||r(i)||/||b|| 5.346329910577e-04
  3 KSP preconditioned resid norm 1.029363320021e-03 true resid norm 3.954753776706e-03 ||r(i)||/||b|| 1.411130208747e-05
  4 KSP preconditioned resid norm 2.495903055408e-05 true resid norm 6.469448501163e-04 ||r(i)||/||b|| 2.308420379466e-06
  5 KSP preconditioned resid norm 8.019626928214e-07 true resid norm 6.361612256862e-04 ||r(i)||/||b|| 2.269942388036e-06
  6 KSP preconditioned resid norm 1.669414544567e-08 true resid norm 6.360133846808e-04 ||r(i)||/||b|| 2.269414863642e-06

So I'm none the wiser.  I'm at a loss as to why this occurs, but switching either to Richardson+SOR or to Cheby/SOR with more than one SOR sweep appears to fix the problem, so I might just punt for now.

Cheers,

Lawrence