[petsc-users] Block preconditioning for 3d problem

Dave Lee davelee2804 at gmail.com
Mon Oct 14 20:18:12 CDT 2019


Hi Barry,

I've replied inline:

On Mon, Oct 14, 2019 at 4:07 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Thanks. It seems that the iterative method is solving the entire linear
> system very accurately. For the last linear solve the initial true residual
> norm is around 10^6 (i.e. the norm of b) and the final one around 10^-10.
>
>   The initial  true residual norm on each block (part of a slab on one
> process) is around 10^-1 so I don't see a huge difference in scaling
> between the blocks. The final true residual for blocks is around 10^-17 so
> each block is well solved.
>

Thank you for taking the time to look at this Barry. Glad to know there's
nothing too crazy going on here.


>
>    What do you mean by "quasi-Newton"? Are you doing any kind of line
> search or just using the full step from the linear solve?
>

I'm using a full step from the linear solve. By "quasi-Newton" I just mean
that the Jacobian is only approximate. I've left out and simplified some
terms in order to make the Schur complement more tractable.


>
>    When you solve with the single KSP over all slabs the convergence is
> determined by ||r_all||/||b_all|| < tol while slab by slab it is
> ||r_slab||/||b_slab|| < tol for each slab. If the slabs had hugely
> different scalings this could result in different "answers" but since the
> slabs seem to have very similar scalings the "answers" should be very
> similar.
>
>    I would do the following. Solve the first linear system both ways and
> then compute the difference in the solution to the linear systems computed
> both ways. Relative to ||b|| the norm of the difference of the two linear
> systems solutions should be 10^-15 indicating both approaches produced
> almost identical answers. If this is the case then the problem is
> elsewhere, not in the solution of the linear system. On the other hand if
> the difference is large then likely the problem is in the formulation of
> the linear system; perhaps the combined linear system is not actually
> equivalent to the individual linear systems.
>

Thanks for the tip Barry. I tried this, and the normalised difference
between the full solve and the slab-wise solve is approximately 1.0e-4 to
1.0e-3 (depending on the slab), so I guess the problem is in how I'm
assembling the linear system for the full solve.
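
For anyone following along, here is a minimal sketch of that check (not my
exact code; x_full and x_slabs are placeholder names for the full-solve
solution and the slab-wise solutions gathered into a single vector with the
same layout, and b is the right hand side):

Vec       diff;
PetscReal norm_diff, norm_b;

VecDuplicate(x_full, &diff);
VecWAXPY(diff, -1.0, x_slabs, x_full);   /* diff = x_full - x_slabs */
VecNorm(diff, NORM_2, &norm_diff);
VecNorm(b, NORM_2, &norm_b);
PetscPrintf(PETSC_COMM_WORLD, "||x_full - x_slabs|| / ||b|| = %g\n",
            (double)(norm_diff / norm_b));
VecDestroy(&diff);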

I've had a look into this and haven't noticed anything obvious (but I will
keep digging). One difference between the full and slab-wise solves is that
for the full solve each processor has num_slabs * num_slab_local_dofs
DOFs, whereas for the slab-wise solves each processor has just
num_slab_local_dofs, so I'm wondering if the striding of the
slab_local_dofs is causing issues for the full solve. I'm going to try
replacing the PCBJACOBI preconditioner with a PCFIELDSPLIT preconditioner,
and use PCFieldSplitSetIS() to specify dofs within a given slab for each
field, and see if that helps.
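
Something along these lines is what I have in mind (only a sketch, untested;
A is the assembled 3D matrix, and num_slabs and num_slab_local_dofs are the
same placeholders as above; the stride assumes the slab index really is the
major index within each process's ownership range):

PetscInt rstart;
IS       is;
char     splitname[16];

KSPGetPC(ksp, &pc);
PCSetType(pc, PCFIELDSPLIT);
PCFieldSplitSetType(pc, PC_COMPOSITE_ADDITIVE); /* additive, i.e. block-Jacobi-like, between splits */

MatGetOwnershipRange(A, &rstart, NULL);
for (int ii = 0; ii < num_slabs; ii++) {
    /* local dofs of slab ii form one contiguous block, since the slab index is the major index */
    ISCreateStride(PETSC_COMM_WORLD, num_slab_local_dofs,
                   rstart + ii*num_slab_local_dofs, 1, &is);
    snprintf(splitname, sizeof(splitname), "slab%d", ii);
    PCFieldSplitSetIS(pc, splitname, is);
    ISDestroy(&is);
}
KSPSetUp(ksp);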


>
>   Barry
>
> By the way, using a block Jacobi with Jacobi as the preconditioner on each
> block and a really tight tolerance for each block is not a practical
> method for larger problems, though I think it is ok for this test.
>

Thanks Barry. I just wanted to do something simple for this test, but I
take your point.

Cheers, Dave.

>
>
>
>
>
> > On Oct 13, 2019, at 6:20 AM, Dave Lee <davelee2804 at gmail.com> wrote:
> >
> > Hi Barry,
> >
> > I've answered your questions as follows:
> >
> >  So each slab lives on all the processes?
> >
> > Yes, that's correct.
> >
> > How many processes and how many slabs are you running with?
> >
> > For my test problem there are 6 processors and 8 slabs
> >
> > There will be num_slabs blocks per processor times the number of
> processors. Depending on the layout of the unknowns each block will consist
> of an entire local slab or each block will consist of pieces from each slab.
> >
> > The internal slab DOF is the minor index. Each processor is responsible
> for 144 DOFs per slab, so each processor is responsible for 8x144 DOFs.
> When I run with -ksp_view there are 8 blocks on each processor (48 total),
> each with 144 DOFs, so I'm assuming everything is ok there.
> >
> >  Please run the linear solver with
> >
> >     -ksp_monitor_true_residual -sub_ksp_monitor_true_residual
> -ksp_converged_reason -sub_ksp_converged_reason
> >
> >
> >   Do you get huge iteration counts somewhere?
> >
> > Not that I can see, file is attached (log.gmres)
> >
> > Next run the same thing but add -ksp_type fgmres. Does that change
> anything?
> >
> > I don't think so. The convergence is still the same, file is attached
> (log.fgmres).
> >
> > One thing I've noticed - I've been dumping the index of the slab with
> the maximum normalised error: norm_L2(delta_x) / norm_L2(x). If I solve for
> each slab separately, then the maximum normalised errors are typically in
> slab 3 (of 0-7). This makes sense as this is probably the most dynamically
> active layer of the atmosphere. However if I solve for all the slabs in a
> single solve (as is the case for these tests), then the greatest error is
> always in slab 7, the top of the atmosphere. Obviously the pressure is
> lowest in this level, so I'm wondering if the GMRES iterations are just
> weighting the lower layers of the atmosphere too heavily.
> >
> > Cheers, Dave.
> >
> > On Sun, Oct 13, 2019 at 3:42 AM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >
> > > On Oct 10, 2019, at 2:58 AM, Dave Lee via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> > >
> > > Hi PETSc,
> > >
> > > I have a nonlinear 3D problem for a set of uncoupled 2D slabs (which
> I ultimately want to couple once this problem is solved).
> > >
> > > When I solve the inner linear problem for each of these 2D slabs
> individually (using KSPGMRES) the convergence of the outer nonlinear
> problem is good. However when I solve the inner linear problem as a single
> 3D problem (with no coupling between the 2D slabs, so the matrix is
> effectively a set of uncoupled blocks, one for each 2D slab) the outer
> nonlinear convergence degrades dramatically.
> > >
> > > Note that I am not using SNES, just my own quasi-Newton approach for
> the outer nonlinear problem.
> > >
> > > I suspect that the way to recover the convergence for the 3D coupled
> problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner,
> but I'm not entirely sure. I've tried following this example:
> > >
> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
> > > but without any improvement in the convergence.
> > >
> > > On each processor the slab index is the major index and the internal
> slab DOF index is the minor index. The parallel decomposition is within the
> 2D slab dimensions only, not between slabs.
> >
> >   So each slab lives on all the processes?
> >
> > >
> > > I'm configuring the 3D inner linear solver as
> >
> >   How many processes and how many slabs are you running with?
> >
> > >
> > > KSPGetPC(ksp, &pc);
> > > PCSetType(pc, PCBJACOBI);
> > > PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);
> >
> >     There will be num_slabs blocks per processor times the number of
> processors. Depending on the layout of the unknowns each block will consist
> of an entire local slab or each block will consist of pieces from each slab.
> >
> > > KSPSetUp(ksp);
> > > PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
> > > for(int ii = 0; ii < nlocal; ii++) {
> > >     KSPGetPC(subksp[ii], &subpc);
> > >     PCSetType(subpc, PCJACOBI);
> > >     KSPSetType(subksp[ii], KSPGMRES);
> > >     KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT,
> PETSC_DEFAULT);
> > > }
> >
> >   Please run the linear solver with
> >
> >     -ksp_monitor_true_residual -sub_ksp_monitor_true_residual
> -ksp_converged_reason -sub_ksp_converged_reason
> >
> >
> >   Do you get huge iteration counts somewhere?
> >
> >   Next run the same thing but add -ksp_type fgmres. Does that change
> anything?
> >
> >   Send all the results in an attached file.
> >
> > Barry
> >
> >
> >
> > >
> > > However this does not seem to change the outer nonlinear convergence.
> When I run with -ksp_view the local block sizes look correct: the number of
> local 2D slab DOFs for each block on each processor.
> > >
> > > Any ideas on what sort of KSP/PC configuration I should be using to
> recover the convergence of the uncoupled 2D slab linear solve for this 3D
> linear solve? Should I be rethinking my node indexing to help with this?
> > >
> > > Cheers, Dave.
> >
> > <log.fgmres><log.gmres>
>
>