[petsc-users] neumann failure in my version of snes ex12

Sat Nov 23 17:44:33 CST 2013

On Sat, Nov 23, 2013 at 12:20 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Sat, Nov 23, 2013 at 2:04 PM, Geoffrey Irving <irving at naml.us> wrote:
>>
>> On Sat, Nov 23, 2013 at 10:11 AM, Matthew Knepley <knepley at gmail.com>
>> wrote:
>> > On Sat, Nov 23, 2013 at 12:07 PM, Geoffrey Irving <irving at naml.us>
>> > wrote:
>> >>
>> >> On Sat, Nov 23, 2013 at 4:43 AM, Matthew Knepley <knepley at gmail.com>
>> >> wrote:
>> >> > On Fri, Nov 22, 2013 at 6:42 PM, Geoffrey Irving <irving at naml.us>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Nov 22, 2013 at 4:25 PM, Matthew Knepley <knepley at gmail.com>
>> >> >> wrote:
>> >> >> > On Fri, Nov 22, 2013 at 6:09 PM, Geoffrey Irving <irving at naml.us>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Fri, Nov 22, 2013 at 3:41 PM, Matthew Knepley
>> >> >> >> <knepley at gmail.com>
>> >> >> >> wrote:
>> >> >> >> > On Fri, Nov 22, 2013 at 5:36 PM, Geoffrey Irving
>> >> >> >> > <irving at naml.us>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> I have a duplicate of snes ex12 (FEM Poisson) which works with
>> >> >> >> >> Dirichlet boundary conditions, but it's breaking for me with
>> >> >> >> >> Neumann
>> >> >> >> >> conditions.  In particular, with Neumann conditions I get
>> >> >> >> >> results
>> >> >> >> >> which explode even though I believe I am setting a constant
>> >> >> >> >> nullspace.
>> >> >> >> >>
>> >> >> >> >> For example, if I use two first order elements (the unit
>> >> >> >> >> square
>> >> >> >> >> divided into two triangles), the resulting solution has
>> >> >> >> >>
>> >> >> >> >>     L2 error = 1.75514e+08
>> >> >> >> >>     u = [-175513825.75680602, -175513825.66302037,
>> >> >> >> >> -175513825.48390722, -175513824.84436429]
>> >> >> >> >>
>> >> >> >> >> This looks rather a lot like the null space isn't getting
>> >> >> >> >> through.
>> >> >> >> >> I
>> >> >> >> >> am creating the constant nullspace with
>> >> >> >> >>
>> >> >> >> >>       MatNullSpace null;
>> >> >> >> >>       CHECK(MatNullSpaceCreate(comm(),PETSC_TRUE,0,0,&null));
>> >> >> >> >>       CHECK(MatSetNullSpace(m,null));
>> >> >> >> >>       CHECK(MatNullSpaceDestroy(&null));
>> >> >> >> >>
>> >> >> >> >> If I pass "-ksp_view -mat_view", I get the following.  The
>> >> >> >> >> matrix
>> >> >> >> >> entries seem right (they do indeed have the constant
>> >> >> >> >> nullspace),
>> >> >> >> >> and
>> >> >> >> >> ksp_view shows that a nullspace is attached.  Is attaching the
>> >> >> >> >> nullspace to the matrix with MatSetNullSpace enough, or do I
>> >> >> >> >> need
>> >> >> >> >> to
>> >> >> >> >> additionally attach it to the KSP object?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 1) I always run with -ksp_monitor_true_residual now when
>> >> >> >> > debugging.
>> >> >> >> > This
>> >> >> >> > can
>> >> >> >> > give
>> >> >> >> >     you an idea whether you have a singular PC, which I suspect
>> >> >> >> > here.
>> >> >> >> >
>> >> >> >> > 2) Can you try using -pc_type jacobi? I think ILU might go
>> >> >> >> > crazy
>> >> >> >> > on a
>> >> >> >> > deficient matrix.
>> >> >> >>
>> >> >> >> Here are results with -ksp_monitor_true_residual -pc_type none:
>> >> >> >>
>> >> >> >>     http://naml.us/random/laplace-rtol.txt # with -ksp_rtol 1e-5
>> >> >> >>     http://naml.us/random/laplace-atol.txt # with -ksp_atol 1e-5
>> >> >> >
>> >> >> >
>> >> >> > Okay, if you have an inconsistent RHS I do not think that
>> >> >> > true_residual
>> >> >> > will work
>> >> >> > since it uses the unprojected b, but the solve should be fine.
>> >> >>
>> >> >> I still don't understand why the atol version is able to drift so
>> >> >> far
>> >> >> away from zero mean, even after tens of thousands of iterations.  If
>> >> >> KSP sees a null space on the matrix, shouldn't it project that null
>> >> >> space out of the *linear system* residual and also out of solution
>> >> >> on
>> >> >> each iteration?  Even if it is only projecting out of the solution
>> >> >> delta, how can null space errors be accumulating?
>> >> >
>> >> >
>> >> > Both the KSP and Mat show that the null space is set, so everything
>> >> > should
>> >> > work fine,
>> >> > and at this point its no longer DMPlex that is in control, its
>> >> > standard
>> >> > PETSc.
>> >> >
>> >> > We have reached the limit of usefu talking. Something is obviously
>> >> > wrong
>> >> > with the code,
>> >> > but since this routinely works in PETSc examples. In situations like
>> >> > these I think we need
>> >> > to follow the execution in the debugger to see what is wrong..You can
>> >> > look at Vec values
>> >> > in the debugger using
>> >> >
>> >> >   (gdb) p ((Vec_Seq*) b-.data)->array[0]@v->map.n
>> >> >
>> >> > and I look at DMPlex things with
>> >> >
>> >> >   (gdb) p ((DM_Plex*) dm->data)->coneSection
>> >> >
>> >> > etc.
>> >>
>> >> Thanks, I appreciate the help.  It looks like there were at least two
>> >> different problems:
>> >>
>> >> 1. The boundary FE I was creating had the same dimension as the
>> >> interior FE (instead of codimension 1), due to misreading ex12 even
>> >> though I had correctly refactored it.  I added a dimension consistency
>> >> check to my code, but I can do this in DMPlexComputeResidualFEM as
>> >> well to catch future user errors.
>> >>
>> >> 2. Even after fixing the dimensions, my boundary functions in PetscFEM
>> >> are getting x values both inside and completely outside the domain.
>> >> Almost certainly more user error, but hopefully also something I can
>> >> add a check for in petsc once I localize it.
>> >
>> > This could be my bug. The test I have for ex12 is the variable
>> > coefficient problem
>> > with div (x + y) grad u = f. This seems to check between the analytic
>> > and field
>> > versions, meaning that the x coming into f1() matches the x I used to
>> > make the
>> > field, and my exact solution.
>>
>> It does seem to happen with stock snes ex12:
>>
>>     branch: irving/assert-ex12-in-box1
>>
>>     % mpiexec -host localhost -n 1
>> /home/irving/petsc/debug/lib/ex12-obj/ex12 -run_type test
>> -refinement_limit 0.0    -bc_type neumann   -interpolate 1
>> -petscspace_order 1 -bd_petscspace_order 1 -show_initial
>> -dm_plex_print_fem 1 -dm_view ::ascii_info_detail
>>     ...
>>     [0]PETSC ERROR: evaluation at point outside unit box: 0 1.25
>>
>> I'll trace down why this is happening.
>
> My first guess is a triangle with backwards edge. This could cause the
> geometry routines to barf.

I don't think it's edge orientation: it breaks (though at different
points) regardless of whether I orient all the edges clockwise or
counterclockwise.  Also, I would expect bad edge orientation to result
in bad normals but not to produce bad quadrature locations (nor bad
residuals as long as the user routines don't depend on normal).

Specifically, I think the problem is a sign error in
DMPlexComputeProjection2Dto1D_Internal.  The following patch seems to
fix the out of bounds evaluation problem.
DMPlexComputeProjection2Dto1D is computing a matrix which maps from
the given segment to the canonical segment, and
DMPlexComputeLineGeometry_Internal expects a map from the canonical
segment to the given segment.

snes ex12 passes with and without this change, presumably because the
only Neumann test has constants along each box side, and is therefore
invariant to this error.

Unfortunately, my Laplace test is also invariant to this error, so
this bug is unrelated to the earlier problem.

Geoffrey

--------------------------------------------------

--- a/src/dm/impls/plex/plexgeometry.c
+++ b/src/dm/impls/plex/plexgeometry.c
@@ -213,8 +213,8 @@ static PetscErrorCode
DMPlexComputeProjection2Dto1D_Internal(PetscScalar coords[
   const PetscReal r = sqrt(x*x + y*y), c = x/r, s = y/r;

   PetscFunctionBegin;
-  R[0] =  c; R[1] = s;
-  R[2] = -s; R[3] = c;
+  R[0] = c; R[1] = -s;
+  R[2] = s; R[3] =  c;
   coords[0] = 0.0;
   coords[1] = r;