[petsc-users] negative grid complexity in GAMG

Tue Oct 15 23:17:33 CDT 2019

   I think I now see the bug: the code uses PetscInt       lev, nnz0 = -1; which will overflow. It should be using PetscLogDouble for nnz0

  You can try changing that one place in the code and see that it now prints a reasonable value for complexity. 

  I will prepare a MR for maint to fix the bug permanently.

  Barry

static PetscErrorCode PCMGGetGridComplexity(PC pc, PetscReal *gc)
{
  PetscErrorCode ierr;
  PC_MG          *mg      = (PC_MG*)pc->data;
  PC_MG_Levels   **mglevels = mg->levels;
  PetscInt       lev, nnz0 = -1;
  MatInfo        info;
  PetscFunctionBegin;
  if (!mg->nlevels) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_PLIB,"MG has no levels");
  for (lev=0, *gc=0; lev<mg->nlevels; lev++) {
    Mat dB;
    ierr = KSPGetOperators(mglevels[lev]->smoothd,NULL,&dB);CHKERRQ(ierr);
    ierr = MatGetInfo(dB,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); /* global reduction */
    *gc += (PetscReal)info.nz_used;
    if (lev==mg->nlevels-1) nnz0 = info.nz_used;
  }
  if (nnz0) *gc /= (PetscReal)nnz0;
  else *gc = 0;
  PetscFunctionReturn(0);
}

> On Oct 15, 2019, at 11:11 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
> 
>   Mark,
> 
>   It may be caused by some overflow in the calculations somewhere due to your very large sizes and nonzeros but I could not see anything based on a quick inspection of the code. We seem to use double to store the counts which normally would be more than sufficient to hold the results without overflow. Unless somewhere there is a mistaken use of int that causes a problem.
> 
>   Just to confirm: You are running with with default double precision numbers and have used the configure option --with-64-bit-indices ? 
> 
>   I see you are using MATMFFD as the operator and MPIAIJ as the matrix from which to build the preconditioner? This is not suppose to cause any difficulties since the complexity computation code uses the second matrix, that is the MPAIJ matrix to get the complexity information. 
> 
>   There is definitely a bug but I am hard pressed to suggest how to find it since it seems only to be expressed in your giant runs. 
> 
>  Barry
> 
> 
> 
> 
> 
>> On Oct 15, 2019, at 9:16 PM, Mark Lohry via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> 
>> I'm running some larger unsteady problems and trying to eek out some better GAMG performance. As is, at very small time steps, ASM preconditioner with ILU(0) is maybe 20% more efficient than my naive GAMG setup, which gives me hope that some tuning of GAMG can give some advantage. Convergence overall seems quite good, and light years better than ASM/ILU at larger time steps.
>> 
>> So looking through the manual and see a note that "grid complexity should be well under 2.0 and preferably around 1.3 or lower". I check ksp_view and see:
>> Complexity:    grid = -40.5483
>> 
>> Is something funny happening here?
>> 
>> Pasting whole -ksp_view below:
>> 
>> KSP Object: 1920 MPI processes
>>  type: fgmres
>>    restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>    happy breakdown tolerance 1e-30
>>  maximum iterations=30, initial guess is zero
>>  tolerances:  relative=0.0001, absolute=1e-06, divergence=10.
>>  right preconditioning
>>  using UNPRECONDITIONED norm type for convergence test
>> PC Object: 1920 MPI processes
>>  type: gamg
>>    type is MULTIPLICATIVE, levels=20 cycles=v
>>      Cycles per PCApply=1
>>      Using externally compute Galerkin coarse grid matrices
>>      GAMG specific options
>>        Threshold for dropping small values in graph on each level =   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  
>>        Threshold scaling factor for each level not specified = 1.
>>        AGG specific options
>>          Symmetric graph false
>>          Number of levels to square graph 1
>>          Number smoothing steps 0
>>        Complexity:    grid = -40.5483
>>  Coarse grid solver -- level -------------------------------
>>    KSP Object: (mg_coarse_) 1920 MPI processes
>>      type: preonly
>>      maximum iterations=10000, initial guess is zero
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_coarse_) 1920 MPI processes
>>      type: bjacobi
>>        number of blocks = 1920
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_coarse_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=1, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_coarse_sub_) 1 MPI processes
>>        type: lu
>>          out-of-place factorization
>>          tolerance for zero pivot 2.22045e-14
>>          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>          matrix ordering: nd
>>          factor fill ratio given 5., needed 1.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=15, cols=15, bs=5
>>                package used to perform factorization: petsc
>>                total: nonzeros=175, allocated nonzeros=175
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 3 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=15, cols=15, bs=5
>>          total: nonzeros=175, allocated nonzeros=175
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 3 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=15, cols=15, bs=5
>>        total: nonzeros=175, allocated nonzeros=175
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 3 nodes, limit used is 5
>>  Down solver (pre-smoother) on level 1 -------------------------------
>>    KSP Object: (mg_levels_1_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_1_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_1_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_1_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=4240, cols=4240
>>                package used to perform factorization: petsc
>>                total: nonzeros=64800, allocated nonzeros=64800
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 848 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=4240, cols=4240
>>          package used to perform factorization: petsc
>>          total: nonzeros=64800, allocated nonzeros=64800
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 848 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=4240, cols=4240, bs=5
>>        total: nonzeros=64800, allocated nonzeros=64800
>>        total number of mallocs used during MatSetValues calls =0
>>          using nonscalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 848 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 2 -------------------------------
>>    KSP Object: (mg_levels_2_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_2_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_2_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_2_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=4260, cols=4260
>>                package used to perform factorization: petsc
>>                total: nonzeros=65200, allocated nonzeros=65200
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 852 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=4260, cols=4260
>>          package used to perform factorization: petsc
>>          total: nonzeros=65200, allocated nonzeros=65200
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 852 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=4260, cols=4260, bs=5
>>        total: nonzeros=65200, allocated nonzeros=65200
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 852 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 3 -------------------------------
>>    KSP Object: (mg_levels_3_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_3_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_3_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_3_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=5440, cols=5440
>>                package used to perform factorization: petsc
>>                total: nonzeros=90950, allocated nonzeros=90950
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1088 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=5440, cols=5440
>>          package used to perform factorization: petsc
>>          total: nonzeros=90950, allocated nonzeros=90950
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1088 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=5440, cols=5440, bs=5
>>        total: nonzeros=90950, allocated nonzeros=90950
>>        total number of mallocs used during MatSetValues calls =0
>>          using nonscalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 1088 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 4 -------------------------------
>>    KSP Object: (mg_levels_4_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_4_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_4_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_4_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=5485, cols=5485
>>                package used to perform factorization: petsc
>>                total: nonzeros=93075, allocated nonzeros=93075
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1097 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=5485, cols=5485
>>          package used to perform factorization: petsc
>>          total: nonzeros=93075, allocated nonzeros=93075
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1097 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=5485, cols=5485, bs=5
>>        total: nonzeros=93075, allocated nonzeros=93075
>>        total number of mallocs used during MatSetValues calls =0
>>          using nonscalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 1097 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 5 -------------------------------
>>    KSP Object: (mg_levels_5_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_5_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_5_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_5_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=5685, cols=5685
>>                package used to perform factorization: petsc
>>                total: nonzeros=98925, allocated nonzeros=98925
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1137 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=5685, cols=5685
>>          package used to perform factorization: petsc
>>          total: nonzeros=98925, allocated nonzeros=98925
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1137 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=5685, cols=5685, bs=5
>>        total: nonzeros=98925, allocated nonzeros=98925
>>        total number of mallocs used during MatSetValues calls =0
>>          using nonscalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 1137 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 6 -------------------------------
>>    KSP Object: (mg_levels_6_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_6_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_6_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_6_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=5825, cols=5825
>>                package used to perform factorization: petsc
>>                total: nonzeros=102325, allocated nonzeros=102325
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1165 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=5825, cols=5825
>>          package used to perform factorization: petsc
>>          total: nonzeros=102325, allocated nonzeros=102325
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1165 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=5825, cols=5825, bs=5
>>        total: nonzeros=102325, allocated nonzeros=102325
>>        total number of mallocs used during MatSetValues calls =0
>>          using nonscalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 1165 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 7 -------------------------------
>>    KSP Object: (mg_levels_7_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_7_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_7_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_7_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=5925, cols=5925
>>                package used to perform factorization: petsc
>>                total: nonzeros=104925, allocated nonzeros=104925
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1185 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=5925, cols=5925
>>          package used to perform factorization: petsc
>>          total: nonzeros=104925, allocated nonzeros=104925
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1185 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=5925, cols=5925, bs=5
>>        total: nonzeros=104925, allocated nonzeros=104925
>>        total number of mallocs used during MatSetValues calls =0
>>          using nonscalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 1185 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 8 -------------------------------
>>    KSP Object: (mg_levels_8_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_8_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_8_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_8_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=6050, cols=6050
>>                package used to perform factorization: petsc
>>                total: nonzeros=110200, allocated nonzeros=110200
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1210 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=6050, cols=6050
>>          package used to perform factorization: petsc
>>          total: nonzeros=110200, allocated nonzeros=110200
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1210 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=6050, cols=6050, bs=5
>>        total: nonzeros=110200, allocated nonzeros=110200
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 1210 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 9 -------------------------------
>>    KSP Object: (mg_levels_9_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_9_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_9_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_9_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=6890, cols=6890
>>                package used to perform factorization: petsc
>>                total: nonzeros=153200, allocated nonzeros=153200
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1378 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=6890, cols=6890
>>          package used to perform factorization: petsc
>>          total: nonzeros=153200, allocated nonzeros=153200
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1378 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=6890, cols=6890, bs=5
>>        total: nonzeros=153200, allocated nonzeros=153200
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 1378 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 10 -------------------------------
>>    KSP Object: (mg_levels_10_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_10_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_10_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_10_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=7395, cols=7395
>>                package used to perform factorization: petsc
>>                total: nonzeros=180025, allocated nonzeros=180025
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1479 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=7395, cols=7395
>>          package used to perform factorization: petsc
>>          total: nonzeros=180025, allocated nonzeros=180025
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1479 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=7395, cols=7395, bs=5
>>        total: nonzeros=180025, allocated nonzeros=180025
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 1479 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 11 -------------------------------
>>    KSP Object: (mg_levels_11_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_11_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_11_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_11_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=8960, cols=8960
>>                package used to perform factorization: petsc
>>                total: nonzeros=259800, allocated nonzeros=259800
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 1792 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=8960, cols=8960
>>          package used to perform factorization: petsc
>>          total: nonzeros=259800, allocated nonzeros=259800
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 1792 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=8960, cols=8960, bs=5
>>        total: nonzeros=259800, allocated nonzeros=259800
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 1792 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 12 -------------------------------
>>    KSP Object: (mg_levels_12_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_12_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_12_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_12_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=1795, cols=1795
>>                package used to perform factorization: petsc
>>                total: nonzeros=33275, allocated nonzeros=33275
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 359 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=1795, cols=1795
>>          package used to perform factorization: petsc
>>          total: nonzeros=33275, allocated nonzeros=33275
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 359 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=11825, cols=11825, bs=5
>>        total: nonzeros=403125, allocated nonzeros=403125
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 359 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 13 -------------------------------
>>    KSP Object: (mg_levels_13_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_13_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_13_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_13_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=340, cols=340
>>                package used to perform factorization: petsc
>>                total: nonzeros=3500, allocated nonzeros=3500
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 68 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=340, cols=340
>>          package used to perform factorization: petsc
>>          total: nonzeros=3500, allocated nonzeros=3500
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 68 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=17210, cols=17210, bs=5
>>        total: nonzeros=696850, allocated nonzeros=696850
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 68 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 14 -------------------------------
>>    KSP Object: (mg_levels_14_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_14_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_14_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_14_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=125, cols=125
>>                package used to perform factorization: petsc
>>                total: nonzeros=625, allocated nonzeros=625
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 25 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=125, cols=125
>>          package used to perform factorization: petsc
>>          total: nonzeros=625, allocated nonzeros=625
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 25 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=29055, cols=29055, bs=5
>>        total: nonzeros=1475675, allocated nonzeros=1475675
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 25 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 15 -------------------------------
>>    KSP Object: (mg_levels_15_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_15_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_15_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_15_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=45, cols=45
>>                package used to perform factorization: petsc
>>                total: nonzeros=225, allocated nonzeros=225
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 9 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=45, cols=45
>>          package used to perform factorization: petsc
>>          total: nonzeros=225, allocated nonzeros=225
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 9 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=62935, cols=62935, bs=5
>>        total: nonzeros=3939025, allocated nonzeros=3939025
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 9 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 16 -------------------------------
>>    KSP Object: (mg_levels_16_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_16_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_16_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_16_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=55, cols=55
>>                package used to perform factorization: petsc
>>                total: nonzeros=725, allocated nonzeros=725
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 11 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=55, cols=55
>>          package used to perform factorization: petsc
>>          total: nonzeros=725, allocated nonzeros=725
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 11 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=205010, cols=205010, bs=5
>>        total: nonzeros=14780300, allocated nonzeros=14780300
>>        total number of mallocs used during MatSetValues calls =0
>>          using scalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 11 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 17 -------------------------------
>>    KSP Object: (mg_levels_17_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_17_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_17_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_17_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=360, cols=360
>>                package used to perform factorization: petsc
>>                total: nonzeros=14350, allocated nonzeros=14350
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 72 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=360, cols=360
>>          package used to perform factorization: petsc
>>          total: nonzeros=14350, allocated nonzeros=14350
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 72 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=921310, cols=921310, bs=5
>>        total: nonzeros=63203300, allocated nonzeros=63203300
>>        total number of mallocs used during MatSetValues calls =0
>>          using scalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 72 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 18 -------------------------------
>>    KSP Object: (mg_levels_18_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_18_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_18_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_18_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=2130, cols=2130
>>                package used to perform factorization: petsc
>>                total: nonzeros=87950, allocated nonzeros=87950
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 426 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=2130, cols=2130
>>          package used to perform factorization: petsc
>>          total: nonzeros=87950, allocated nonzeros=87950
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 426 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=4473930, cols=4473930, bs=5
>>        total: nonzeros=232427300, allocated nonzeros=232427300
>>        total number of mallocs used during MatSetValues calls =0
>>          using nonscalable MatPtAP() implementation
>>          using I-node (on process 0) routines: found 426 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 19 -------------------------------
>>    KSP Object: (mg_levels_19_) 1920 MPI processes
>>      type: richardson
>>        damping factor=1.
>>      maximum iterations=1, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (mg_levels_19_) 1920 MPI processes
>>      type: asm
>>        total subdomain blocks = 1920, amount of overlap = 0
>>        restriction/interpolation type - RESTRICT
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object: (mg_levels_19_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=10000, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (mg_levels_19_sub_) 1 MPI processes
>>        type: ilu
>>          in-place factorization
>>          0 levels of fill
>>          tolerance for zero pivot 2.22045e-14
>>          matrix ordering: natural
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: seqaij
>>                rows=179050, cols=179050
>>                package used to perform factorization: petsc
>>                total: nonzeros=42562500, allocated nonzeros=42562500
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 35810 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=179050, cols=179050
>>          package used to perform factorization: petsc
>>          total: nonzeros=42562500, allocated nonzeros=42562500
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 35810 nodes, limit used is 5
>>      linear system matrix followed by preconditioner matrix:
>>      Mat Object: 1920 MPI processes
>>        type: mffd
>>        rows=347149550, cols=347149550
>>          Matrix-free approximation:
>>            err=1.49012e-08 (relative error in function evaluation)
>>            Using wp compute h routine
>>                Does not compute normU
>>      Mat Object: 1920 MPI processes
>>        type: mpiaij
>>        rows=347149550, cols=347149550, bs=5
>>        total: nonzeros=86758607500, allocated nonzeros=86758607500
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 35810 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  linear system matrix followed by preconditioner matrix:
>>  Mat Object: 1920 MPI processes
>>    type: mffd
>>    rows=347149550, cols=347149550
>>      Matrix-free approximation:
>>        err=1.49012e-08 (relative error in function evaluation)
>>        Using wp compute h routine
>>            Does not compute normU
>>  Mat Object: 1920 MPI processes
>>    type: mpiaij
>>    rows=347149550, cols=347149550, bs=5
>>    total: nonzeros=86758607500, allocated nonzeros=86758607500
>>    total number of mallocs used during MatSetValues calls =0
>>      using I-node (on process 0) routines: found 35810 nodes, limit used is 5
>>        Line search: Using full step: fnorm 2.025875581923e+03 gnorm 2.801672254495e+00
>>    1 SNES Function norm 2.801672254495e+00
>