[petsc-users] Preconditioning systems of equations with complex numbers

Justin Chang jychang48 at gmail.com
Fri Feb 8 16:56:57 CST 2019


So I used -mat_view draw -draw_pause -1 on my medium-sized matrix and got
this output:

[image: 1MPI.png]

So it seems there are lots of off-diagonal terms, and that a decomposition
of the problem via MatLoad() would give a terribly unbalanced problem.

Given the initial A and b Mat/Vec, I experimented with MatPartitioning and
inserted the following lines into my code:

Mat               Apart;
Vec               bpart;
MatPartitioning   part;
IS                is, isrows;

/* Partition the adjacency graph of A */
ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
ierr = MatPartitioningSetAdjacency(part, A);CHKERRQ(ierr);
ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
ierr = MatPartitioningApply(part, &is);CHKERRQ(ierr);

/* Convert the "target rank per row" IS into the list of rows this rank owns */
ierr = ISBuildTwoSided(is, NULL, &isrows);CHKERRQ(ierr);

/* Extract the symmetrically permuted matrix and the matching rhs */
ierr = MatCreateSubMatrix(A, isrows, isrows, MAT_INITIAL_MATRIX, &Apart);CHKERRQ(ierr);
ierr = MatSetOptionsPrefix(Apart, "part_");CHKERRQ(ierr);
ierr = MatSetFromOptions(Apart);CHKERRQ(ierr);
ierr = VecGetSubVector(b, isrows, &bpart);CHKERRQ(ierr);

/* Set Apart and bpart in the KSPSolve */
...
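
Spelled out, the solve and cleanup part looks roughly like this (a minimal
sketch; xpart and ksp are just illustrative names here):

Vec  xpart;
KSP  ksp;

ierr = VecDuplicate(bpart, &xpart);CHKERRQ(ierr);
ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp, Apart, Apart);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
ierr = KSPSolve(ksp, bpart, xpart);CHKERRQ(ierr);

/* hand the subvector back to b and clean up */
ierr = VecRestoreSubVector(b, isrows, &bpart);CHKERRQ(ierr);
ierr = VecDestroy(&xpart);CHKERRQ(ierr);
ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
ierr = MatDestroy(&Apart);CHKERRQ(ierr);
ierr = ISDestroy(&isrows);CHKERRQ(ierr);
ierr = ISDestroy(&is);CHKERRQ(ierr);
ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);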

And here are the mat_draw figures from 2 and 4 MPI processes respectively:

[image: 2MPI.png][image: 4MPI.png]

Is this "right"? It just feels like I'm just duplicating the nnz structure
among all the MPI processes. And it didn't really improve the performance
of ASM.

Also, this nnz structure appears to depend on the distribution system we
cook up in OpenDSS; for example, the largest matrix we have thus far looks
like this:
[image: larger.png]
Is there a better alternative to the partitioning I implemented above? Or
did I do something catastrophically wrong?

Thanks,
Justin

Side note - yes, I tried KLU, but it needed several KSP iterations to obtain
the solution. Here are the ksp monitor/view outputs:

0 KSP preconditioned resid norm 1.705434112839e+06 true resid norm
2.242813827253e+12 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.122965789284e+05 true resid norm
3.057749589444e+11 ||r(i)||/||b|| 1.363354172463e-01
  2 KSP preconditioned resid norm 1.962518730076e+04 true resid norm
2.510054932552e+10 ||r(i)||/||b|| 1.119154386357e-02
  3 KSP preconditioned resid norm 3.094963519133e+03 true resid norm
6.489763653495e+09 ||r(i)||/||b|| 2.893581078660e-03
  4 KSP preconditioned resid norm 1.755871992454e+03 true resid norm
2.315676037474e+09 ||r(i)||/||b|| 1.032486963178e-03
  5 KSP preconditioned resid norm 1.348939340771e+03 true resid norm
1.864929933344e+09 ||r(i)||/||b|| 8.315134812722e-04
  6 KSP preconditioned resid norm 5.532203694243e+02 true resid norm
9.985525631209e+08 ||r(i)||/||b|| 4.452231170449e-04
  7 KSP preconditioned resid norm 3.636087020506e+02 true resid norm
5.712899201028e+08 ||r(i)||/||b|| 2.547201703329e-04
  8 KSP preconditioned resid norm 2.926812321412e+02 true resid norm
3.627282296417e+08 ||r(i)||/||b|| 1.617290856843e-04
  9 KSP preconditioned resid norm 1.629184033135e+02 true resid norm
1.048838851435e+08 ||r(i)||/||b|| 4.676441881580e-05
 10 KSP preconditioned resid norm 8.297821067807e+01 true resid norm
3.423640694920e+07 ||r(i)||/||b|| 1.526493484800e-05
 11 KSP preconditioned resid norm 2.997246200648e+01 true resid norm
9.880250538293e+06 ||r(i)||/||b|| 4.405292324417e-06
 12 KSP preconditioned resid norm 2.156940809471e+01 true resid norm
3.521518932572e+06 ||r(i)||/||b|| 1.570134306192e-06
 13 KSP preconditioned resid norm 1.211823308437e+01 true resid norm
1.130205443318e+06 ||r(i)||/||b|| 5.039229870908e-07
Linear solve converged due to CONVERGED_RTOL iterations 13
KSP Object: 1 MPI processes
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization
with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=50, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: lu
    out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: amd
    factor fill ratio given 0., needed 0.
      Factored matrix follows:
        Mat Object: 1 MPI processes
          type: klu
          rows=320745, cols=320745
          package used to perform factorization: klu
          total: nonzeros=0, allocated nonzeros=0
          total number of mallocs used during MatSetValues calls =0
            KLU stats:
              Number of diagonal blocks: 1
              Total nonzeros=2426908
            KLU runtime parameters:
              Partial pivoting tolerance: 0.001
              BTF preordering enabled: 1
              Ordering: AMD (not using the PETSc ordering)
              Matrix row scaling: NONE
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaij
    rows=320745, cols=320745
    total: nonzeros=1928617, allocated nonzeros=1928617
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 186383 nodes, limit used is 5

It was slower than the PETSc LU solver, which doesn't seem to make sense to
me. The lines:

 total: nonzeros=0, allocated nonzeros=0
 total number of mallocs used during MatSetValues calls =0

seem to indicate that KLU needs further customization to get the "high
performance" that the circuits community claims?

On Thu, Feb 7, 2019 at 8:03 AM Abhyankar, Shrirang G <
shrirang.abhyankar at pnnl.gov> wrote:

> Hi Justin,
>
>     The serial solve time for this matrix of size 320K is about 0.3 secs,
> which I think is pretty fast. As such, I am not surprised you are not
> getting any speedup in parallel with asm. Perhaps you need a bigger
> matrix?
>
>
>
> It would be interesting to see a spy plot of the matrix on different
> processors. If there are a lot of off-diagonal elements, then asm may not
> work that well. You may need to use a partitioner in that case.
>
>
>
> As a side note, folks in the circuits community consider KLU as one of the
> fastest serial solvers for such kind of problems. You can try that out and
> see if it improves serial performance even further.
>
>
>
> --download-suitesparse
>
>
>
> -pc_factor_mat_solver_package klu -pc_factor_mat_ordering_type amd
>
>
>
> Shri
>
>
>
> *From:* Justin Chang <jychang48 at gmail.com>
> *Sent:* Wednesday, February 6, 2019 2:56 PM
> *To:* Abhyankar, Shrirang G <shrirang.abhyankar at pnnl.gov>
> *Cc:* Mark Adams <mfadams at lbl.gov>; PETSc users list <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Preconditioning systems of equations with
> complex numbers
>
>
>
> Hi Shri,
>
>
>
> Yeah so here at NREL I'm working with a group of folks (specifically
> Dheepak Krishnamurthy, who apparently knows you from back when y'all were
> both at Argonne) and asked them to cook up an urban-suburban distribution
> system from OpenDSS (he wrote a Julia/Python wrapper for OpenDSS). I used
> the following options:
>
>
> -ksp_type gmres
> -pc_type asm
> -pc_asm_overlap 3
> -sub_pc_type lu
> -sub_pc_factor_mat_ordering_type amd # using rcm also produced similar
> results, all of which were better than the default
>
> -ksp_monitor_true_residual
>
> -ksp_view
>
> -log_view
>
>
>
> And here are the long overdue outputs for up to 16 MPI processes on our
> Eagle HPC system (with Skylake server nodes).
>
>
>
> Basically, the algorithmic convergence degrades when more MPI procs are
> used, and while I could ameliorate the KSP count by increasing the overlap,
> it slows down the overall computation, which I assume is due to the
> communication latencies. Also note that the KSPSolve time slowed down
> because of the ksp_monitor.
>
>
>
> Is this a "big performance hit"? I don't know, maybe I need to first start
> off with a fully 3-phase system and see if similar solver characteristics
> are found?
>
>
>
> Thanks,
>
> Justin
>
>
>
>
>
>
>
>
>
>
>
> On Tue, Feb 5, 2019 at 6:09 PM Abhyankar, Shrirang G <
> shrirang.abhyankar at pnnl.gov> wrote:
>
> Hi Justin,
>
>    Typically, power grid distribution systems have a radial structure
> (unless it is an urban area) that leads to a, more or less, staircase-type
> matrix. So a MatLoad() or VecLoad() would presumably just split the stairs,
> akin to a 1-D PDE. However, as you pointed out, it may cause the phases for
> a bus to be split across processors, but I don’t think it would cause a big
> performance hit.
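>
> A minimal sketch of that default load path (assuming the matrix and vector
> were dumped with a PETSc binary viewer; the file name is illustrative):
>
> PetscErrorCode ierr;
> PetscViewer    viewer;
> Mat            A;
> Vec            b;
> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"ybus.bin",FILE_MODE_READ,&viewer);CHKERRQ(ierr);
> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
> ierr = MatLoad(A,viewer);CHKERRQ(ierr);   /* default: a contiguous block of rows per rank */
> ierr = VecCreate(PETSC_COMM_WORLD,&b);CHKERRQ(ierr);
> ierr = VecLoad(b,viewer);CHKERRQ(ierr);
> ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);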
>
> I’ve worked with OpenDSS recently and have created a C wrapper for it that
> you can use to query values from it. I can help out with your
> experiments, if you need.
>
>
>
> Shri
>
> *From:* petsc-users <petsc-users-bounces at mcs.anl.gov> *On Behalf Of *Justin
> Chang via petsc-users
> *Sent:* Tuesday, February 5, 2019 5:23 PM
> *To:* Mark Adams <mfadams at lbl.gov>
> *Cc:* petsc-users <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Preconditioning systems of equations with
> complex numbers
>
>
>
> Hi all,
>
>
>
> So I *think* I know what the problem is. I am modeling a distribution
> system where each bus can have anywhere from 1 to 3 phases. In the smallest
> example I had, all four buses had 3 phases (hence 12 dofs), and this was
> very easy to decompose - each MPI process should have a specific set of
> complex voltages on their respective nodes/buses. The larger test cases
> have lots of single-phase buses attached, so a naive decomposition of the
> matrix isn't trivial. Again, we're only extracting the Y-bus matrices out of
> an external software package (OpenDSS, to be specific) so we have no a priori
> information on how they organize their degrees-of-freedom - we let
> MatLoad() and VecLoad() do the decomposition. So it's possible that during
> this decomposition, a single bus' phases are being split across two or more
> MPI processes, which I assume would mess up the algorithmic performance of
> GAMG/ASM when more than one MPI process is needed.
>
>
>
> Yeah DMNetwork will fix this and ensure that all dofs per vertex do not
> get split across different MPI procs, but for now we're just trying to see
> if we can get a simple proof-of-concept working. We're trying to come up
> with a toy distribution system where everything is three-phase (hence very
> straightforward to decompose even with MatLoad()).
>
>
>
> Justin
>
>
>
> On Tue, Feb 5, 2019 at 9:26 AM Mark Adams <mfadams at lbl.gov> wrote:
>
> I would stay away from eigen estimates in the solver (but give us the
> spectra to look at), so set -pc_gamg_agg_nsmooths 0 and use sor.
>
>
>
> Applications that have lived on direct solvers can add all sorts of crap
> like penalty terms.
>
>
>
> sor seemed to work OK so I'd check the coarse grids in GAMG. Test with
> just two levels. That way you don't have to use sor on an internal MG grid,
> which can be bad and probably is here.
>
>
>
>
>
> On Mon, Feb 4, 2019 at 10:59 PM Smith, Barry F. via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>
>
> > On Feb 4, 2019, at 1:56 PM, Justin Chang via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >
> > Thanks everyone for your suggestions/feedback. So a few things:
> >
> > 1) When I examined larger distribution networks (~150k buses) my
> eigenvalue estimates from the chebyshev method get enormous indeed. See
> below:
> >
> > [0] PCSetUp_GAMG(): level 0) N=320745, n data rows=1, n data cols=1,
> nnz/row (ave)=6, np=1
> > [0] PCGAMGFilterGraph():       98.5638% nnz after filtering, with
> threshold 0., 6.01293 nnz ave. (N=320745)
> > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> > [0] PCGAMGProlongator_AGG(): New grid 44797 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.403335e+00
> min=4.639523e-02 PC=jacobi
> > [0] PCSetUp_GAMG(): 1) N=44797, n data cols=1, nnz/row (ave)=7, 1 active
> pes
> > [0] PCGAMGFilterGraph():       99.9753% nnz after filtering, with
> threshold 0., 7.32435 nnz ave. (N=44797)
> > [0] PCGAMGProlongator_AGG(): New grid 13043 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=8.173298e+00
> min=9.687506e-01 PC=jacobi
> > [0] PCSetUp_GAMG(): 2) N=13043, n data cols=1, nnz/row (ave)=22, 1
> active pes
> > [0] PCGAMGFilterGraph():       99.684% nnz after filtering, with
> threshold 0., 22.5607 nnz ave. (N=13043)
> > [0] PCGAMGProlongator_AGG(): New grid 2256 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.696594e+00
> min=6.150856e-01 PC=jacobi
> > [0] PCSetUp_GAMG(): 3) N=2256, n data cols=1, nnz/row (ave)=79, 1 active
> pes
> > [0] PCGAMGFilterGraph():       93.859% nnz after filtering, with
> threshold 0., 79.5142 nnz ave. (N=2256)
> > [0] PCGAMGProlongator_AGG(): New grid 232 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.454120e+00
> min=6.780909e-01 PC=jacobi
> > [0] PCSetUp_GAMG(): 4) N=232, n data cols=1, nnz/row (ave)=206, 1 active
> pes
> > [0] PCGAMGFilterGraph():       99.1729% nnz after filtering, with
> threshold 0., 206.379 nnz ave. (N=232)
> > [0] PCGAMGProlongator_AGG(): New grid 9 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.443612e+00
> min=2.153627e-01 PC=jacobi
> > [0] PCSetUp_GAMG(): 5) N=9, n data cols=1, nnz/row (ave)=9, 1 active pes
> > [0] PCSetUp_GAMG(): 6 levels, grid complexity = 1.44058
> >
> > 2) I tried all the suggestions mentioned before: setting
> -pc_gamg_agg_nsmooths 0  -pc_gamg_square_graph 10 did not improve my
> convergence. Neither did explicitly setting -mg_coarse_pc_type lu or more
> iterations of richardson/sor.
> >
> > 3a) -pc_type asm works only if I set -sub_pc_type lu. Basically I'm just
> solving LU on the whole system.
> >
> > 3b) The problem is that I can't even get it to speed up across 2 MPI
> processes for my 150k bus case (~300k dofs) - and I already checked that
> this problem could in theory be parallelized by setting the -ksp_max_it to
> something low and observing that the KSPSolve time decreases as MPI
> concurrency increases. The potential speedup is countered by the fact that
> the algorithmic convergence rate blows up when either more MPI processes
> are added or when I tune the block_size/overlap parameters.
>
>    Yes, we just haven't gotten a scalable preconditioner yet for your
> matrix.
>
> >
> > Which leads me to one of two conclusions/questions:
> >
> > A) Is my 300k DOF problem still too small? Do I need to look at a problem
> with, say, 10 Million DOF or more to see that increasing the asm_block_size
> will have better performance (despite having more KSP iterations) than a
> single giant asm_block?
>
>    No, if you had a scalable preconditioner you would see good speedup
> with 300k unknowns; using 10 million won't help.
>
> >
> > B) Or is there something even more sinister about my system of equations
> in complex form?
>
>   There is a technical term for matrices on which both GAMG and ASM do
> very poorly: nasty :-)
>
>   Have you tried using parallel LU (for example ./configure
> --download-superlu_dist)? Direct LU is the last refuge of the
> unpreconditionable. How large will your matrices get?
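>
>   For example, with that build, something like
>
>      -pc_type lu -pc_factor_mat_solver_type superlu_dist
>
> should do it (the option is spelled -pc_factor_mat_solver_package in older
> releases).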
>
>    Barry
>
> >
> > Thanks,
> > Justin
> >
> > On Sat, Feb 2, 2019 at 1:41 PM Matthew Knepley <knepley at gmail.com>
> wrote:
> > The coarse grid is getting set to SOR rather than LU.
> >
> >    Matt
> >
> > On Fri, Feb 1, 2019 at 3:54 PM Justin Chang <jychang48 at gmail.com> wrote:
> > I tried these options:
> >
> > -ksp_view
> > -ksp_monitor_true_residual
> > -ksp_type gmres
> > -ksp_max_it 50
> > -pc_type gamg
> > -mg_coarse_pc_type sor
> > -mg_levels_1_ksp_type richardson
> > -mg_levels_2_ksp_type richardson
> > -mg_levels_3_ksp_type richardson
> > -mg_levels_4_ksp_type richardson
> > -mg_levels_1_pc_type sor
> > -mg_levels_2_pc_type sor
> > -mg_levels_3_pc_type sor
> > -mg_levels_4_pc_type sor
> >
> > And still have a non-converging solution:
> >
> >   0 KSP preconditioned resid norm 1.573625743885e+05 true resid norm
> 1.207500000000e+08 ||r(i)||/||b|| 1.000000000000e+00
> >   1 KSP preconditioned resid norm 7.150979785488e+00 true resid norm
> 1.218223835173e+04 ||r(i)||/||b|| 1.008881022917e-04
> >   2 KSP preconditioned resid norm 5.627140225062e+00 true resid norm
> 1.209346465397e+04 ||r(i)||/||b|| 1.001529163889e-04
> >   3 KSP preconditioned resid norm 5.077324098998e+00 true resid norm
> 1.211293532189e+04 ||r(i)||/||b|| 1.003141641564e-04
> >   4 KSP preconditioned resid norm 4.324840589068e+00 true resid norm
> 1.213758867766e+04 ||r(i)||/||b|| 1.005183327342e-04
> >   5 KSP preconditioned resid norm 4.107764194462e+00 true resid norm
> 1.212180975158e+04 ||r(i)||/||b|| 1.003876583981e-04
> >   6 KSP preconditioned resid norm 4.033152911227e+00 true resid norm
> 1.216543340208e+04 ||r(i)||/||b|| 1.007489308660e-04
> >   7 KSP preconditioned resid norm 3.944875788743e+00 true resid norm
> 1.220023550040e+04 ||r(i)||/||b|| 1.010371470012e-04
> >   8 KSP preconditioned resid norm 3.926562233824e+00 true resid norm
> 1.220662039610e+04 ||r(i)||/||b|| 1.010900239842e-04
> >   9 KSP preconditioned resid norm 3.912450246357e+00 true resid norm
> 1.219755005917e+04 ||r(i)||/||b|| 1.010149073223e-04
> >  10 KSP preconditioned resid norm 3.838058250285e+00 true resid norm
> 1.218625848551e+04 ||r(i)||/||b|| 1.009213953251e-04
> >  11 KSP preconditioned resid norm 3.779259857769e+00 true resid norm
> 1.219659071592e+04 ||r(i)||/||b|| 1.010069624506e-04
> >  12 KSP preconditioned resid norm 3.694467405997e+00 true resid norm
> 1.223911592358e+04 ||r(i)||/||b|| 1.013591380834e-04
> >  13 KSP preconditioned resid norm 3.693468937696e+00 true resid norm
> 1.223724162745e+04 ||r(i)||/||b|| 1.013436159623e-04
> >  14 KSP preconditioned resid norm 3.616103662529e+00 true resid norm
> 1.221242236516e+04 ||r(i)||/||b|| 1.011380734175e-04
> >  15 KSP preconditioned resid norm 3.604541982832e+00 true resid norm
> 1.220338362332e+04 ||r(i)||/||b|| 1.010632184126e-04
> >  16 KSP preconditioned resid norm 3.599588304412e+00 true resid norm
> 1.219870618529e+04 ||r(i)||/||b|| 1.010244818657e-04
> >  17 KSP preconditioned resid norm 3.581088429735e+00 true resid norm
> 1.219098863716e+04 ||r(i)||/||b|| 1.009605684237e-04
> >  18 KSP preconditioned resid norm 3.577183857191e+00 true resid norm
> 1.219422257290e+04 ||r(i)||/||b|| 1.009873505002e-04
> >  19 KSP preconditioned resid norm 3.576075374018e+00 true resid norm
> 1.219687565514e+04 ||r(i)||/||b|| 1.010093221958e-04
> >  20 KSP preconditioned resid norm 3.574659675290e+00 true resid norm
> 1.219494681037e+04 ||r(i)||/||b|| 1.009933483260e-04
> >  21 KSP preconditioned resid norm 3.555091405270e+00 true resid norm
> 1.218714149002e+04 ||r(i)||/||b|| 1.009287079919e-04
> >  22 KSP preconditioned resid norm 3.553181473875e+00 true resid norm
> 1.218106845443e+04 ||r(i)||/||b|| 1.008784137012e-04
> >  23 KSP preconditioned resid norm 3.552613734076e+00 true resid norm
> 1.217794060488e+04 ||r(i)||/||b|| 1.008525101853e-04
> >  24 KSP preconditioned resid norm 3.551632777626e+00 true resid norm
> 1.218032078237e+04 ||r(i)||/||b|| 1.008722218001e-04
> >  25 KSP preconditioned resid norm 3.545808514701e+00 true resid norm
> 1.219292407751e+04 ||r(i)||/||b|| 1.009765969151e-04
> >  26 KSP preconditioned resid norm 3.528978940908e+00 true resid norm
> 1.219073670770e+04 ||r(i)||/||b|| 1.009584820513e-04
> >  27 KSP preconditioned resid norm 3.527789136030e+00 true resid norm
> 1.218958906209e+04 ||r(i)||/||b|| 1.009489777399e-04
> >  28 KSP preconditioned resid norm 3.525383863095e+00 true resid norm
> 1.218344768375e+04 ||r(i)||/||b|| 1.008981174637e-04
> >  29 KSP preconditioned resid norm 3.521750505784e+00 true resid norm
> 1.218775781359e+04 ||r(i)||/||b|| 1.009338121208e-04
> >  30 KSP preconditioned resid norm 3.521656008348e+00 true resid norm
> 1.218930075437e+04 ||r(i)||/||b|| 1.009465900983e-04
> >  31 KSP preconditioned resid norm 3.521655969692e+00 true resid norm
> 1.218928102429e+04 ||r(i)||/||b|| 1.009464267021e-04
> >  32 KSP preconditioned resid norm 3.521654896823e+00 true resid norm
> 1.218929084207e+04 ||r(i)||/||b|| 1.009465080088e-04
> >  33 KSP preconditioned resid norm 3.521654041929e+00 true resid norm
> 1.218930804845e+04 ||r(i)||/||b|| 1.009466505047e-04
> >  34 KSP preconditioned resid norm 3.521652043655e+00 true resid norm
> 1.218914044919e+04 ||r(i)||/||b|| 1.009452625192e-04
> >  35 KSP preconditioned resid norm 3.521602884006e+00 true resid norm
> 1.218810333491e+04 ||r(i)||/||b|| 1.009366735810e-04
> >  36 KSP preconditioned resid norm 3.521535454292e+00 true resid norm
> 1.218820358646e+04 ||r(i)||/||b|| 1.009375038215e-04
> >  37 KSP preconditioned resid norm 3.521433576778e+00 true resid norm
> 1.218859696943e+04 ||r(i)||/||b|| 1.009407616516e-04
> >  38 KSP preconditioned resid norm 3.521349747881e+00 true resid norm
> 1.218888798454e+04 ||r(i)||/||b|| 1.009431717146e-04
> >  39 KSP preconditioned resid norm 3.521212709133e+00 true resid norm
> 1.218882278104e+04 ||r(i)||/||b|| 1.009426317270e-04
> >  40 KSP preconditioned resid norm 3.521168785360e+00 true resid norm
> 1.218866389996e+04 ||r(i)||/||b|| 1.009413159416e-04
> >  41 KSP preconditioned resid norm 3.521164077366e+00 true resid norm
> 1.218868324624e+04 ||r(i)||/||b|| 1.009414761593e-04
> >  42 KSP preconditioned resid norm 3.521101506147e+00 true resid norm
> 1.218773304385e+04 ||r(i)||/||b|| 1.009336069884e-04
> >  43 KSP preconditioned resid norm 3.521013554688e+00 true resid norm
> 1.218675768694e+04 ||r(i)||/||b|| 1.009255294984e-04
> >  44 KSP preconditioned resid norm 3.520820039115e+00 true resid norm
> 1.218829726209e+04 ||r(i)||/||b|| 1.009382796032e-04
> >  45 KSP preconditioned resid norm 3.520763991575e+00 true resid norm
> 1.218776399191e+04 ||r(i)||/||b|| 1.009338632870e-04
> >  46 KSP preconditioned resid norm 3.520501770928e+00 true resid norm
> 1.218624167053e+04 ||r(i)||/||b|| 1.009212560706e-04
> >  47 KSP preconditioned resid norm 3.519005707047e+00 true resid norm
> 1.219233855078e+04 ||r(i)||/||b|| 1.009717478325e-04
> >  48 KSP preconditioned resid norm 3.518379807717e+00 true resid norm
> 1.218961932675e+04 ||r(i)||/||b|| 1.009492283788e-04
> >  49 KSP preconditioned resid norm 3.517809415824e+00 true resid norm
> 1.218777984143e+04 ||r(i)||/||b|| 1.009339945460e-04
> >  50 KSP preconditioned resid norm 3.517617854442e+00 true resid norm
> 1.219011629981e+04 ||r(i)||/||b|| 1.009533440978e-04
> > KSP Object: 1 MPI processes
> >   type: gmres
> >     restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> >     happy breakdown tolerance 1e-30
> >   maximum iterations=50, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   left preconditioning
> >   using PRECONDITIONED norm type for convergence test
> > PC Object: 1 MPI processes
> >   type: gamg
> >     type is MULTIPLICATIVE, levels=5 cycles=v
> >       Cycles per PCApply=1
> >       Using externally compute Galerkin coarse grid matrices
> >       GAMG specific options
> >         Threshold for dropping small values in graph on each level =
>  0.   0.   0.
> >         Threshold scaling factor for each level not specified = 1.
> >         AGG specific options
> >           Symmetric graph false
> >           Number of levels to square graph 1
> >           Number smoothing steps 1
> >         Complexity:    grid = 1.31821
> >   Coarse grid solver -- level -------------------------------
> >     KSP Object: (mg_coarse_) 1 MPI processes
> >       type: preonly
> >       maximum iterations=10000, initial guess is zero
> >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object: (mg_coarse_) 1 MPI processes
> >       type: sor
> >         type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
> >       linear system matrix = precond matrix:
> >       Mat Object: 1 MPI processes
> >         type: seqaij
> >         rows=17, cols=17
> >         total: nonzeros=163, allocated nonzeros=163
> >         total number of mallocs used during MatSetValues calls =0
> >           using I-node routines: found 12 nodes, limit used is 5
> >   Down solver (pre-smoother) on level 1 -------------------------------
> >     KSP Object: (mg_levels_1_) 1 MPI processes
> >       type: richardson
> >         damping factor=1.
> >       maximum iterations=2, nonzero initial guess
> >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object: (mg_levels_1_) 1 MPI processes
> >       type: sor
> >         type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
> >       linear system matrix = precond matrix:
> >       Mat Object: 1 MPI processes
> >         type: seqaij
> >         rows=100, cols=100
> >         total: nonzeros=1240, allocated nonzeros=1240
> >         total number of mallocs used during MatSetValues calls =0
> >           not using I-node routines
> >   Up solver (post-smoother) same as down solver (pre-smoother)
> >   Down solver (pre-smoother) on level 2 -------------------------------
> >     KSP Object: (mg_levels_2_) 1 MPI processes
> >       type: richardson
> >         damping factor=1.
> >       maximum iterations=2, nonzero initial guess
> >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object: (mg_levels_2_) 1 MPI processes
> >       type: sor
> >         type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
> >       linear system matrix = precond matrix:
> >       Mat Object: 1 MPI processes
> >         type: seqaij
> >         rows=537, cols=537
> >         total: nonzeros=5291, allocated nonzeros=5291
> >         total number of mallocs used during MatSetValues calls =0
> >           not using I-node routines
> >   Up solver (post-smoother) same as down solver (pre-smoother)
> >   Down solver (pre-smoother) on level 3 -------------------------------
> >     KSP Object: (mg_levels_3_) 1 MPI processes
> >       type: richardson
> >         damping factor=1.
> >       maximum iterations=2, nonzero initial guess
> >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object: (mg_levels_3_) 1 MPI processes
> >       type: sor
> >         type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
> >       linear system matrix = precond matrix:
> >       Mat Object: 1 MPI processes
> >         type: seqaij
> >         rows=1541, cols=1541
> >         total: nonzeros=8039, allocated nonzeros=8039
> >         total number of mallocs used during MatSetValues calls =0
> >           not using I-node routines
> >   Up solver (post-smoother) same as down solver (pre-smoother)
> >   Down solver (pre-smoother) on level 4 -------------------------------
> >     KSP Object: (mg_levels_4_) 1 MPI processes
> >       type: richardson
> >         damping factor=1.
> >       maximum iterations=2, nonzero initial guess
> >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object: (mg_levels_4_) 1 MPI processes
> >       type: sor
> >         type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
> >       linear system matrix = precond matrix:
> >       Mat Object: 1 MPI processes
> >         type: seqaij
> >         rows=8541, cols=8541
> >         total: nonzeros=46299, allocated nonzeros=46299
> >         total number of mallocs used during MatSetValues calls =0
> >           using I-node routines: found 5464 nodes, limit used is 5
> >   Up solver (post-smoother) same as down solver (pre-smoother)
> >   linear system matrix = precond matrix:
> >   Mat Object: 1 MPI processes
> >     type: seqaij
> >     rows=8541, cols=8541
> >     total: nonzeros=46299, allocated nonzeros=46299
> >     total number of mallocs used during MatSetValues calls =0
> >       using I-node routines: found 5464 nodes, limit used is 5
> >
> > Am I doing this right? Did I miss anything?
> >
> >
> >
> > On Fri, Feb 1, 2019 at 1:22 PM Matthew Knepley <knepley at gmail.com>
> wrote:
> > On Fri, Feb 1, 2019 at 3:05 PM Justin Chang <jychang48 at gmail.com> wrote:
> > Hi Mark,
> >
> > 1) So with these options:
> >
> > -ksp_type gmres
> > -ksp_rtol 1e-15
> > -ksp_monitor_true_residual
> > -ksp_converged_reason
> > -pc_type bjacobi
> >
> > This is what I get:
> >
> >   0 KSP preconditioned resid norm 1.900749341028e+04 true resid norm
> 1.603849239146e+07 ||r(i)||/||b|| 1.000000000000e+00
> >   1 KSP preconditioned resid norm 1.363172190058e+03 true resid norm
> 8.144804150077e+05 ||r(i)||/||b|| 5.078285384488e-02
> >   2 KSP preconditioned resid norm 1.744431536443e+02 true resid norm
> 2.668646207476e+04 ||r(i)||/||b|| 1.663900909350e-03
> >   3 KSP preconditioned resid norm 2.798960448093e+01 true resid norm
> 2.142171349084e+03 ||r(i)||/||b|| 1.335643835343e-04
> >   4 KSP preconditioned resid norm 2.938319245576e+00 true resid norm
> 1.457432872748e+03 ||r(i)||/||b|| 9.087093956061e-05
> >   5 KSP preconditioned resid norm 1.539484356450e-12 true resid norm
> 6.667274739274e-09 ||r(i)||/||b|| 4.157045797412e-16
> > Linear solve converged due to CONVERGED_RTOL iterations 5
> >
> > 2) With richardson/sor:
> >
> > Okay, it looks like Richardson/SOR solves this just fine. You can use
> this as the smoother for GAMG instead
> > of Cheby/Jacobi, and probably see better results on the larger problems.
> >
> >   Matt
> >
> > -ksp_type richardson
> > -ksp_rtol 1e-15
> > -ksp_monitor_true_residual
> > -pc_type sor
> >
> > This is what I get:
> >
> >   0 KSP preconditioned resid norm 1.772935756018e+04 true resid norm
> 1.603849239146e+07 ||r(i)||/||b|| 1.000000000000e+00
> >   1 KSP preconditioned resid norm 1.206881305953e+03 true resid norm
> 4.507533687463e+03 ||r(i)||/||b|| 2.810447252426e-04
> >   2 KSP preconditioned resid norm 4.166906741810e+02 true resid norm
> 1.221098715634e+03 ||r(i)||/||b|| 7.613550487354e-05
> >   3 KSP preconditioned resid norm 1.540698682668e+02 true resid norm
> 4.241154731706e+02 ||r(i)||/||b|| 2.644359973612e-05
> >   4 KSP preconditioned resid norm 5.904921520051e+01 true resid norm
> 1.587778309916e+02 ||r(i)||/||b|| 9.899797756316e-06
> >   5 KSP preconditioned resid norm 2.327938633860e+01 true resid norm
> 6.161476099609e+01 ||r(i)||/||b|| 3.841680345773e-06
> >   6 KSP preconditioned resid norm 9.409043410169e+00 true resid norm
> 2.458600025207e+01 ||r(i)||/||b|| 1.532937114786e-06
> >   7 KSP preconditioned resid norm 3.888365194933e+00 true resid norm
> 1.005868527582e+01 ||r(i)||/||b|| 6.271590265661e-07
> >   8 KSP preconditioned resid norm 1.638018293396e+00 true resid norm
> 4.207888403975e+00 ||r(i)||/||b|| 2.623618418285e-07
> >   9 KSP preconditioned resid norm 7.010639340830e-01 true resid norm
> 1.793827818698e+00 ||r(i)||/||b|| 1.118451644279e-07
> >  10 KSP preconditioned resid norm 3.038491129050e-01 true resid norm
> 7.763187747256e-01 ||r(i)||/||b|| 4.840347557474e-08
> >  11 KSP preconditioned resid norm 1.329641892383e-01 true resid norm
> 3.398137553975e-01 ||r(i)||/||b|| 2.118738763617e-08
> >  12 KSP preconditioned resid norm 5.860142364318e-02 true resid norm
> 1.499670452314e-01 ||r(i)||/||b|| 9.350445264501e-09
> >  13 KSP preconditioned resid norm 2.596075957908e-02 true resid norm
> 6.655772419557e-02 ||r(i)||/||b|| 4.149874101073e-09
> >  14 KSP preconditioned resid norm 1.154254160823e-02 true resid norm
> 2.964959279341e-02 ||r(i)||/||b|| 1.848652109546e-09
> >  15 KSP preconditioned resid norm 5.144785556436e-03 true resid norm
> 1.323918323132e-02 ||r(i)||/||b|| 8.254630739714e-10
> >  16 KSP preconditioned resid norm 2.296969429446e-03 true resid norm
> 5.919889645162e-03 ||r(i)||/||b|| 3.691051191517e-10
> >  17 KSP preconditioned resid norm 1.026615599876e-03 true resid norm
> 2.649106197813e-03 ||r(i)||/||b|| 1.651717713333e-10
> >  18 KSP preconditioned resid norm 4.591391433184e-04 true resid norm
> 1.185874433977e-03 ||r(i)||/||b|| 7.393927091354e-11
> >  19 KSP preconditioned resid norm 2.054186999728e-04 true resid norm
> 5.309080126683e-04 ||r(i)||/||b|| 3.310211456976e-11
> >  20 KSP preconditioned resid norm 9.192021190159e-05 true resid norm
> 2.376701584665e-04 ||r(i)||/||b|| 1.481873437138e-11
> >  21 KSP preconditioned resid norm 4.113417679473e-05 true resid norm
> 1.063822174067e-04 ||r(i)||/||b|| 6.632931251279e-12
> >  22 KSP preconditioned resid norm 1.840693141405e-05 true resid norm
> 4.760872579965e-05 ||r(i)||/||b|| 2.968404051805e-12
> >  23 KSP preconditioned resid norm 8.236207862555e-06 true resid norm
> 2.130217091325e-05 ||r(i)||/||b|| 1.328190355634e-12
> >  24 KSP preconditioned resid norm 3.684941963736e-06 true resid norm
> 9.529827966741e-06 ||r(i)||/||b|| 5.941847733654e-13
> >  25 KSP preconditioned resid norm 1.648500148983e-06 true resid norm
> 4.262645649931e-06 ||r(i)||/||b|| 2.657759561118e-13
> >  26 KSP preconditioned resid norm 7.373967102970e-07 true resid norm
> 1.906404712490e-06 ||r(i)||/||b|| 1.188643337515e-13
> >  27 KSP preconditioned resid norm 3.298179068243e-07 true resid norm
> 8.525291822193e-07 ||r(i)||/||b|| 5.315519447909e-14
> >  28 KSP preconditioned resid norm 1.475043181061e-07 true resid norm
> 3.812420047527e-07 ||r(i)||/||b|| 2.377043898189e-14
> >  29 KSP preconditioned resid norm 6.596561561066e-08 true resid norm
> 1.704561511399e-07 ||r(i)||/||b|| 1.062794101712e-14
> >  30 KSP preconditioned resid norm 2.949954993990e-08 true resid norm
> 7.626396764524e-08 ||r(i)||/||b|| 4.755058379793e-15
> >  31 KSP preconditioned resid norm 1.319299835423e-08 true resid norm
> 3.408580527641e-08 ||r(i)||/||b|| 2.125249957693e-15
> >  32 KSP preconditioned resid norm 5.894082812579e-09 true resid norm
> 1.548343581158e-08 ||r(i)||/||b|| 9.653922222659e-16
> >  33 KSP preconditioned resid norm 2.636703982134e-09 true resid norm
> 7.135606738493e-09 ||r(i)||/||b|| 4.449050798748e-16
> >  34 KSP preconditioned resid norm 1.180985878209e-09 true resid norm
> 3.381752613021e-09 ||r(i)||/||b|| 2.108522752938e-16
> >  35 KSP preconditioned resid norm 5.286215416700e-10 true resid norm
> 2.401714396480e-09 ||r(i)||/||b|| 1.497468925296e-16
> >  36 KSP preconditioned resid norm 2.343627265669e-10 true resid norm
> 2.406695615135e-09 ||r(i)||/||b|| 1.500574715125e-16
> >  37 KSP preconditioned resid norm 1.063481191780e-10 true resid norm
> 1.939409664821e-09 ||r(i)||/||b|| 1.209221925282e-16
> >  38 KSP preconditioned resid norm 4.641441861184e-11 true resid norm
> 2.137293190758e-09 ||r(i)||/||b|| 1.332602303628e-16
> >  39 KSP preconditioned resid norm 2.197549316276e-11 true resid norm
> 2.134629170008e-09 ||r(i)||/||b|| 1.330941286692e-16
> >  40 KSP preconditioned resid norm 1.014465992249e-11 true resid norm
> 2.134073377634e-09 ||r(i)||/||b|| 1.330594750147e-16
> > Linear solve converged due to CONVERGED_RTOL iterations 40
> >
> > 3) And lastly with chebyshev/jacobi:
> >
> > -ksp_type chebyshev
> > -ksp_rtol 1e-15
> > -ksp_monitor_true_residual
> > -pc_type jacobi
> >
> >   0 KSP preconditioned resid norm 1.124259077563e+04 true resid norm
> 2.745522604971e+06 ||r(i)||/||b|| 1.711833343159e-01
> >   1 KSP preconditioned resid norm 7.344319020428e+03 true resid norm
> 7.783641348970e+06 ||r(i)||/||b|| 4.853100378135e-01
> >   2 KSP preconditioned resid norm 1.071669918360e+04 true resid norm
> 2.799860726937e+06 ||r(i)||/||b|| 1.745713162185e-01
> >   3 KSP preconditioned resid norm 3.419051822673e+03 true resid norm
> 1.775069259453e+06 ||r(i)||/||b|| 1.106755682597e-01
> >   4 KSP preconditioned resid norm 4.986468711193e+03 true resid norm
> 1.206925036347e+06 ||r(i)||/||b|| 7.525177597052e-02
> >   5 KSP preconditioned resid norm 1.700832321100e+03 true resid norm
> 2.637405831602e+05 ||r(i)||/||b|| 1.644422535005e-02
> >   6 KSP preconditioned resid norm 1.643529813686e+03 true resid norm
> 3.974328566033e+05 ||r(i)||/||b|| 2.477993859417e-02
> >   7 KSP preconditioned resid norm 7.473371550560e+02 true resid norm
> 5.323098795195e+04 ||r(i)||/||b|| 3.318952096788e-03
> >   8 KSP preconditioned resid norm 4.683110030109e+02 true resid norm
> 1.087414808661e+05 ||r(i)||/||b|| 6.780031327884e-03
> >   9 KSP preconditioned resid norm 2.873339948815e+02 true resid norm
> 3.568238211189e+04 ||r(i)||/||b|| 2.224796523325e-03
> >  10 KSP preconditioned resid norm 1.262071076718e+02 true resid norm
> 2.462817350416e+04 ||r(i)||/||b|| 1.535566617052e-03
> >  11 KSP preconditioned resid norm 1.002027390320e+02 true resid norm
> 1.546159763209e+04 ||r(i)||/||b|| 9.640306117750e-04
> >  12 KSP preconditioned resid norm 3.354594608285e+01 true resid norm
> 4.591691194787e+03 ||r(i)||/||b|| 2.862919458211e-04
> >  13 KSP preconditioned resid norm 3.131257260705e+01 true resid norm
> 5.245042976548e+03 ||r(i)||/||b|| 3.270284293891e-04
> >  14 KSP preconditioned resid norm 9.505226922340e+00 true resid norm
> 1.166368684616e+03 ||r(i)||/||b|| 7.272308744166e-05
> >  15 KSP preconditioned resid norm 8.743145384463e+00 true resid norm
> 1.504382325425e+03 ||r(i)||/||b|| 9.379823793327e-05
> >  16 KSP preconditioned resid norm 3.088859985366e+00 true resid norm
> 5.407763762347e+02 ||r(i)||/||b|| 3.371740703775e-05
> >  17 KSP preconditioned resid norm 2.376841598522e+00 true resid norm
> 3.716033743805e+02 ||r(i)||/||b|| 2.316947037855e-05
> >  18 KSP preconditioned resid norm 1.081970394434e+00 true resid norm
> 2.205852224893e+02 ||r(i)||/||b|| 1.375348861385e-05
> >  19 KSP preconditioned resid norm 6.554707900903e-01 true resid norm
> 8.301134625673e+01 ||r(i)||/||b|| 5.175757435963e-06
> >  20 KSP preconditioned resid norm 3.637023760759e-01 true resid norm
> 7.510650297359e+01 ||r(i)||/||b|| 4.682890457559e-06
> >  21 KSP preconditioned resid norm 1.871804319271e-01 true resid norm
> 2.219347914083e+01 ||r(i)||/||b|| 1.383763423590e-06
> >  22 KSP preconditioned resid norm 1.100146172732e-01 true resid norm
> 2.219515834389e+01 ||r(i)||/||b|| 1.383868121901e-06
> >  23 KSP preconditioned resid norm 5.760669698705e-02 true resid norm
> 8.337640509358e+00 ||r(i)||/||b|| 5.198518854427e-07
> >  24 KSP preconditioned resid norm 2.957448587725e-02 true resid norm
> 5.874493674310e+00 ||r(i)||/||b|| 3.662746803708e-07
> >  25 KSP preconditioned resid norm 1.998893198438e-02 true resid norm
> 3.167578238976e+00 ||r(i)||/||b|| 1.974985030802e-07
> >  26 KSP preconditioned resid norm 8.664848450375e-03 true resid norm
> 1.486849213225e+00 ||r(i)||/||b|| 9.270504838826e-08
> >  27 KSP preconditioned resid norm 6.981883525312e-03 true resid norm
> 1.069480324023e+00 ||r(i)||/||b|| 6.668209816232e-08
> >  28 KSP preconditioned resid norm 2.719053601907e-03 true resid norm
> 4.199959208195e-01 ||r(i)||/||b|| 2.618674564719e-08
> >  29 KSP preconditioned resid norm 2.165577279425e-03 true resid norm
> 3.210144228918e-01 ||r(i)||/||b|| 2.001524925514e-08
> >  30 KSP preconditioned resid norm 7.988525722643e-04 true resid norm
> 1.420960576866e-01 ||r(i)||/||b|| 8.859689191379e-09
> >  31 KSP preconditioned resid norm 6.325404656692e-04 true resid norm
> 8.848430840431e-02 ||r(i)||/||b|| 5.516996625657e-09
> >  32 KSP preconditioned resid norm 2.774874251260e-04 true resid norm
> 4.978943834544e-02 ||r(i)||/||b|| 3.104371478953e-09
> >  33 KSP preconditioned resid norm 2.189482639986e-04 true resid norm
> 2.363450483074e-02 ||r(i)||/||b|| 1.473611375302e-09
> >  34 KSP preconditioned resid norm 1.083040043835e-04 true resid norm
> 1.640956513288e-02 ||r(i)||/||b|| 1.023136385414e-09
> >  35 KSP preconditioned resid norm 7.862356661381e-05 true resid norm
> 6.670818331105e-03 ||r(i)||/||b|| 4.159255226918e-10
> >  36 KSP preconditioned resid norm 3.874849522187e-05 true resid norm
> 5.033219998314e-03 ||r(i)||/||b|| 3.138212667043e-10
> >  37 KSP preconditioned resid norm 2.528412836894e-05 true resid norm
> 2.108237735582e-03 ||r(i)||/||b|| 1.314486227337e-10
> >  38 KSP preconditioned resid norm 1.267202237267e-05 true resid norm
> 1.451831066002e-03 ||r(i)||/||b|| 9.052166691026e-11
> >  39 KSP preconditioned resid norm 8.210280946453e-06 true resid norm
> 7.040263504011e-04 ||r(i)||/||b|| 4.389604292084e-11
> >  40 KSP preconditioned resid norm 4.663490696194e-06 true resid norm
> 4.017228546553e-04 ||r(i)||/||b|| 2.504741997254e-11
> >  41 KSP preconditioned resid norm 3.031143852348e-06 true resid norm
> 2.322717593194e-04 ||r(i)||/||b|| 1.448214418476e-11
> >  42 KSP preconditioned resid norm 1.849864051869e-06 true resid norm
> 1.124281997627e-04 ||r(i)||/||b|| 7.009898250945e-12
> >  43 KSP preconditioned resid norm 1.124434187023e-06 true resid norm
> 7.306110293564e-05 ||r(i)||/||b|| 4.555359765270e-12
> >  44 KSP preconditioned resid norm 6.544412722416e-07 true resid norm
> 3.401751796431e-05 ||r(i)||/||b|| 2.120992243786e-12
> >  45 KSP preconditioned resid norm 3.793047150173e-07 true resid norm
> 2.148969752882e-05 ||r(i)||/||b|| 1.339882640108e-12
> >  46 KSP preconditioned resid norm 2.171588698514e-07 true resid norm
> 1.106350941000e-05 ||r(i)||/||b|| 6.898098112943e-13
> >  47 KSP preconditioned resid norm 1.296462934907e-07 true resid norm
> 6.099370095521e-06 ||r(i)||/||b|| 3.802957252247e-13
> >  48 KSP preconditioned resid norm 8.025171649527e-08 true resid norm
> 3.573262429072e-06 ||r(i)||/||b|| 2.227929123173e-13
> >  49 KSP preconditioned resid norm 4.871794377896e-08 true resid norm
> 1.812080394338e-06 ||r(i)||/||b|| 1.129832125183e-13
> >  50 KSP preconditioned resid norm 3.113615807350e-08 true resid norm
> 1.077814343384e-06 ||r(i)||/||b|| 6.720172426917e-14
> >  51 KSP preconditioned resid norm 1.796713291999e-08 true resid norm
> 5.820635497903e-07 ||r(i)||/||b|| 3.629166230738e-14
> >  52 KSP preconditioned resid norm 1.101194810402e-08 true resid norm
> 3.130702427551e-07 ||r(i)||/||b|| 1.951992962392e-14
> >  53 KSP preconditioned resid norm 6.328675584656e-09 true resid norm
> 1.886022841038e-07 ||r(i)||/||b|| 1.175935240673e-14
> >  54 KSP preconditioned resid norm 3.753197413786e-09 true resid norm
> 9.249216284944e-08 ||r(i)||/||b|| 5.766886350159e-15
> >  55 KSP preconditioned resid norm 2.289205545523e-09 true resid norm
> 5.644941358874e-08 ||r(i)||/||b|| 3.519620935120e-15
> >  56 KSP preconditioned resid norm 1.398041051045e-09 true resid norm
> 2.841582922959e-08 ||r(i)||/||b|| 1.771726951389e-15
> >  57 KSP preconditioned resid norm 8.587888866230e-10 true resid norm
> 1.829052851330e-08 ||r(i)||/||b|| 1.140414452112e-15
> >  58 KSP preconditioned resid norm 5.353939794444e-10 true resid norm
> 1.100004853784e-08 ||r(i)||/||b|| 6.858530259177e-16
> >  59 KSP preconditioned resid norm 3.152419669065e-10 true resid norm
> 6.833334353224e-09 ||r(i)||/||b|| 4.260583966647e-16
> >  60 KSP preconditioned resid norm 1.930837697706e-10 true resid norm
> 3.532215487724e-09 ||r(i)||/||b|| 2.202336355258e-16
> >  61 KSP preconditioned resid norm 1.138921366053e-10 true resid norm
> 2.879312701518e-09 ||r(i)||/||b|| 1.795251468306e-16
> >  62 KSP preconditioned resid norm 6.820698934300e-11 true resid norm
> 2.528012115752e-09 ||r(i)||/||b|| 1.576215553214e-16
> >  63 KSP preconditioned resid norm 4.141390392052e-11 true resid norm
> 3.265136945688e-09 ||r(i)||/||b|| 2.035812884400e-16
> >  64 KSP preconditioned resid norm 2.447449492240e-11 true resid norm
> 3.548082053472e-09 ||r(i)||/||b|| 2.212229158996e-16
> >  65 KSP preconditioned resid norm 1.530621705437e-11 true resid norm
> 2.329307411648e-09 ||r(i)||/||b|| 1.452323170280e-16
> >  66 KSP preconditioned resid norm 1.110145418759e-11 true resid norm
> 2.794373041066e-09 ||r(i)||/||b|| 1.742291590046e-16
> > Linear solve converged due to CONVERGED_RTOL iterations 67
> >
> > Looks like neither of these ksp/pc combos is good? I also tried
> -pc_gamg_agg_nsmooths 0 but it didn't improve the solver at all. Here's the
> GAMG info I was able to grep from the out-of-box params:
> >
> > [0] PCSetUp_GAMG(): level 0) N=8541, n data rows=1, n data cols=1,
> nnz/row (ave)=5, np=1
> > [0] PCGAMGFilterGraph():       99.9114% nnz after filtering, with
> threshold 0., 5.42079 nnz ave. (N=8541)
> > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> > [0] PCGAMGProlongator_AGG(): New grid 1541 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.587728e+00
> min=2.394056e-02 PC=jacobi
> > [0] PCSetUp_GAMG(): 1) N=1541, n data cols=1, nnz/row (ave)=5, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 5.21674 nnz ave. (N=1541)
> > [0] PCGAMGProlongator_AGG(): New grid 537 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.939736e+00
> min=6.783380e-02 PC=jacobi
> > [0] PCSetUp_GAMG(): 2) N=537, n data cols=1, nnz/row (ave)=9, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 9.85289 nnz ave. (N=537)
> > [0] PCGAMGProlongator_AGG(): New grid 100 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.521731e+00
> min=5.974776e-02 PC=jacobi
> > [0] PCSetUp_GAMG(): 3) N=100, n data cols=1, nnz/row (ave)=12, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 12.4 nnz ave. (N=100)
> > [0] PCGAMGProlongator_AGG(): New grid 17 nodes
> > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.560264e+00
> min=4.842076e-01 PC=jacobi
> > [0] PCSetUp_GAMG(): 4) N=17, n data cols=1, nnz/row (ave)=9, 1 active pes
> > [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.31821
> >
> > And the one with pc_gamg_agg_nsmooths 0
> >
> > [0] PCSetUp_GAMG(): level 0) N=8541, n data rows=1, n data cols=1,
> nnz/row (ave)=5, np=1
> > [0] PCGAMGFilterGraph():       99.9114% nnz after filtering, with
> threshold 0., 5.42079 nnz ave. (N=8541)
> > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> > [0] PCGAMGProlongator_AGG(): New grid 1541 nodes
> > [0] PCSetUp_GAMG(): 1) N=1541, n data cols=1, nnz/row (ave)=3, 1 active
> pes
> > [0] PCGAMGFilterGraph():       99.7467% nnz after filtering, with
> threshold 0., 3.07398 nnz ave. (N=1541)
> > [0] PCGAMGProlongator_AGG(): New grid 814 nodes
> > [0] PCSetUp_GAMG(): 2) N=814, n data cols=1, nnz/row (ave)=3, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 3.02211 nnz ave. (N=814)
> > [0] PCGAMGProlongator_AGG(): New grid 461 nodes
> > [0] PCSetUp_GAMG(): 3) N=461, n data cols=1, nnz/row (ave)=3, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 3.00434 nnz ave. (N=461)
> > [0] PCGAMGProlongator_AGG(): New grid 290 nodes
> > [0] PCSetUp_GAMG(): 4) N=290, n data cols=1, nnz/row (ave)=3, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 3. nnz ave. (N=290)
> > [0] PCGAMGProlongator_AGG(): New grid 197 nodes
> > [0] PCSetUp_GAMG(): 5) N=197, n data cols=1, nnz/row (ave)=3, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 3. nnz ave. (N=197)
> > [0] PCGAMGProlongator_AGG(): New grid 127 nodes
> > [0] PCSetUp_GAMG(): 6) N=127, n data cols=1, nnz/row (ave)=2, 1 active
> pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 2.98425 nnz ave. (N=127)
> > [0] PCGAMGProlongator_AGG(): New grid 82 nodes
> > [0] PCSetUp_GAMG(): 7) N=82, n data cols=1, nnz/row (ave)=2, 1 active pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 2.97561 nnz ave. (N=82)
> > [0] PCGAMGProlongator_AGG(): New grid 66 nodes
> > [0] PCSetUp_GAMG(): 8) N=66, n data cols=1, nnz/row (ave)=2, 1 active pes
> > [0] PCGAMGFilterGraph():       100.% nnz after filtering, with threshold
> 0., 2.9697 nnz ave. (N=66)
> > [0] PCGAMGProlongator_AGG(): New grid 36 nodes
> > [0] PCSetUp_GAMG(): 9) N=36, n data cols=1, nnz/row (ave)=2, 1 active pes
> > [0] PCSetUp_GAMG(): 10 levels, grid complexity = 1.23689
> >
> >
> > Thanks,
> > Justin
> >
> > On Fri, Feb 1, 2019 at 7:13 AM Mark Adams <mfadams at lbl.gov> wrote:
> >
> >
> > Both GAMG and ILU are nice and dandy for this,
> >
> > I would test Richardson/SOR and Chebyshev/Jacobi on the tiny system and
> converge it way down, say rtol = 1.e-12. See which one is better in the
> early iteration and pick it. It would be nice to check that it solves the
> problem ...
> >
> > The residual drops about 5 orders in the first iteration and then
> flatlines. That is very bad. Check that the smoother can actually solve the
> problem.
> >
> > but as soon as I look at a bigger system, like a network with 8500
> buses, the out-of-box gamg craps out. I am not sure where to start when it
> comes to tuning GAMG.
> >
> > First try -pc_gamg_agg_nsmooths 0
> >
> > You can run with -info and grep on GAMG to see some diagnostic output.
> Eigen estimates are fragile in practice and with this parameter and SOR
> there are no eigen estimates needed. The max eigen values for all levels
> should be between say 2 (or less) and say 3-4. Much higher is a sign of a
> problem.
> >
> > Attached is the ksp monitor/view output for gamg on the unsuccessful
> solve
> >
> > I'm also attaching a zip file which contains the simple PETSc script
> that loads the binary matrix/vector as well as two test cases, if you guys
> are interested in trying it out. It only works if you have PETSc configured
> with complex numbers.
> >
> > Thanks
> >
> > Justin
> >
> > PS - A couple years ago I had asked if there was a paper/tutorial on
> using/tuning GAMG. Does such a thing exist today?
> >
> > There is a write-up in the manual that is tutorial-like.
> >
> >
> > On Thu, Jan 31, 2019 at 5:00 PM Matthew Knepley <knepley at gmail.com>
> wrote:
> > On Thu, Jan 31, 2019 at 6:22 PM Justin Chang <jychang48 at gmail.com>
> wrote:
> > Here's IMHO the simplest explanation of the equations I'm trying to
> solve:
> >
> > http://home.eng.iastate.edu/~jdm/ee458_2011/PowerFlowEquations.pdf
> >
> > Right now we're just trying to solve eq(5) (in section 1), inverting the
> linear Y-bus matrix. Eventually we have to be able to solve equations like
> those in the next section.
> >
> > Maybe I am reading this wrong, but the Y-bus matrix looks like an
> M-matrix to me (if all the y's are positive). This means
> > that it should be really easy to solve, and I think GAMG should do it.
> You can start out just doing relaxation, like SOR, on
> > small examples.
> >
> >   Thanks,
> >
> >     Matt
> >
> > On Thu, Jan 31, 2019 at 1:47 PM Matthew Knepley <knepley at gmail.com>
> wrote:
> > On Thu, Jan 31, 2019 at 3:20 PM Justin Chang via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> > Hi all,
> >
> > I'm working with some folks to extract a linear system of equations from
> an external software package that solves power flow equations in complex
> form. Since that external package uses serial direct solvers like KLU from
> suitesparse, I want a proof-of-concept where the same matrix can be solved
> in PETSc using its parallel solvers.
> >
> > I got MUMPS to achieve a very minor speedup across two MPI processes on
> a single node (went from solving a 300k dof system in 1.8 seconds to 1.5
> seconds). However, I want to use iterative solvers and preconditioners, but I
> have never worked with complex numbers, so I am not sure what the "best"
> options are given PETSc's capabilities.
> >
> > So far I tried GMRES/BJACOBI and it craps out (unsurprisingly). I
> believe I also tried BICG with BJACOBI and while it did converge it
> converged slowly. Does anyone have recommendations on how one would go
> about preconditioning PETSc matrices with complex numbers? I was originally
> thinking about converting it to polar form: declaring all voltages =
> sqrt(real^2+imaginary^2) and all angles to be something like a conditional
> arctan(imaginary/real), because all the papers I've seen in the literature
> that claim to successfully precondition power flow equations operate in this
> form.
> >
> > 1) We really need to see the (simplified) equations
> >
> > 2) All complex equations can be converted to a system of real equations
> twice as large, but this is not necessarily the best way to go
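> >
> > (Concretely, writing A = Ar + i*Ai, x = xr + i*xi, b = br + i*bi, one
> > equivalent real form is
> >
> >   [ Ar  -Ai ] [ xr ]   [ br ]
> >   [ Ai   Ar ] [ xi ] = [ bi ]
> >
> > which doubles the problem size and can have less favorable spectral
> > properties for iterative solvers.)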
> >
> >  Thanks,
> >
> >     Matt
> >
> > Justin
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
>
>