<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <font face="Trebuchet MS">Thank you Hong!<br>

      <br>

      I've used GMRES via<br>

      <br>

      mpirun \<br>

        -n ${NP} pflotran \<br>

        -pflotranin ${INPUTFILE}.pflinput \<br>

        -flow_ksp_type gmres \<br>

        -flow_pc_type bjacobi \<br>

        -flow_sub_pc_type lu \<br>

        -flow_sub_pc_factor_nonzeros_along_diagonal \<br>

        -snes_monitor <br>

      <br>

      and get:<br>

      <br>

      NP 1<br>

      <br>

      FLOW TS BE steps =     43 newton =       43 linear =         43

      cuts =      0<br>

      FLOW TS BE Wasted Linear Iterations = 0<br>

      FLOW TS BE SNES time = 197.0 seconds<br>

      <br>

      NP 2<br>

      <br>

      FLOW TS BE steps =     43 newton =       43 linear =        770

      cuts =      0<br>

      FLOW TS BE Wasted Linear Iterations = 0<br>

      FLOW TS BE SNES time = 68.7 seconds<br>

      <br>

      Which looks ok to me.<br>
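      <br>
      As a rough sanity check on those figures, here is a minimal Python sketch that derives the linear iterations per Newton step and the SNES-time speedup from the two summaries above. The helper names (linear_per_newton, speedup) are made up for illustration and are not part of PFLOTRAN or PETSc.<br>
<pre>
```python
# Figures copied from the two PFLOTRAN summaries above (GMRES + block Jacobi/LU).
runs = {
    1: {"newton": 43, "linear": 43, "snes_time": 197.0},
    2: {"newton": 43, "linear": 770, "snes_time": 68.7},
}


def linear_per_newton(run):
    """Average number of linear (KSP) iterations per Newton step."""
    return run["linear"] / run["newton"]


def speedup(base, other):
    """Ratio of SNES wall time: baseline run divided by the other run."""
    return base["snes_time"] / other["snes_time"]


for nprocs, run in sorted(runs.items()):
    print(f"NP {nprocs}: {linear_per_newton(run):.1f} linear its per Newton step")
print(f"speedup NP 1 to NP 2: {speedup(runs[1], runs[2]):.2f}x")
```
</pre>
      With one block, block Jacobi + LU acts as a direct solve, hence one linear iteration per Newton step. With two blocks GMRES needs about 18 per step, yet the SNES time still drops by nearly a factor of three.<br>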

      <br>

      Robert<br>

      <br>

      <br>

    </font><br>

    <div class="moz-cite-prefix">On 07/07/17 15:49, <a class="moz-txt-link-abbreviated" href="mailto:hong@aspiritech.org">hong@aspiritech.org</a>

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAGCphBvUTwvC87yE1F+iQ07+9gvsCeGszY0hi3q6kqtsYPpVpA@mail.gmail.com">

      <div dir="ltr">What do you get with '-ksp_type gmres' or

        '-ksp_type bcgs' in parallel runs?

        <div>Hong<br>

          <div class="gmail_extra"><br>

            <div class="gmail_quote">On Fri, Jul 7, 2017 at 6:05 AM,

              Robert Annewandter <span dir="ltr">&lt;<a

                  href="mailto:robert.annewandter@opengosim.com"

                  target="_blank" moz-do-not-send="true">robert.annewandter@opengosim.com</a>&gt;</span>

              wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div text="#330033" bgcolor="#FFFFFF"> <font

                    face="Trebuchet MS">Yes indeed, PFLOTRAN cuts the

                    timestep after 8 failed iterations of SNES. <br>

                    <br>

                    I've rerun with -snes_monitor (logs attached with

                    canonical suffix). The -pc_type is always

                    PCBJACOBI + PCLU (we'd like to try SUPERLU in

                    the future, but it works only with -mat_type

                    aij).<br>

                    <br>

                    <br>

                    I did the sequential and parallel runs with<br>

                     <br>

                        -ksp_type preonly -pc_type lu

                    -pc_factor_nonzeros_along_diagonal

                    -snes_monitor<br>

                    <br>

                    and <br>

                    <br>

                        -ksp_type preonly -pc_type bjacobi -sub_pc_type

                    lu -sub_pc_factor_nonzeros_along_diagonal

                    -snes_monitor<br>

                    <br>

                    As expected, the two sequential runs are identical,

                    and the parallel run takes half the time of the

                    sequential one.<br>

                    <br>

                    <br>

                  </font>

                  <div>

                    <div class="h5"><br>

                      <br>

                      <div class="m_-3461912095200210970moz-cite-prefix">On

                        07/07/17 01:20, Barry Smith wrote:<br>

                      </div>

                      <blockquote type="cite">

                        <pre>   Looks like PFLOTRAN has a maximum number of SNES iterations as 8 and cuts the timestep if that fails.

   Please run with -snes_monitor; I don't understand the strange, densely packed information that PFLOTRAN is printing.

   It looks like the linear solver is converging fine in parallel. Normally there is then absolutely no reason for Newton to behave differently on 2 processors than on 1, unless there is something wrong with the Jacobian. What is the -pc_type for the two cases: LU or your fancy thing? 

   Please run sequential and parallel with -pc_type lu and also with -snes_monitor.  We need to fix all the knobs but one in order to understand what is going on.

   Barry

</pre>

                        <blockquote type="cite">

                          <pre>On Jul 6, 2017, at 5:11 PM, Robert Annewandter <a class="m_-3461912095200210970moz-txt-link-rfc2396E" href="mailto:robert.annewandter@opengosim.com" target="_blank" moz-do-not-send="true">&lt;robert.annewandter@opengosim.com&gt;</a> wrote:

Thanks Barry!

I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES time: 600 s). PFLOTRAN final output:

NP 1

FLOW TS BE steps =     43 newton =       43 linear =         43 cuts =      0

FLOW TS BE Wasted Linear Iterations = 0

FLOW TS BE SNES time = 218.9 seconds

NP 2

FLOW TS BE steps =     67 newton =      176 linear =        314 cuts =     13

FLOW TS BE Wasted Linear Iterations = 208

FLOW TS BE SNES time = 600.0 seconds

Robert

On 06/07/17 21:24, Barry Smith wrote:

</pre>

                          <blockquote type="cite">

                            <pre>   So on one process the outer linear solver takes a single iteration; this is because block Jacobi with LU and one block is a direct solver.

</pre>

                            <blockquote type="cite">

                              <pre>    11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05

[0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.148515820410e-14 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 1.581814306485e-02 at iteration 1

    1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12

</pre>

                            </blockquote>
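<pre>   As an aside, the numbers in the quoted log can be checked by hand. A minimal Python sketch, assuming only the values printed above (the variable names are illustrative, not PETSc identifiers):

```python
# Values copied from the quoted -info and -ksp_monitor_true_residual output.
rtol = 1.000000000000e-07        # relative tolerance reported by KSPConvergedDefault
rhs_norm = 1.581814306485e-02    # initial right hand side norm ||b||
resid_norm = 2.148515820410e-14  # residual norm at iteration 1
true_resid = 2.148698024622e-14  # true residual norm at iteration 1

# Convergence threshold of the relative test: rtol * ||b||.
threshold = rtol * rhs_norm

# Factor by which the residual undershoots the threshold (above 1 means converged).
margin = threshold / resid_norm

# Relative true residual, i.e. the ||r(i)||/||b|| column of the monitor.
relative_true_resid = true_resid / rhs_norm

print(f"threshold           = {threshold:.6e}")
print(f"margin              = {margin:.3e}")
print(f"relative true resid = {relative_true_resid:.6e}")
```

   The last ratio should reproduce the ||r(i)||/||b|| column of the monitor output at iteration 1.
</pre>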

                            <pre>   On two processes the outer linear solver takes a few iterations to solve; this is to be expected. 

   But what you sent doesn't give any indication about SNES not converging. Please turn off all inner linear solver monitoring and just run with -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor -snes_converged_reason

   Barry

</pre>

                            <blockquote type="cite">

                              <pre>On Jul 6, 2017, at 2:03 PM, Robert Annewandter <a class="m_-3461912095200210970moz-txt-link-rfc2396E" href="mailto:robert.annewandter@opengosim.com" target="_blank" moz-do-not-send="true">&lt;robert.annewandter@opengosim.com&gt;</a>

 wrote:

Hi all,

I'd like to understand why the SNES with my CPR-AMG two-stage preconditioner (KSPFGMRES + multiplicative PCComposite (PCGalerkin with KSPGMRES + BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles to converge when using two cores instead of one. Because of the adaptive time stepping around the Newton solve, this leads to severe cuts in the time step.

This is how I run it with two cores

mpirun \

  -n 2 pflotran \

  -pflotranin het.pflinput \

  -ksp_monitor_true_residual \

  -flow_snes_view \

  -flow_snes_converged_reason \

  -flow_sub_1_pc_type bjacobi \

  -flow_sub_1_sub_pc_type lu \

  -flow_sub_1_sub_pc_factor_pivot_in_blocks true \

  -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \

  -options_left \

  -log_summary \

  -info 

With one core I get (after grepping the crap away from -info):

 Step     32 Time=  1.80000E+01 

[...]

  0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00 rsn:   0

[0] SNESComputeJacobian(): Rebuilding preconditioner

    Residual norms for flow_ solve.

    0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm 1.581814306485e-02 ||r(i)||/||b|| 1.000000000000e+00