<div class="moz-cite-prefix">Here are the two files. In this case,
maybe you can also give me some hints, why the solver at all does
not scale here. The solver runtime for 64 cores is 206 seconds,
with the same problem size on 128 cores it takes 172 seconds. The
number of inner and outer solver iterations are the same for both
runs. I use CG with jacobi-preconditioner and hypre boomeramg for
inner solver. <br>
<br>
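For reference, a minimal sketch of how such a pairing is typically
selected through the PETSc C API; the function and object names here are
illustrative assumptions, not the actual application code:

    #include <petscksp.h>

    /* Illustrative sketch only: an outer CG solve with a Jacobi
     * preconditioner, and an inner solve that applies hypre BoomerAMG
     * once per outer iteration. */
    static PetscErrorCode ConfigureSolvers(KSP outer, KSP inner)
    {
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      /* Outer Krylov solve: CG preconditioned by Jacobi (diagonal) */
      ierr = KSPSetType(outer, KSPCG);CHKERRQ(ierr);
      ierr = KSPGetPC(outer, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCJACOBI);CHKERRQ(ierr);

      /* Inner solve: a single BoomerAMG application (preonly + hypre) */
      ierr = KSPSetType(inner, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(inner, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
      ierr = PCHYPRESetType(pc, "boomeramg");CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The same choices can also be made at runtime, e.g. with -ksp_type cg
-pc_type jacobi for the outer solve and -pc_type hypre -pc_hypre_type
boomeramg on the inner solver's option prefix.
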
On 19.11.2012 13:41, Jed Brown wrote:
> Just have it do one or a few iterations.

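(For such a profiling run, one way to cap the outer solve at a single
iteration; the ns_ prefix is taken from the KSPView further down and may
not match the actual option prefix:)

    #include <petscksp.h>

    /* Sketch: cap the solver at one outer iteration for a profiling run.
     * The equivalent runtime option would be -ksp_max_it 1, i.e.
     * -ns_ksp_max_it 1 with the prefix shown in the KSPView below. */
    static PetscErrorCode LimitToOneIteration(KSP ksp)
    {
      PetscErrorCode ierr;
      PetscFunctionBegin;
      ierr = KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT,
                              PETSC_DEFAULT, 1);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }
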
<div class="gmail_quote">On Mon, Nov 19, 2012 at 1:36 PM, Thomas
Witkowski <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:thomas.witkowski@tu-dresden.de"
target="_blank">thomas.witkowski@tu-dresden.de</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
>> I can do this! Should I stop the run after KSPSetUp? Or do you want to
>> see the log_summary file from the whole run?
>>
>> Thomas

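(There is no need to stop the run early: a user-defined logging stage
around KSPSetUp keeps the whole run while still reporting the setup cost
separately in -log_summary. A minimal sketch, with the stage names and
the surrounding solver code assumed:)

    #include <petscksp.h>

    /* Sketch: put KSPSetUp into its own logging stage so that
     * -log_summary reports its time, flops and messages separately from
     * the rest of the run. */
    static PetscErrorCode SolveWithStagedSetup(KSP ksp, Vec b, Vec x)
    {
      PetscLogStage  setup_stage, solve_stage;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = PetscLogStageRegister("KSPSetUp", &setup_stage);CHKERRQ(ierr);
      ierr = PetscLogStageRegister("KSPSolve", &solve_stage);CHKERRQ(ierr);

      ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);   /* the call in question */
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Each registered stage then gets its own section in the -log_summary
tables.
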
>> On 19.11.2012 13:33, Jed Brown wrote:
<blockquote type="cite">Always, always, always send
-log_summary when asking about performance.
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Nov 19, 2012 at
11:26 AM, Thomas Witkowski <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:thomas.witkowski@tu-dresden.de"
target="_blank">thomas.witkowski@tu-dresden.de</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">I have some scaling
problem in KSPSetUp, maybe some of you can
help me to fix it. It takes 4.5 seconds on 64
cores, and 4.0 cores on 128 cores. The matrix
has around 11 million rows and is not
perfectly balanced, but the number of maximum
rows per core in the 128 cases is exactly
halfe of the number in the case when using 64
cores. Besides the scaling, why does the setup
takes so long? I though that just some objects
are created but no calculation is going on!<br>
<br>
The KSPView on the corresponding solver
objects is as follows:<br>
<br>
>>>> KSP Object:(ns_) 64 MPI processes
>>>>   type: fgmres
>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>   maximum iterations=100, initial guess is zero
>>>>   tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>>>>   right preconditioning
>>>>   has attached null space
>>>>   using UNPRECONDITIONED norm type for convergence test
>>>> PC Object:(ns_) 64 MPI processes
>>>>   type: fieldsplit
>>>>     FieldSplit with Schur preconditioner, factorization FULL
>>>>     Preconditioner for the Schur complement formed from the block diagonal part of A11
>>>>     Split info:
>>>>     Split number 0 Defined by IS
>>>>     Split number 1 Defined by IS
>>>>     KSP solver for A00 block
>>>>       KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>         type: preonly
>>>>         maximum iterations=10000, initial guess is zero
>>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>         left preconditioning
>>>>         using DEFAULT norm type for convergence test
>>>>       PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>         type: none
>>>>         linear system matrix = precond matrix:
>>>>         Matrix Object: 64 MPI processes
>>>>           type: mpiaij
>>>>           rows=11068107, cols=11068107
>>>>           total: nonzeros=315206535, allocated nonzeros=315206535
>>>>           total number of mallocs used during MatSetValues calls =0
>>>>           not using I-node (on process 0) routines
>>>>     KSP solver for S = A11 - A10 inv(A00) A01
>>>>       KSP Object: (ns_fieldsplit_pressure_) 64 MPI processes
>>>>         type: gmres
>>>>           GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>           GMRES: happy breakdown tolerance 1e-30
>>>>         maximum iterations=10000, initial guess is zero
>>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>         left preconditioning
>>>>         using DEFAULT norm type for convergence test
>>>>       PC Object: (ns_fieldsplit_pressure_) 64 MPI processes
>>>>         type: none
>>>>         linear system matrix followed by preconditioner matrix:
>>>>         Matrix Object: 64 MPI processes
>>>>           type: schurcomplement
>>>>           rows=469678, cols=469678
>>>>           Schur complement A11 - A10 inv(A00) A01
>>>>             A11
>>>>               Matrix Object: 64 MPI processes
>>>>                 type: mpiaij
>>>>                 rows=469678, cols=469678
>>>>                 total: nonzeros=0, allocated nonzeros=0
>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>                 using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>>>>             A10
>>>>               Matrix Object: 64 MPI processes
>>>>                 type: mpiaij
>>>>                 rows=469678, cols=11068107
>>>>                 total: nonzeros=89122957, allocated nonzeros=89122957
>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>                 not using I-node (on process 0) routines
>>>>             KSP of A00
>>>>               KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>                 type: preonly
>>>>                 maximum iterations=10000, initial guess is zero
>>>>                 tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>                 left preconditioning
>>>>                 using DEFAULT norm type for convergence test
>>>>               PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>                 type: none
>>>>                 linear system matrix = precond matrix:
>>>>                 Matrix Object: 64 MPI processes
>>>>                   type: mpiaij
>>>>                   rows=11068107, cols=11068107
>>>>                   total: nonzeros=315206535, allocated nonzeros=315206535
>>>>                   total number of mallocs used during MatSetValues calls =0
>>>>                   not using I-node (on process 0) routines
>>>>             A01
>>>>               Matrix Object: 64 MPI processes
>>>>                 type: mpiaij
>>>>                 rows=11068107, cols=469678
>>>>                 total: nonzeros=88821041, allocated nonzeros=88821041
>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>                 not using I-node (on process 0) routines
>>>>         Matrix Object: 64 MPI processes
>>>>           type: mpiaij
>>>>           rows=469678, cols=469678
>>>>           total: nonzeros=0, allocated nonzeros=0
>>>>           total number of mallocs used during MatSetValues calls =0
>>>>           using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>>>>   linear system matrix = precond matrix:
>>>>   Matrix Object: 64 MPI processes
>>>>     type: mpiaij
>>>>     rows=11537785, cols=11537785
>>>>     total: nonzeros=493150533, allocated nonzeros=510309207
>>>>     total number of mallocs used during MatSetValues calls =0
>>>>     not using I-node (on process 0) routines
>>>>
>>>> Thomas

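For orientation, a rough sketch of how a Schur-complement fieldsplit
preconditioner like the one shown in the KSPView above is typically put
together; the index sets, split names, and option handling here are
assumptions, not the actual application code:

    #include <petscksp.h>

    /* Sketch: outer FGMRES with a Schur-complement fieldsplit PC, with
     * the two splits defined by index sets as in the KSPView above. */
    static PetscErrorCode SetupSchurFieldsplit(KSP ksp, IS isVelocity, IS isPressure)
    {
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPSetType(ksp, KSPFGMRES);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "velocity", isVelocity);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "pressure", isPressure);CHKERRQ(ierr);
      ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
      /* "factorization FULL" in the KSPView corresponds to the full Schur
       * factorization; the runtime option name differs between PETSc
       * versions (e.g. -pc_fieldsplit_schur_fact_type full in recent
       * releases). */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

With -ksp_view (here, -ns_ksp_view) the resulting object tree can be
compared against the output above.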