<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Here are the two files. In this case,
      maybe you can also give me some hints on why the solver does not
      scale here at all. The solver runtime on 64 cores is 206 seconds;
      with the same problem size on 128 cores it takes 172 seconds. The
      number of inner and outer solver iterations is the same for both
      runs. I use CG with a Jacobi preconditioner and hypre BoomerAMG
      for the inner solve. <br>
      <br>
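      In case it helps to pin down what I am comparing: the kind of
      nesting I mean is roughly the following sketch (the outer_/inner_
      prefixes are only placeholders here, not the actual option
      prefixes in my code):<br>
      <pre>
# outer Krylov solve: CG with a Jacobi preconditioner (placeholder prefix "outer_")
-outer_ksp_type cg
-outer_pc_type jacobi

# inner solve: a BoomerAMG application through hypre (placeholder prefix "inner_")
-inner_ksp_type preonly
-inner_pc_type hypre
-inner_pc_hypre_type boomeramg
      </pre>
      <br>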
      On 19.11.2012 13:41, Jed Brown wrote:<br>
    </div>
    <blockquote
cite="mid:CAM9tzSmD2oyr_=ZYqroOAYFmGbcBMTkJ6ofOdxektPc1SD4QVA@mail.gmail.com"
      type="cite">Just have it do one or a few iterations.
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Mon, Nov 19, 2012 at 1:36 PM, Thomas
          Witkowski <span dir="ltr"><<a moz-do-not-send="true"
              href="mailto:thomas.witkowski@tu-dresden.de"
              target="_blank">thomas.witkowski@tu-dresden.de</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000">
              <div>I can do this! Should I stop the run after KSPSetUp?
                Or do you want to see the log_summary file from the
                whole run?<br>
                <br>
                Thomas<br>
                <br>
                On 19.11.2012 13:33, Jed Brown wrote:<br>
              </div>
              <div>
                <div class="h5">
                  <blockquote type="cite">Always, always, always send
                    -log_summary when asking about performance.
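                    <br>
                    For example, append it to both runs and send the
                    resulting files (launcher and binary name below are
                    placeholders for whatever you normally use):<br>
                    <pre>
mpiexec -n 64  ./your_app [usual options] -log_summary > log_64.txt
mpiexec -n 128 ./your_app [usual options] -log_summary > log_128.txt
                    </pre>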
                    <div class="gmail_extra"><br>
                      <br>
                      <div class="gmail_quote">On Mon, Nov 19, 2012 at
                        11:26 AM, Thomas Witkowski <span dir="ltr"><<a
                            moz-do-not-send="true"
                            href="mailto:thomas.witkowski@tu-dresden.de"
                            target="_blank">thomas.witkowski@tu-dresden.de</a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">I have a scaling
                          problem in KSPSetUp; maybe some of you can
                          help me fix it. It takes 4.5 seconds on 64
                          cores and 4.0 seconds on 128 cores. The
                          matrix has around 11 million rows and is not
                          perfectly balanced, but the maximum number of
                          rows per core in the 128-core case is exactly
                          half of the number in the 64-core case.
                          Besides the scaling, why does the setup take
                          so long? I thought that only some objects are
                          created and no computation is going on!<br>
                          <br>
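                          One way I could try to see what the setup
                          actually does is a small run with -info, or
                          just the KSPSetUp/PCSetUp rows in
                          -log_summary, e.g. (launcher and binary name
                          are placeholders):<br>
                          <pre>
# small run; -info prints what PETSc is doing while the solver is set up
mpiexec -n 4 ./your_app [usual options] -info
                          </pre>
                          <br>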
                          The KSPView output for the corresponding
                          solver objects is as follows:<br>
                          <br>
                          <pre>
KSP Object:(ns_) 64 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=100, initial guess is zero
  tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
  right preconditioning
  has attached null space
  using UNPRECONDITIONED norm type for convergence test
PC Object:(ns_) 64 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from the block diagonal part of A11
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object:      (ns_fieldsplit_velocity_)       64 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using DEFAULT norm type for convergence test
      PC Object:      (ns_fieldsplit_velocity_)       64 MPI processes
        type: none
        linear system matrix = precond matrix:
        Matrix Object:         64 MPI processes
          type: mpiaij
          rows=11068107, cols=11068107
          total: nonzeros=315206535, allocated nonzeros=315206535
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    KSP solver for S = A11 - A10 inv(A00) A01
      KSP Object:      (ns_fieldsplit_pressure_)       64 MPI processes
        type: gmres
          GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
          GMRES: happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using DEFAULT norm type for convergence test
      PC Object:      (ns_fieldsplit_pressure_)       64 MPI processes
        type: none
        linear system matrix followed by preconditioner matrix:
        Matrix Object:         64 MPI processes
          type: schurcomplement
          rows=469678, cols=469678
            Schur complement A11 - A10 inv(A00) A01
            A11
              Matrix Object:               64 MPI processes
                type: mpiaij
                rows=469678, cols=469678
                total: nonzeros=0, allocated nonzeros=0
                total number of mallocs used during MatSetValues calls =0
                  using I-node (on process 0) routines: found 1304 nodes, limit used is 5
            A10
              Matrix Object:               64 MPI processes
                type: mpiaij
                rows=469678, cols=11068107
                total: nonzeros=89122957, allocated nonzeros=89122957
                total number of mallocs used during MatSetValues calls =0
                  not using I-node (on process 0) routines
            KSP of A00
              KSP Object:              (ns_fieldsplit_velocity_)               64 MPI processes
                type: preonly
                maximum iterations=10000, initial guess is zero
                tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
                left preconditioning
                using DEFAULT norm type for convergence test
              PC Object:              (ns_fieldsplit_velocity_)               64 MPI processes
                type: none
                linear system matrix = precond matrix:
                Matrix Object:                 64 MPI processes
                  type: mpiaij
                  rows=11068107, cols=11068107
                  total: nonzeros=315206535, allocated nonzeros=315206535
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node (on process 0) routines
            A01
              Matrix Object:               64 MPI processes
                type: mpiaij
                rows=11068107, cols=469678
                total: nonzeros=88821041, allocated nonzeros=88821041
                total number of mallocs used during MatSetValues calls =0
                  not using I-node (on process 0) routines
          Matrix Object:         64 MPI processes
            type: mpiaij
            rows=469678, cols=469678
            total: nonzeros=0, allocated nonzeros=0
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 1304 nodes, limit used is 5
  linear system matrix = precond matrix:
  Matrix Object:   64 MPI processes
    type: mpiaij
    rows=11537785, cols=11537785
    total: nonzeros=493150533, allocated nonzeros=510309207
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
                          </pre>
                          <span><font color="#888888"><br>
                              Thomas<br>
                            </font></span></blockquote>
                      </div>
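                      <br>
                      (For completeness: a Schur fieldsplit
                      configuration like the one shown above is
                      normally selected with runtime options along the
                      following lines; the velocity/pressure index sets
                      themselves come from PCFieldSplitSetIS in the
                      application code, and the exact option names can
                      differ slightly between PETSc versions.)<br>
                      <pre>
-ns_ksp_type fgmres
-ns_pc_type fieldsplit
-ns_pc_fieldsplit_type schur
-ns_pc_fieldsplit_schur_factorization_type full
-ns_fieldsplit_velocity_ksp_type preonly
-ns_fieldsplit_velocity_pc_type none
-ns_fieldsplit_pressure_ksp_type gmres
-ns_fieldsplit_pressure_pc_type none
                      </pre>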
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>