<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">I can do this! Should I stop the run
      after KSPSetUp? Or do you want to see the log_summary file from
      the whole run?<br>
      <br>
      Thomas<br>
      <br>
      On 19.11.2012 13:33, Jed Brown wrote:<br>
    </div>
    <blockquote
cite="mid:CAM9tzSkL0N7SN-HNjpX-e_5PyqGWmcz_E2923nD0QYBTAO38cQ@mail.gmail.com"
      type="cite">Always, always, always send -log_summary when asking
      about performance.
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Mon, Nov 19, 2012 at 11:26 AM,
          Thomas Witkowski <span dir="ltr">&lt;<a
              moz-do-not-send="true"
              href="mailto:thomas.witkowski@tu-dresden.de"
              target="_blank">thomas.witkowski@tu-dresden.de</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">I have
            a scaling problem in KSPSetUp; maybe some of you can help
            me fix it. It takes 4.5 seconds on 64 cores and 4.0
            seconds on 128 cores. The matrix has around 11 million rows
            and is not perfectly balanced, but the maximum number of
            rows per core in the 128-core case is exactly half of the
            number in the 64-core case. Besides the scaling,
            why does the setup take so long? I thought that only some
            objects are created and no calculation is going on!<br>
            <br>
            The KSPView on the corresponding solver objects is as
            follows:<br>
            <br>
            KSP Object:(ns_) 64 MPI processes<br>
              type: fgmres<br>
                GMRES: restart=30, using Classical (unmodified)
            Gram-Schmidt Orthogonalization with no iterative refinement<br>
                GMRES: happy breakdown tolerance 1e-30<br>
              maximum iterations=100, initial guess is zero<br>
              tolerances:  relative=1e-06, absolute=1e-08,
            divergence=10000<br>
              right preconditioning<br>
              has attached null space<br>
              using UNPRECONDITIONED norm type for convergence test<br>
            PC Object:(ns_) 64 MPI processes<br>
              type: fieldsplit<br>
                FieldSplit with Schur preconditioner, factorization FULL<br>
                Preconditioner for the Schur complement formed from the
            block diagonal part of A11<br>
                Split info:<br>
                Split number 0 Defined by IS<br>
                Split number 1 Defined by IS<br>
                KSP solver for A00 block<br>
                  KSP Object:      (ns_fieldsplit_velocity_)       64
            MPI processes<br>
                    type: preonly<br>
                    maximum iterations=10000, initial guess is zero<br>
                    tolerances:  relative=1e-05, absolute=1e-50,
            divergence=10000<br>
                    left preconditioning<br>
                    using DEFAULT norm type for convergence test<br>
                  PC Object:      (ns_fieldsplit_velocity_)       64 MPI
            processes<br>
                    type: none<br>
                    linear system matrix = precond matrix:<br>
                    Matrix Object:         64 MPI processes<br>
                      type: mpiaij<br>
                      rows=11068107, cols=11068107<br>
                      total: nonzeros=315206535, allocated
            nonzeros=315206535<br>
                      total number of mallocs used during MatSetValues
            calls =0<br>
                        not using I-node (on process 0) routines<br>
                KSP solver for S = A11 - A10 inv(A00) A01<br>
                  KSP Object:      (ns_fieldsplit_pressure_)       64
            MPI processes<br>
                    type: gmres<br>
                      GMRES: restart=30, using Classical (unmodified)
            Gram-Schmidt Orthogonalization with no iterative refinement<br>
                      GMRES: happy breakdown tolerance 1e-30<br>
                    maximum iterations=10000, initial guess is zero<br>
                    tolerances:  relative=1e-05, absolute=1e-50,
            divergence=10000<br>
                    left preconditioning<br>
                    using DEFAULT norm type for convergence test<br>
                  PC Object:      (ns_fieldsplit_pressure_)       64 MPI
            processes<br>
                    type: none<br>
                    linear system matrix followed by preconditioner
            matrix:<br>
                    Matrix Object:         64 MPI processes<br>
                      type: schurcomplement<br>
                      rows=469678, cols=469678<br>
                        Schur complement A11 - A10 inv(A00) A01<br>
                        A11<br>
                          Matrix Object:               64 MPI processes<br>
                            type: mpiaij<br>
                            rows=469678, cols=469678<br>
                            total: nonzeros=0, allocated nonzeros=0<br>
                            total number of mallocs used during
            MatSetValues calls =0<br>
                              using I-node (on process 0) routines:
            found 1304 nodes, limit used is 5<br>
                        A10<br>
                          Matrix Object:               64 MPI processes<br>
                            type: mpiaij<br>
                            rows=469678, cols=11068107<br>
                            total: nonzeros=89122957, allocated
            nonzeros=89122957<br>
                            total number of mallocs used during
            MatSetValues calls =0<br>
                              not using I-node (on process 0) routines<br>
                        KSP of A00<br>
                          KSP Object: (ns_fieldsplit_velocity_)        
                  64 MPI processes<br>
                            type: preonly<br>
                            maximum iterations=10000, initial guess is
            zero<br>
                            tolerances:  relative=1e-05, absolute=1e-50,
            divergence=10000<br>
                            left preconditioning<br>
                            using DEFAULT norm type for convergence test<br>
                          PC Object: (ns_fieldsplit_velocity_)          
                64 MPI processes<br>
                            type: none<br>
                            linear system matrix = precond matrix:<br>
                            Matrix Object:                 64 MPI
            processes<br>
                              type: mpiaij<br>
                              rows=11068107, cols=11068107<br>
                              total: nonzeros=315206535, allocated
            nonzeros=315206535<br>
                              total number of mallocs used during
            MatSetValues calls =0<br>
                                not using I-node (on process 0) routines<br>
                        A01<br>
                          Matrix Object:               64 MPI processes<br>
                            type: mpiaij<br>
                            rows=11068107, cols=469678<br>
                            total: nonzeros=88821041, allocated
            nonzeros=88821041<br>
                            total number of mallocs used during
            MatSetValues calls =0<br>
                              not using I-node (on process 0) routines<br>
                    Matrix Object:         64 MPI processes<br>
                      type: mpiaij<br>
                      rows=469678, cols=469678<br>
                      total: nonzeros=0, allocated nonzeros=0<br>
                      total number of mallocs used during MatSetValues
            calls =0<br>
                        using I-node (on process 0) routines: found 1304
            nodes, limit used is 5<br>
              linear system matrix = precond matrix:<br>
              Matrix Object:   64 MPI processes<br>
                type: mpiaij<br>
                rows=11537785, cols=11537785<br>
                total: nonzeros=493150533, allocated nonzeros=510309207<br>
                total number of mallocs used during MatSetValues calls
            =0<br>
                  not using I-node (on process 0) routines<span
              class="HOEnZb"><font color="#888888"><br>
                <br>
                <br>
                <br>
                <br>
                Thomas<br>
              </font></span></blockquote>
        </div>
        <br>
      </div>
    </blockquote>
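    <br>
    To put a number on the scaling problem described in the quoted
    message, here is a minimal sketch that computes the strong-scaling
    speedup and parallel efficiency from the two KSPSetUp timings given
    there (4.5 seconds on 64 cores, 4.0 seconds on 128 cores). The
    function name strong_scaling is only illustrative and is not part of
    PETSc; the sketch assumes the two runs solve the same problem.<br>
    <pre>
```python
# KSPSetUp wall-clock timings as quoted in the message above.
t_64 = 4.5    # seconds on 64 cores
t_128 = 4.0   # seconds on 128 cores


def strong_scaling(t_base, p_base, t_new, p_new):
    """Return (speedup, parallel efficiency) relative to the base run.

    Hypothetical helper for illustration; assumes a fixed problem size.
    """
    speedup = t_base / t_new      # measured speedup
    ideal = p_new / p_base        # ideal speedup for perfect scaling
    return speedup, speedup / ideal


speedup, efficiency = strong_scaling(t_64, 64, t_128, 128)
print(f"speedup: {speedup:.3f}, parallel efficiency: {efficiency:.2%}")
```
</pre>
    Doubling the core count would ideally halve the time; with these
    two timings the efficiency comes out well below that, at roughly
    56 percent.<br>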
    <br>
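    The block sizes printed by KSPView can be cross-checked with a few
    lines of arithmetic. The sketch below is only a consistency check on
    the numbers copied from the quoted output (the dictionary names are
    illustrative): it confirms that the velocity and pressure blocks add
    up to the full matrix, and shows how large a share of the rows
    belongs to the velocity block.<br>
    <pre>
```python
# Values copied from the KSPView output quoted above.
rows = {"A00": 11068107, "A11": 469678}
nonzeros = {"A00": 315206535, "A01": 88821041, "A10": 89122957, "A11": 0}
full_rows = 11537785
full_nonzeros = 493150533

# The two diagonal blocks should account for every row of the full matrix.
rows_match = sum(rows.values()) == full_rows
# The four blocks should account for every stored nonzero.
nonzeros_match = sum(nonzeros.values()) == full_nonzeros
# Fraction of all rows that belongs to the velocity (A00) block.
velocity_share = rows["A00"] / full_rows

print(rows_match, nonzeros_match, f"{velocity_share:.1%}")
```
</pre>
    Note that the A11 block is reported with zero stored nonzeros, so
    all of the nonzeros come from A00, A01 and A10.<br>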
  </body>
</html>