Just have it do one or a few iterations.<div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Nov 19, 2012 at 1:36 PM, Thomas Witkowski <span dir="ltr">&lt;<a href="mailto:thomas.witkowski@tu-dresden.de" target="_blank">thomas.witkowski@tu-dresden.de</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    <div>I can do this! Should I stop the run

      after KSPSetUp? Or do you want to see the log_summary file from

      the whole run?<br>

      <br>

      Thomas<br>

      <br>

      On 19.11.2012 13:33, Jed Brown wrote:<br>

    </div><div><div class="h5">

    <blockquote type="cite">Always, always, always send -log_summary when asking

      about performance.

      <div class="gmail_extra"><br>

        <br>

        <div class="gmail_quote">On Mon, Nov 19, 2012 at 11:26 AM,

          Thomas Witkowski <span dir="ltr">&lt;<a href="mailto:thomas.witkowski@tu-dresden.de" target="_blank">thomas.witkowski@tu-dresden.de</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I have

            a scaling problem in KSPSetUp; maybe some of you can help

            me fix it. It takes 4.5 seconds on 64 cores and 4.0

            seconds on 128 cores. The matrix has around 11 million rows

            and is not perfectly balanced, but the maximum number of

            rows per core in the 128-core case is exactly half of the

            number in the 64-core case. Besides the scaling,

            why does the setup take so long? I thought that just some

            objects are created but no calculation is going on!<br>
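
            Editorial note: the two timings quoted above can be turned into a speedup and a parallel efficiency. The sketch below does only that arithmetic; the helper name strong_scaling is made up for illustration and is not part of PETSc. Doubling the core count would ideally halve the time, but here the speedup is 1.125, which is roughly 56% efficiency.

```python
# Timings quoted in the message above: seconds spent in KSPSetUp.
CORES_SMALL, TIME_SMALL = 64, 4.5
CORES_LARGE, TIME_LARGE = 128, 4.0


def strong_scaling(p0, t0, p1, t1):
    """Return (speedup, parallel efficiency) when going from p0 to p1 processes.

    Hypothetical helper for illustration only.
    """
    speedup = t0 / t1           # how much faster the larger run is
    ideal = p1 / p0             # speedup expected under perfect scaling
    return speedup, speedup / ideal


speedup, efficiency = strong_scaling(CORES_SMALL, TIME_SMALL, CORES_LARGE, TIME_LARGE)
print(f"speedup: {speedup:.3f}, efficiency: {efficiency:.2%}")
```
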

            <br>

            The KSPView on the corresponding solver objects is as

            follows:<br>

            <br>

            KSP Object:(ns_) 64 MPI processes<br>

              type: fgmres<br>

                GMRES: restart=30, using Classical (unmodified)

            Gram-Schmidt Orthogonalization with no iterative refinement<br>

                GMRES: happy breakdown tolerance 1e-30<br>

              maximum iterations=100, initial guess is zero<br>

              tolerances:  relative=1e-06, absolute=1e-08,

            divergence=10000<br>

              right preconditioning<br>

              has attached null space<br>

              using UNPRECONDITIONED norm type for convergence test<br>

            PC Object:(ns_) 64 MPI processes<br>

              type: fieldsplit<br>

                FieldSplit with Schur preconditioner, factorization FULL<br>

                Preconditioner for the Schur complement formed from the

            block diagonal part of A11<br>

                Split info:<br>

                Split number 0 Defined by IS<br>

                Split number 1 Defined by IS<br>

                KSP solver for A00 block<br>

                  KSP Object:      (ns_fieldsplit_velocity_)       64

            MPI processes<br>

                    type: preonly<br>

                    maximum iterations=10000, initial guess is zero<br>

                    tolerances:  relative=1e-05, absolute=1e-50,

            divergence=10000<br>

                    left preconditioning<br>

                    using DEFAULT norm type for convergence test<br>

                  PC Object:      (ns_fieldsplit_velocity_)       64 MPI

            processes<br>

                    type: none<br>

                    linear system matrix = precond matrix:<br>

                    Matrix Object:         64 MPI processes<br>

                      type: mpiaij<br>

                      rows=11068107, cols=11068107<br>

                      total: nonzeros=315206535, allocated

            nonzeros=315206535<br>

                      total number of mallocs used during MatSetValues

            calls =0<br>

                        not using I-node (on process 0) routines<br>

                KSP solver for S = A11 - A10 inv(A00) A01<br>

                  KSP Object:      (ns_fieldsplit_pressure_)       64

            MPI processes<br>

                    type: gmres<br>

                      GMRES: restart=30, using Classical (unmodified)

            Gram-Schmidt Orthogonalization with no iterative refinement<br>

                      GMRES: happy breakdown tolerance 1e-30<br>

                    maximum iterations=10000, initial guess is zero<br>

                    tolerances:  relative=1e-05, absolute=1e-50,

            divergence=10000<br>

                    left preconditioning<br>

                    using DEFAULT norm type for convergence test<br>

                  PC Object:      (ns_fieldsplit_pressure_)       64 MPI

            processes<br>

                    type: none<br>

                    linear system matrix followed by preconditioner

            matrix:<br>

                    Matrix Object:         64 MPI processes<br>

                      type: schurcomplement<br>

                      rows=469678, cols=469678<br>

                        Schur complement A11 - A10 inv(A00) A01<br>

                        A11<br>

                          Matrix Object:               64 MPI processes<br>

                            type: mpiaij<br>

                            rows=469678, cols=469678<br>

                            total: nonzeros=0, allocated nonzeros=0<br>

                            total number of mallocs used during

            MatSetValues calls =0<br>

                              using I-node (on process 0) routines:

            found 1304 nodes, limit used is 5<br>

                        A10<br>

                          Matrix Object:               64 MPI processes<br>

                            type: mpiaij<br>

                            rows=469678, cols=11068107<br>

                            total: nonzeros=89122957, allocated

            nonzeros=89122957<br>

                            total number of mallocs used during

            MatSetValues calls =0<br>

                              not using I-node (on process 0) routines<br>

                        KSP of A00<br>

                          KSP Object: (ns_fieldsplit_velocity_)        

                  64 MPI processes<br>

                            type: preonly<br>

                            maximum iterations=10000, initial guess is

            zero<br>

                            tolerances:  relative=1e-05, absolute=1e-50,

            divergence=10000<br>

                            left preconditioning<br>

                            using DEFAULT norm type for convergence test<br>

                          PC Object: (ns_fieldsplit_velocity_)          

                64 MPI processes<br>

                            type: none<br>

                            linear system matrix = precond matrix:<br>

                            Matrix Object:                 64 MPI

            processes<br>

                              type: mpiaij<br>

                              rows=11068107, cols=11068107<br>

                              total: nonzeros=315206535, allocated

            nonzeros=315206535<br>

                              total number of mallocs used during

            MatSetValues calls =0<br>

                                not using I-node (on process 0) routines<br>

                        A01<br>

                          Matrix Object:               64 MPI processes<br>

                            type: mpiaij<br>

                            rows=11068107, cols=469678<br>

                            total: nonzeros=88821041, allocated

            nonzeros=88821041<br>

                            total number of mallocs used during

            MatSetValues calls =0<br>

                              not using I-node (on process 0) routines<br>

                    Matrix Object:         64 MPI processes<br>

                      type: mpiaij<br>

                      rows=469678, cols=469678<br>

                      total: nonzeros=0, allocated nonzeros=0<br>

                      total number of mallocs used during MatSetValues

            calls =0<br>

                        using I-node (on process 0) routines: found 1304

            nodes, limit used is 5<br>

              linear system matrix = precond matrix:<br>

              Matrix Object:   64 MPI processes<br>

                type: mpiaij<br>

                rows=11537785, cols=11537785<br>

                total: nonzeros=493150533, allocated nonzeros=510309207<br>

                total number of mallocs used during MatSetValues calls

            =0<br>

                  not using I-node (on process 0) routines<span><font color="#888888"><br>

                <br>

                <br>

                <br>

                <br>

                Thomas<br>

              </font></span></blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </div></div></div>

</blockquote></div><br></div>
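
Editorial note: the advice in this thread (send -log_summary, and let the solver do only one or a few iterations) could be combined into a single short profiling run. The sketch below only assembles such a command line and does not launch anything. It rests on several assumptions: the executable name ./my_solver is hypothetical, mpiexec is assumed to be the MPI launcher, and -ns_ksp_max_it assumes the outer solver uses the ns_ options prefix shown in the KSPView output and that the application calls KSPSetFromOptions so that command-line options are honoured.

```python
def profiling_command(executable, nprocs, max_it=2, prefix="ns_"):
    """Build the argument list for a short, profiled run.

    executable: path to the application (hypothetical name in the example below)
    nprocs:     number of MPI processes
    max_it:     cap on outer KSP iterations, to keep the run short
    prefix:     options prefix of the outer KSP ("ns_" in the KSPView output)
    """
    return [
        "mpiexec", "-n", str(nprocs),
        executable,
        f"-{prefix}ksp_max_it", str(max_it),  # stop after a few iterations
        "-log_summary",                        # print the performance summary at exit
    ]


cmd = profiling_command("./my_solver", 64)
print(" ".join(cmd))
```

Running the same command with 128 processes gives a second log to compare against the 64-process one, so the time attributed to KSPSetUp can be compared stage by stage.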