<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 4/10/2012 3:40 AM, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAMYG4Gm0CmONQ_3dnP_m4wiaV=K=uO5s3BWp6sRuLrCu_7ScfA@mail.gmail.com"
      type="cite">On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <span
        dir="ltr">&lt;<a moz-do-not-send="true"
          href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>&gt;</span>
      wrote:<br>
      <div class="gmail_quote">
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000">
            <div>Hi Jed,<br>
              <br>
              I believe they are real cores. Anyway, I have attached the
              log summaries for the 12/24/48-core runs. I re-ran a smaller
              case because the large problem can't run on 12 cores.<br>
            </div>
          </div>
        </blockquote>
        <div><br>
        </div>
        <div>Okay, look at VecScatterBegin/End for 24 and 48 cores (I am
          guessing you have 4 16-core chips, but please figure this
          out).</div>
        <div>The messages are logged in ScatterBegin, and the time is
          logged in ScatterEnd. From 24 to 48 cores the time is cut in
          half.</div>
        <div>If you were only communicating the boundary, this is
          completely backwards, so you are communicating a fair fraction
          of ALL</div>
        <div>the values in a subdomain. Figure out why your partition is
          so screwed up and this will go away.</div>
      </div>
    </blockquote>
    <br>
    What do you mean by "If you were only communicating the boundary,
    this is completely backwards, so you are communicating a fair
    fraction of ALL the values in a subdomain"?<br>
    <br>
    I partition my domain in the z direction, as shown in the attached
    pic. The circled region is where the airfoils are. I'm using an
    immersed boundary method (IBM) code so the grid is all Cartesian.<br>
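    <br>
    A minimal sketch of what "only communicating the boundary" would
    mean for a z-slab partition like this one. The grid size is
    hypothetical (not taken from the attached logs), and the sketch
    assumes a 7-point star stencil and nz divisible by the number of
    ranks:<br>
    <br>
<pre>
```python
def slab_ghost_stats(nx, ny, nz, nprocs):
    """Owned and ghost cell counts for an interior rank of a z-slab partition.

    Assumes a 7-point star stencil, so only the k-1 and k+1 planes are
    needed from the two neighbouring ranks.
    """
    if nz % nprocs != 0:
        raise ValueError("sketch assumes nz divisible by nprocs")
    owned = nx * ny * (nz // nprocs)  # cells this rank owns
    ghost = 2 * nx * ny               # one plane from each z-neighbour
    return owned, ghost


if __name__ == "__main__":
    # Hypothetical grid size, chosen only so that 12, 24 and 48 divide nz.
    nx, ny, nz = 100, 100, 480
    for p in (12, 24, 48):
        owned, ghost = slab_ghost_stats(nx, ny, nz, p)
        print(p, owned, ghost, ghost / owned)
```
</pre>
    Under these assumptions the ghost count per rank stays at 2*nx*ny
    while the owned count halves from 24 to 48 ranks, so a scatter time
    that halves along with the subdomain size would suggest that more
    than the two boundary planes is being communicated.<br>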
    <br>
    I created my Z matrix using:<br>
    <br>
    call
MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)<br>
    <br>
    where ijk_sta / ijk_end are the starting/ending global indices of
    the rows owned by each process.<br>
    <br>
    The 7 is there because a star stencil is used in 3D, giving at most
    7 nonzeros per row.<br>
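    <br>
    To make the two 7s in the MatCreateAIJ call concrete (the first is
    the per-row nonzero count for the diagonal block, the second for
    the off-diagonal block), here is a rough sketch that lists the
    stencil columns of one row and splits them by block. The natural
    ordering (i fastest, k slowest), the tiny grid and the helper names
    are assumptions for illustration, not taken from the actual code:<br>
    <br>
<pre>
```python
def stencil_columns(i, j, k, nx, ny, nz):
    """Global 0-based column indices of the 7-point star stencil at (i, j, k).

    Assumes natural ordering with i fastest and k slowest; neighbours
    outside the grid are dropped.
    """
    cols = []
    for di, dj, dk in ((0, 0, 0), (-1, 0, 0), (1, 0, 0),
                       (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)):
        ii, jj, kk = i + di, j + dj, k + dk
        if ii in range(nx) and jj in range(ny) and kk in range(nz):
            cols.append(ii + nx * (jj + ny * kk))
    return cols


def count_diag_offdiag(cols, row_start, row_end):
    """Split columns into the diagonal block [row_start, row_end) and the rest."""
    diag = sum(1 for c in cols if c in range(row_start, row_end))
    return diag, len(cols) - diag


if __name__ == "__main__":
    # Hypothetical 4 x 4 x 8 grid on 2 ranks: rank 0 owns rows [0, 64).
    cols = stencil_columns(1, 1, 3, 4, 4, 8)  # row on the top plane of rank 0
    print(cols, count_diag_offdiag(cols, 0, 64))
```
</pre>
    With a z-slab partition and this ordering, only the k-1 or k+1
    neighbour can fall outside the owned rows, so 7 for the
    off-diagonal block is safe but generous.<br>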
    <br>
    I create my RHS vector using:<br>
    <br>
    <i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_end-ijk_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
    <br>
    <div>The values for the matrix and vector were calculated before
      PETSc logging so they don't come into play.<br>
      <br>
      The x and y matrices are set up in the same way. I still don't
      understand why solving the z-momentum eqn takes so much
      time. Which part should I focus on?<br>
      <br>
      Thanks!<br>
      <br>
    </div>
    <blockquote
cite="mid:CAMYG4Gm0CmONQ_3dnP_m4wiaV=K=uO5s3BWp6sRuLrCu_7ScfA@mail.gmail.com"
      type="cite">
      <div class="gmail_quote">
        <div><br>
        </div>
        <div>   Matt</div>
        <div> </div>
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000">
            <div>
              <pre cols="72">Yours sincerely,

TAY wee-beng</pre>
              <div>
                <div class="h5"> On 3/10/2012 5:59 PM, Jed Brown wrote:<br>
                </div>
              </div>
            </div>
            <div>
              <div class="h5">
                <blockquote type="cite">There is an inordinate amount of
                  time being spent in VecScatterEnd(). That sometimes
                  indicates a very bad partition. Also, are your "48
                  cores" real physical cores or just "logical cores"
                  (look like cores to the operating system, usually
                  advertised as "threads" by the vendor, nothing like
                  cores in reality)? That can cause a huge load
                  imbalance and very confusing results as
                  over-subscribed threads compete for shared resources.
                  Step it back to 24 threads and 12 threads, send
                  log_summary for each.<br>
                  <br>
                  <div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08
                    AM, TAY wee-beng <span dir="ltr">&lt;<a
                        moz-do-not-send="true"
                        href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>&gt;</span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div bgcolor="#FFFFFF" text="#000000">
                        <div>
                          <div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
                          </div>
                          <blockquote type="cite">On Tue, Oct 2, 2012 at
                            8:35 AM, TAY wee-beng <span dir="ltr">&lt;<a
                                moz-do-not-send="true"
                                href="mailto:zonexo@gmail.com"
                                target="_blank">zonexo@gmail.com</a>&gt;</span>
                            wrote:<br>
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                <div bgcolor="#FFFFFF" text="#000000">
                                  <div>Hi,<br>
                                    <br>
                                    I have combined the momentum linear
                                    eqns involving x,y,z into 1 large
                                    matrix. The Poisson eqn is solved
                                    using HYPRE Struct format so it's
                                    not included. I ran the code for 50
                                    timesteps (hence 50 KSPSolve calls) using
                                    96 procs. The log_summary is given
                                    below. I have some questions:<br>
                                    <br>
                                    1. After combining the matrix, I
                                    should have only 1 PETSc matrix. Why
                                    does it say there are 4 matrices, 12
                                    vectors, etc.? <br>
                                  </div>
                                </div>
                              </blockquote>
                              <div><br>
                              </div>
                              <div>They are part of preconditioning. Are
                                you sure you're using Hypre for this? It
                                looks like you are using bjacobi/ilu.</div>
                              <div> </div>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                <div bgcolor="#FFFFFF" text="#000000">
                                  <div> <br>
                                    2. I'm looking at the stages which
                                    take the longest time. It seems that
                                    MatAssemblyBegin, VecNorm,
                                    VecAssemblyBegin, VecScatterEnd have
                                    very high ratios. The ratios of some
                                    others are also not too good (~ 1.6
                                    - 2). So are these stages the reason
                                    why my code is not scaling well?
                                    What can I do to improve it?<br>
                                  </div>
                                </div>
                              </blockquote>
                              <div><br>
                              </div>
                              <div>3/4 of the solve time is evenly
                                balanced between MatMult, MatSolve,
                                MatLUFactorNumeric, and VecNorm+VecDot.</div>
                              <div><br>
                              </div>
                              <div>The high VecAssembly time might be
                                due to generating a lot of entries
                                off-process?</div>
                              <div><br>
                              </div>
                              <div>In any case, this looks like an
                                _extremely_ slow network, perhaps it's
                                misconfigured?</div>
                            </div>
                          </blockquote>
                          <br>
                        </div>
                        My cluster is configured with 48 procs per node.
                        I re-ran the case using only 48 procs, so
                        there's no need to pass over a 'slow'
                        interconnect. I'm now also using GAMG and BCGS
                        for the Poisson and momentum eqns respectively. I
                        have also separated the x,y,z components of the
                        momentum eqn into 3 separate linear eqns to debug
                        the problem. <br>
                        <br>
                        Results show that stage "momentum_z" is taking a
                        lot of time. I wonder if it has to do with the
                        fact that I am partitioning my grid in the z
                        direction. VecScatterEnd and MatMult are taking a
                        lot of time. The ratios for VecNormalize,
                        VecScatterEnd, VecNorm and VecAssemblyBegin are
                        also not good.<br>
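                        <br>
                        For reference, the "Ratio" column in the log
                        summary is the maximum over the minimum across
                        processes, so 1.0 means perfectly balanced. A
                        small sketch with hypothetical per-rank
                        timings (not values from the attached logs):<br>
                        <br>
<pre>
```python
def log_ratio(times_per_rank):
    """Max/min ratio of one event's time across ranks (1.0 means balanced)."""
    fastest = min(times_per_rank)
    if fastest == 0:
        raise ValueError("ratio undefined when a rank reports zero time")
    return max(times_per_rank) / fastest


if __name__ == "__main__":
    # Hypothetical timings in seconds for one event on four ranks.
    print(log_ratio([2.0, 2.0, 2.0, 4.0]))
```
</pre>
                        A single slow rank is enough to produce a high
                        ratio, so a bad ratio on an event with little
                        total time may matter less than a moderate one
                        on an event that dominates the stage.<br>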
                        <br>
                        I wonder why a lot of entries are generated
                        off-process.<br>
                        <br>
                        I create my RHS vector using:<br>
                        <br>
                        <i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
                        <br>
                        where ijk_xyz_sta and ijk_xyz_end are obtained
                        from<br>
                        <br>
                        <i>call
                          MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
                        <br>
                        I then insert the values into the vector using:<br>
                        <br>
                        <i>call VecSetValues(b_rhs_semi_z , ijk_xyz_end
                          - ijk_xyz_sta , (/ijk_xyz_sta : ijk_xyz_end -
                          1/) , q_semi_vect_z(ijk_xyz_sta + 1 :
                          ijk_xyz_end) , INSERT_VALUES , ierr)</i><br>
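                        <br>
                        One quick sanity check for a call like this is
                        to count how many of the 0-based global indices
                        fall outside the range returned by
                        MatGetOwnershipRange. The helper name and the
                        example ranges below are hypothetical:<br>
                        <br>
<pre>
```python
def count_off_process(indices, own_start, own_end):
    """Number of 0-based global indices outside the owned range [own_start, own_end)."""
    return sum(1 for idx in indices if idx not in range(own_start, own_end))


if __name__ == "__main__":
    own_start, own_end = 100, 200  # hypothetical ownership range
    # Indices own_start .. own_end-1, as in the VecSetValues call above.
    print(count_off_process(range(own_start, own_end), own_start, own_end))
    # The same indices shifted by one (a forgotten 1-based to 0-based shift).
    print(count_off_process(range(own_start + 1, own_end + 1), own_start, own_end))
```
</pre>
                        If the first count is zero on every rank, this
                        call by itself should not be generating
                        off-process entries, and the imbalance would
                        have to come from somewhere else.<br>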
                        <br>
                        What should I do to correct the problem?<br>
                        <br>
                        Thanks
                        <div>
                          <div><br>
                            <br>
                            <blockquote type="cite">
                              <div class="gmail_quote">
                                <div> </div>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  <div bgcolor="#FFFFFF" text="#000000">
                                    <div> <br>
                                      Btw, I insert matrix using:<br>
                                      <br>
                                      <i>do
                                        ijk=ijk_xyz_sta+1,ijk_xyz_end</i><i><br>
                                      </i><i><br>
                                      </i><i>    II = ijk - 1</i><i>   
                                        !Fortran shift to 0-based</i><i><br>
                                      </i><i>    </i><i><br>
                                      </i><i>    call
MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)</i><i><br>
                                      </i><i><br>
                                      </i><i>end do</i><br>
                                      <br>
                                      where ijk_xyz_sta/ijk_xyz_end are
                                      the starting/end index<br>
                                      <br>
                                      int_semi_xyz(ijk,1:7) stores the 7
                                      column global indices<br>
                                      <br>
                                      semi_mat_xyz has the corresponding
                                      values.<br>
                                      <br>
                                      and I insert vectors using:<br>
                                      <br>
                                      call
VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)<br>
                                      <br>
                                      Thanks!<br>
                                      <br>
                                      <i><br>
                                      </i><br>
                                      <pre cols="72">Yours sincerely,

TAY wee-beng</pre>
                                      <div>
                                        <div> On 30/9/2012 11:30 PM, Jed
                                          Brown wrote:<br>
                                        </div>
                                      </div>
                                    </div>
                                    <div>
                                      <div>
                                        <blockquote type="cite">
                                          <p>You can measure the time
                                            spent in Hypre via PCApply
                                            and PCSetUp, but you can't
                                            get finer grained integrated
                                            profiling because it was not
                                            set up that way.</p>
                                          <div class="gmail_quote">On
                                            Sep 30, 2012 3:26 PM, "TAY
                                            wee-beng" &lt;<a
                                              moz-do-not-send="true"
                                              href="mailto:zonexo@gmail.com"
                                              target="_blank">zonexo@gmail.com</a>&gt;



                                            wrote:<br type="attribution">
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              <div bgcolor="#FFFFFF"
                                                text="#000000">
                                                <div>On 27/9/2012 1:44
                                                  PM, Matthew Knepley
                                                  wrote:<br>
                                                </div>
                                                <blockquote type="cite">On
                                                  Thu, Sep 27, 2012 at
                                                  3:49 AM, TAY wee-beng
                                                  <span dir="ltr">&lt;<a
moz-do-not-send="true" href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>&gt;</span>
                                                  wrote:<br>
                                                  <div
                                                    class="gmail_quote">
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      Hi,<br>
                                                      <br>
                                                      I'm doing a log
                                                      summary for my 3d
                                                      cfd code. I have
                                                      some questions:<br>
                                                      <br>
                                                      1. if I'm solving
                                                      3 linear equations
                                                      using ksp, is the
                                                      result given in
                                                      the log summary
                                                      the total of the 3
                                                      linear eqns'
                                                      performance? How
                                                      can I get the
                                                      performance for
                                                      each individual
                                                      eqn?<br>
                                                    </blockquote>
                                                    <div><br>
                                                    </div>
                                                    <div>Use logging
                                                      stages: <a
                                                        moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html"
                                                        target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>
                                                    <div> </div>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      2. If I run my
                                                      code for 10 time
                                                      steps, does the
                                                      log summary give
                                                      the total or avg
                                                      performance/ratio?<br>
                                                    </blockquote>
                                                    <div><br>
                                                    </div>
                                                    <div>Total.</div>
                                                    <div> </div>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      3. Besides PETSc,
                                                      I'm also using
                                                      HYPRE's native
                                                      geometric MG
                                                      (Struct) to solve
                                                      my Cartesian-grid
                                                      CFD Poisson
                                                      eqn. Is there any
                                                      way I can use
                                                      PETSc's log
                                                      summary to get
                                                      HYPRE's
                                                      performance? If I
                                                      use boomerAMG thru
                                                      PETSc, can I get
                                                      its performance?</blockquote>
                                                    <div><br>
                                                    </div>
                                                    <div>If you mean
                                                      flops, only if you
                                                      count them
                                                      yourself and tell
                                                      PETSc using <a
                                                        moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html"
                                                        target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>
                                                    <div><br>
                                                    </div>
                                                    <div>This is the
                                                      disadvantage of
                                                      using packages
                                                      that do not
                                                      properly monitor
                                                      things :)</div>
                                                    <div><br>
                                                    </div>
                                                    <div>    Matt</div>
                                                    <div> </div>
                                                  </div>
                                                </blockquote>
                                                So you mean if I use
                                                boomerAMG thru PETSc,
                                                there is no proper way
                                                of evaluating its
                                                performance, besides
                                                using PetscLogFlops?<br>
                                                <blockquote type="cite">
                                                  <div
                                                    class="gmail_quote">
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      <span><font
                                                          color="#888888"><br>
                                                          -- <br>
                                                          Yours
                                                          sincerely,<br>
                                                          <br>
                                                          TAY wee-beng<br>
                                                          <br>
                                                        </font></span></blockquote>
                                                  </div>
                                                  <br>
                                                  <br clear="all">
                                                  <div><br>
                                                  </div>
                                                  -- <br>
                                                  What most
                                                  experimenters take for
                                                  granted before they
                                                  begin their
                                                  experiments is
                                                  infinitely more
                                                  interesting than any
                                                  results to which their
                                                  experiments lead.<br>
                                                  -- Norbert Wiener<br>
                                                </blockquote>
                                                <br>
                                              </div>
                                            </blockquote>
                                          </div>
                                        </blockquote>
                                        <br>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
                              <br>
                            </blockquote>
                            <br>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </blockquote>
                <br>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
      <br>
      <br clear="all">
      <div><br>
      </div>
      -- <br>
      What most experimenters take for granted before they begin their
      experiments is infinitely more interesting than any results to
      which their experiments lead.<br>
      -- Norbert Wiener<br>
    </blockquote>
    <br>
  </body>
</html>