Can you send a picture of what your domain looks like and what shape the part owned by a given processor looks like? Best would be to write out the mesh with a variable marking the rank owning each vertex, then do a color plot in Paraview or whatever you use to show the partition.<div>
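<div>A minimal sketch of writing such a rank map, in Python for brevity. It assumes a structured Cartesian grid whose global vertex index runs x fastest, then y, then z, and that the first global index owned by each rank is known (for example gathered from MatGetOwnershipRange). The names vertex_owners and write_rank_vtk are hypothetical, not part of PETSc.</div>
<pre>
```python
import bisect


def vertex_owners(range_starts, n_vertices):
    """Return the owning rank of every global vertex index.

    range_starts[r] is the first global index owned by rank r; the list is
    ascending and starts at 0 (contiguous ownership ranges are assumed).
    """
    return [bisect.bisect_right(range_starts, i) - 1 for i in range(n_vertices)]


def write_rank_vtk(path, dims, owners):
    """Write a legacy-VTK STRUCTURED_POINTS file with one 'rank' value per vertex."""
    nx, ny, nz = dims
    if len(owners) != nx * ny * nz:
        raise ValueError("owners must have nx*ny*nz entries")
    header = [
        "# vtk DataFile Version 3.0",
        "partition map",
        "ASCII",
        "DATASET STRUCTURED_POINTS",
        "DIMENSIONS %d %d %d" % (nx, ny, nz),
        "ORIGIN 0 0 0",
        "SPACING 1 1 1",
        "POINT_DATA %d" % len(owners),
        "SCALARS rank int 1",
        "LOOKUP_TABLE default",
    ]
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        f.write("\n".join(str(r) for r in owners) + "\n")
```
</pre>
<div>For a hypothetical 4 x 4 x 6 grid split into three z-slabs, write_rank_vtk("partition.vtk", (4, 4, 6), vertex_owners([0, 32, 64], 96)) produces a file that can be opened in Paraview and colored by the "rank" field.</div>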
<br></div><div>VecScatterBegin/End is taking much more time than these, and really a pretty unreasonable amount of time in general.<br><br><div class="gmail_quote">On Thu, Oct 4, 2012 at 2:16 PM, Wee-Beng Tay <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div class="im">
    <div>On 4/10/2012 5:11 PM, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote type="cite">On Thu, Oct 4, 2012 at 11:01 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
      wrote:<br>
      <div class="gmail_quote">
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000">
            <div>On 4/10/2012 3:40 AM, Matthew Knepley wrote:<br>
            </div>
            <blockquote type="cite">On Wed, Oct 3, 2012 at 4:05 PM, TAY
              wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
              wrote:<br>
              <div class="gmail_quote">
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div bgcolor="#FFFFFF" text="#000000">
                    <div>Hi Jed,<br>
                      <br>
                      I believe they are real cores. Anyway, I have
                      attached the log summary for the 12/24/48 cores. I
                      re-ran a smaller case because the large problem
                      can't run with 12 cores.<br>
                    </div>
                  </div>
                </blockquote>
                <div><br>
                </div>
                <div>Okay, look at VecScatterBegin/End for 24 and 48
                  cores (I am guessing you have 4 16-core chips, but
                  please figure this out).</div>
                <div>The messages are logged in ScatterBegin, and the
                  time is logged in ScatterEnd. From 24 to 48 cores the
                  time is cut in half.</div>
                <div>If you were only communicating the boundary, this
                  is completely backwards, so you are communicating a
                  fair fraction of ALL</div>
                <div>the values in a subdomain. Figure out why your
                  partition is so screwed up and this will go away.</div>
              </div>
            </blockquote>
            <br>
            What do you mean by "If you were only communicating the
            boundary, this is completely backwards, so you are
            communicating a fair fraction of ALL the values in a
            subdomain"?<br>
          </div>
        </blockquote>
        <div><br>
        </div>
        <div>If you have 48 partitions instead of 24, you have a larger
          interface, so AssemblyEnd() should take</div>
        <div>slightly longer. However, your AssemblyEnd() takes HALF the
          time, which means it's communicating</div>
        <div>far fewer values, which means you are not sending
          interface values, you are sending interior values,</div>
        <div>since the interior shrinks when you have more partitions.</div>
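        <div>The scaling argument can be checked with a little arithmetic. A minimal sketch in Python, assuming a z-slab partition of an nx x ny x nz Cartesian grid, a 7-point stencil (one ghost plane per neighbour) and nz divisible by the number of ranks. The 96 x 96 x 96 grid and the function name slab_counts are hypothetical, not taken from the actual run.</div>
<pre>
```python
def slab_counts(nx, ny, nz, nprocs):
    """Per-rank vertex counts for a z-slab partition of an nx*ny*nz grid.

    Returns (interface, owned) for a rank with two neighbours: the number of
    ghost values it receives per exchange and the number of vertices it owns.
    """
    if nz % nprocs != 0:
        raise ValueError("this sketch assumes nz is divisible by nprocs")
    planes = nz // nprocs        # z-planes owned by each rank
    interface = 2 * nx * ny      # one ghost plane from each z-neighbour
    owned = nx * ny * planes     # shrinks as nprocs grows
    return interface, owned


for p in (12, 24, 48):
    interface, owned = slab_counts(96, 96, 96, p)
    print(p, "ranks: interface", interface, "owned", owned)
```
</pre>
        <div>Going from 24 to 48 ranks leaves the per-rank interface count unchanged and halves the owned count, so a communication time that halves tracks owned (interior) values rather than interface values.</div>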
        <div><br>
        </div>
        <div>What this probably means is that your assembly routines are
          screwed up, and sending data all over the place.</div>
        <div><br>
        </div>
      </div>
    </blockquote></div>
    Ok I got it now. Looking at the AssemblyEnd time,<br>
    <br>
    12 procs<br>
    <br>
    MatAssemblyEnd       145 1.0 1.6342e+01 1.8 0.00e+00 0.0 4.4e+01
    6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>
    <br>VecAssemblyEnd       388 1.0 1.4472e-03 1.4 0.00e+00 0.0 0.0e+00
    0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>
    <br>
    24 procs<br>
    <br>
    MatAssemblyEnd       145 1.0 1.1618e+01 2.4 0.00e+00 0.0 9.2e+01
    6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>
    <br>VecAssemblyEnd       388 1.0 2.3527e-03 2.4 0.00e+00 0.0 0.0e+00
    0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br><br>
    48 procs<br>
    <br>
    MatAssemblyEnd       145 1.0 7.4327e+00 2.4 0.00e+00 0.0 1.9e+02
    6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     <br>
    <br>
    VecAssemblyEnd       388 1.0 2.8818e-03 3.7 0.00e+00 0.0 0.0e+00
    0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>
    <br>
    The VecAssemblyEnd time increases with procs; does that mean there is nothing wrong with it?<br><br>On the other hand, the MatAssemblyEnd time decreases with procs. So that's where the problem lies, is that so? <br><br>

I'm still scanning my code but haven't found the error yet. It seems strange because I inserted the matrix and vector exactly the same way for x,y,z. The u,v,w are also allocated with the same indices. Shouldn't the error be the same for x, y and z too?<br>

<br>Trying to get some hints as to where else I need to look in my code...<br><br>Tks<div><div class="h5"><br><br>
    <blockquote type="cite">
      <div class="gmail_quote">
        <div>   Matt</div>
        <div> </div>
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000"> I partition my domain
            in the z direction, as shown in the attached pic. The
            circled region is where the airfoils are. I'm using an
            immersed boundary method (IBM) code so the grid is all
            Cartesian.<br>
            <br>
            I created my Z matrix using:<br>
            <br>
            call
MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)<br>
            <br>
            where ijk_sta / ijk_end are the starting/ending global
            indices of the row.<br>
            <br>
            7 is because the star-stencil is used in 3D.<br>
            <br>
            I create my RHS vector using:<br>
            <br>
            <i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_end-ijk_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
            <br>
            <div>The values for the matrix and vector were calculated
              before PETSc logging so they don't come into play.<br>
              <br>
              They are also done in a similar fashion for matrix x and
              y. I still don't understand why solving the z momentum eqn
              takes so much time. Which portion should I focus on?<br>
              <br>
              Tks!<br>
              <br>
            </div>
            <blockquote type="cite">
              <div class="gmail_quote">
                <div><br>
                </div>
                <div>   Matt</div>
                <div> </div>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div bgcolor="#FFFFFF" text="#000000">
                    <div>
                      <pre cols="72">Yours sincerely,

TAY wee-beng</pre>
                      <div>
                        <div> On 3/10/2012 5:59 PM, Jed Brown wrote:<br>
                        </div>
                      </div>
                    </div>
                    <div>
                      <div>
                        <blockquote type="cite">There is an inordinate
                          amount of time being spent in VecScatterEnd().
                          That sometimes indicates a very bad partition.
                          Also, are your "48 cores" real physical cores
                          or just "logical cores" (look like cores to
                          the operating system, usually advertised as
                          "threads" by the vendor, nothing like cores in
                          reality)? That can cause a huge load imbalance
                          and very confusing results as over-subscribed
                          threads compete for shared resources. Step it
                          back to 24 threads and 12 threads, send
                          log_summary for each.<br>
                          <br>
                          <div class="gmail_quote">On Wed, Oct 3, 2012
                            at 8:08 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
                            wrote:<br>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                              <div bgcolor="#FFFFFF" text="#000000">
                                <div>
                                  <div>On 2/10/2012 2:43 PM, Jed Brown
                                    wrote:<br>
                                  </div>
                                  <blockquote type="cite">On Tue, Oct 2,
                                    2012 at 8:35 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
                                    wrote:<br>
                                    <div class="gmail_quote">
                                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                        <div bgcolor="#FFFFFF" text="#000000">
                                          <div>Hi,<br>
                                            <br>
                                            I have combined the momentum
                                            linear eqns involving x,y,z
                                            into 1 large matrix. The
                                            Poisson eqn is solved using
                                            HYPRE struct format so it's
                                            not included. I run the code
                                            for 50 timesteps (hence 50
                                            kspsolve) using 96 procs.
                                            The log_summary is given
                                            below. I have some
                                            questions:<br>
                                            <br>
                                            1. After combining the
                                            matrix, I should have only 1
                                            PETSc matrix. Why does it
                                            say there are 4 matrices, 12
                                            vectors, etc.? <br>
                                          </div>
                                        </div>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>They are part of
                                        preconditioning. Are you sure
                                        you're using Hypre for this? It
                                        looks like you are using
                                        bjacobi/ilu.</div>
                                      <div> </div>
                                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                        <div bgcolor="#FFFFFF" text="#000000">
                                          <div> <br>
                                            2. I'm looking at the stages
                                            which take the longest time.
                                            It seems that
                                            MatAssemblyBegin, VecNorm,
                                            VecAssemblyBegin,
                                            VecScatterEnd have very high
                                            ratios. The ratios of some
                                            others are also not too good
                                            (~ 1.6 - 2). So are these
                                            stages the reason why my
                                            code is not scaling well?
                                            What can I do to improve it?<br>
                                          </div>
                                        </div>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>3/4 of the solve time is
                                        evenly balanced between MatMult,
                                        MatSolve, MatLUFactorNumeric,
                                        and VecNorm+VecDot.</div>
                                      <div><br>
                                      </div>
                                      <div>The high VecAssembly time
                                        might be due to generating a lot
                                        of entries off-process?</div>
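                                      <div>One way to test this hypothesis is to count, on each rank, how many of the row indices passed to VecSetValues or MatSetValues fall outside the range that rank owns. A minimal sketch in Python; count_off_process is a hypothetical helper, and sta/end stand for the 0-based, half-open range that MatGetOwnershipRange returns.</div>
<pre>
```python
def count_off_process(rows, sta, end):
    """Count global row indices that lie outside the owned range [sta, end)."""
    owned = range(sta, end)          # 0-based, end is exclusive
    return sum(1 for r in rows if r not in owned)


# Hypothetical rank owning rows 100..199.
sta, end = 100, 200
good = list(range(sta, end))             # 0-based indices, all local
shifted = list(range(sta + 1, end + 1))  # 1-based indices passed by mistake
print(count_off_process(good, sta, end))     # 0
print(count_off_process(shifted, sta, end))  # 1
```
</pre>
                                      <div>A count of zero on every rank would suggest that assembly itself needs no communication; a large count would suggest that values are being shipped between ranks during assembly.</div>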
                                      <div><br>
                                      </div>
                                      <div>In any case, this looks like
                                        an _extremely_ slow network,
                                        perhaps it's misconfigured?</div>
                                    </div>
                                  </blockquote>
                                  <br>
                                </div>
                                My cluster is configured with 48 procs
                                per node. I re-ran the case, using only
                                48 procs, thus there's no need to pass
                                over a 'slow' interconnect. I'm now also
                                using GAMG and BCGS for the Poisson and
                                momentum eqn respectively. I have also
                                separated the x,y,z components of the
                                momentum eqn into 3 separate linear eqns
                                to debug the problem. <br>
                                <br>
                                Results show that stage "momentum_z" is
                                taking a lot of time. I wonder if it has
                                to do with the fact that I am
                                partitioning my grids in the z
                                direction. VecScatterEnd and MatMult are
                                taking a lot of time. The ratios for
                                VecNormalize, VecScatterEnd, VecNorm and
                                VecAssemblyBegin are also not good.<br>
                                <br>
                                I wonder why a lot of entries are
                                generated off-process.<br>
                                <br>
                                I create my RHS vector using:<br>
                                <br>
                                <i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
                                <br>
                                where ijk_xyz_sta and ijk_xyz_end are
                                obtained from<br>
                                <br>
                                <i>call
                                  MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
                                <br>
                                I then insert the values into the vector
                                using:<br>
                                <br>
                                <i>call VecSetValues(b_rhs_semi_z ,
                                  ijk_xyz_end - ijk_xyz_sta ,
                                  (/ijk_xyz_sta : ijk_xyz_end - 1/) ,
                                  q_semi_vect_z(ijk_xyz_sta + 1 :
                                  ijk_xyz_end) , INSERT_VALUES , ierr)</i><br>
                                <br>
                                What should I do to correct the problem?<br>
                                <br>
                                Thanks
                                <div>
                                  <div><br>
                                    <br>
                                    <blockquote type="cite">
                                      <div class="gmail_quote">
                                        <div> </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                          <div bgcolor="#FFFFFF" text="#000000">
                                            <div> <br>
                                              Btw, I insert matrix
                                              using:<br>
                                              <br>
                                              <i>do
                                                ijk=ijk_xyz_sta+1,ijk_xyz_end</i><i><br>
                                              </i><i><br>
                                              </i><i>    II = ijk - 1</i><i>   

                                                !Fortran shift to
                                                0-based</i><i><br>
                                              </i><i>    </i><i><br>
                                              </i><i>    call
MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)</i><i><br>
                                              </i><i><br>
                                              </i><i>end do</i><br>
                                              <br>
                                              where
                                              ijk_xyz_sta/ijk_xyz_end
                                              are the starting/ending indices<br>
                                              <br>
                                              int_semi_xyz(ijk,1:7)
                                              stores the 7 column global
                                              indices<br>
                                              <br>
                                              semi_mat_xyz has the
                                              corresponding values.<br>
                                              <br>
                                              and I insert vectors
                                              using:<br>
                                              <br>
                                              call
VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)<br>
                                              <br>
                                              Thanks!<br>
                                              <br>
                                              <i><br>
                                              </i><br>
                                              <pre cols="72">Yours sincerely,

TAY wee-beng</pre>
                                              <div>
                                                <div> On 30/9/2012 11:30
                                                  PM, Jed Brown wrote:<br>
                                                </div>
                                              </div>
                                            </div>
                                            <div>
                                              <div>
                                                <blockquote type="cite">
                                                  <p>You can measure the
                                                    time spent in Hypre
                                                    via PCApply and
                                                    PCSetUp, but you
                                                    can't get finer
                                                    grained integrated
                                                    profiling because it
                                                    was not set up that
                                                    way.</p>
                                                  <div class="gmail_quote">On

                                                    Sep 30, 2012 3:26
                                                    PM, "TAY wee-beng"
                                                    <<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>
                                                    wrote:<br type="attribution">
                                                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                      <div bgcolor="#FFFFFF" text="#000000">
                                                        <div>On
                                                          27/9/2012 1:44
                                                          PM, Matthew
                                                          Knepley wrote:<br>
                                                        </div>
                                                        <blockquote type="cite">On
                                                          Thu, Sep 27,
                                                          2012 at 3:49
                                                          AM, TAY
                                                          wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
                                                          wrote:<br>
                                                          <div class="gmail_quote">
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          Hi,<br>
                                                          <br>
                                                          I'm doing a
                                                          log summary
                                                          for my 3d cfd
                                                          code. I have
                                                          some
                                                          questions:<br>
                                                          <br>
                                                          1. if I'm
                                                          solving 3
                                                          linear
                                                          equations
                                                          using ksp, is
                                                          the result
                                                          given in the
                                                          log summary
                                                          the total of
                                                          the 3 linear
                                                          eqns'
                                                          performance?
                                                          How can I get
                                                          the
                                                          performance
                                                          for each
                                                          individual
                                                          eqn?<br>
                                                          </blockquote>
                                                          <div><br>
                                                          </div>
                                                          <div>Use
                                                          logging
                                                          stages: <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>


                                                          <div> </div>
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          2. If I run my
                                                          code for 10
                                                          time steps,
                                                          does the log
                                                          summary give
                                                          the total or
                                                          avg
                                                          performance/ratio?<br>
                                                          </blockquote>
                                                          <div><br>
                                                          </div>
                                                          <div>Total.</div>
                                                          <div> </div>
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          3. Besides
                                                          PETSc, I'm
                                                          also using
                                                          HYPRE's native
                                                          geometric MG
                                                          (Struct) to
                                                          solve my
                                                          Cartesian-grid
                                                          CFD
                                                          Poisson eqn.
                                                          Is there any
                                                          way I can use
                                                          PETSc's log
                                                          summary to get
                                                          HYPRE's
                                                          performance?
                                                          If I use
                                                          boomerAMG thru
                                                          PETSc, can I
                                                          get its
                                                          performance?</blockquote>
                                                          <div><br>
                                                          </div>
                                                          <div>If you
                                                          mean flops,
                                                          only if you
                                                          count them
                                                          yourself and
                                                          tell PETSc
                                                          using <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>


                                                          <div><br>
                                                          </div>
                                                          <div>This is
                                                          the
                                                          disadvantage
                                                          of using
                                                          packages that
                                                          do not
                                                          properly
                                                          monitor things
                                                          :)</div>
                                                          <div><br>
                                                          </div>
                                                          <div>    Matt</div>
                                                          <div> </div>
                                                          </div>
                                                        </blockquote>
                                                        So you mean if I
                                                        use BoomerAMG
                                                        through PETSc,
                                                        there is no
                                                        proper way of
                                                        evaluating its
                                                        performance,
                                                        besides using
                                                        PetscLogFlops?<br>
                                                        <blockquote type="cite">
                                                          <div class="gmail_quote">
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          <span><font color="#888888"><br>
                                                          -- <br>
                                                          Yours
                                                          sincerely,<br>
                                                          <br>
                                                          TAY wee-beng<br>
                                                          <br>
                                                          </font></span></blockquote>
                                                          </div>
                                                          <br>
                                                          <br clear="all">
                                                          <span><font color="#888888">
                                                          <div><br>
                                                          </div>
                                                          -- <br>
                                                          What most experimenters take for granted before they begin their
                                                          experiments is infinitely more interesting than any results to
                                                          which their experiments lead.<br>
                                                          -- Norbert Wiener<br>
                                                          </font></span></blockquote>
                                                        <span><font color="#888888"> <br>
                                                          </font></span></div>
                                                      <span><font color="#888888"> </font></span></blockquote>
                                                    <span><font color="#888888">
                                                      </font></span></div>
                                                  <span><font color="#888888"> </font></span></blockquote>
                                                <span><font color="#888888"> <br>
                                                  </font></span></div>
                                              <span><font color="#888888"> </font></span></div>
                                            <span><font color="#888888"> </font></span></div>
                                          <span><font color="#888888"> </font></span></blockquote>
                                        <span><font color="#888888"> </font></span></div>
                                      <span><font color="#888888"> <br>
                                        </font></span></blockquote>
                                    <span><font color="#888888"> <br>
                                      </font></span></div>
                                  <span><font color="#888888"> </font></span></div>
                                <span><font color="#888888"> </font></span></div>
                              <span><font color="#888888">
                                </font></span></blockquote>
                            <span><font color="#888888">
                              </font></span></div>
                          <span><font color="#888888"> <br>
                            </font></span></blockquote>
                        <span><font color="#888888"> <br>
                          </font></span></div>
                      <span><font color="#888888"> </font></span></div>
                    <span><font color="#888888"> </font></span></div>
                  <span><font color="#888888"> </font></span></blockquote>
                <span><font color="#888888"> </font></span></div>
              <span><font color="#888888"> <br>
                </font></span></blockquote>
            <br>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>