Can you send a picture of what your domain looks like and what shape the part owned by a given processor looks like? Best would be to write out the mesh with a variable marking the rank owning each vertex, then do a color plot in ParaView (or whatever you use) to show the partition.
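For a structured Cartesian grid, a rough sketch of one way to do this is below. All names are placeholders (nx, ny, nz are the global grid dimensions; ijk_sta/ijk_end are the 0-based owned index range), and it gathers the whole field onto rank 0, so it is only meant for modest grid sizes:

! Sketch: write a "rank" field over the structured grid so ParaView can
! color the partition. Assumes a natural (i,j,k) ordering of the nx*ny*nz
! points and that this process owns rows ijk_sta .. ijk_end-1 (0-based).
subroutine write_partition_vtk(nx, ny, nz, ijk_sta, ijk_end)
  implicit none
  include 'mpif.h'
  integer, intent(in) :: nx, ny, nz, ijk_sta, ijk_end
  integer :: rank, ierr, ijk
  integer, allocatable :: local_rank(:), owner(:)

  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Each process marks the points it owns with its own rank.
  allocate(local_rank(nx*ny*nz), owner(nx*ny*nz))
  local_rank = 0
  local_rank(ijk_sta+1:ijk_end) = rank

  ! Every point is owned by exactly one process, so a MAX reduction onto
  ! rank 0 recovers the owner of each point (rank 0's own points stay 0).
  call MPI_Reduce(local_rank, owner, nx*ny*nz, MPI_INTEGER, MPI_MAX, 0, &
                  MPI_COMM_WORLD, ierr)

  if (rank == 0) then
    open(10, file='partition.vtk', status='replace')
    write(10,'(a)') '# vtk DataFile Version 3.0'
    write(10,'(a)') 'partition'
    write(10,'(a)') 'ASCII'
    write(10,'(a)') 'DATASET STRUCTURED_POINTS'
    write(10,'(a,3i8)') 'DIMENSIONS', nx, ny, nz
    write(10,'(a)') 'ORIGIN 0 0 0'
    write(10,'(a)') 'SPACING 1 1 1'
    write(10,'(a,i12)') 'POINT_DATA', nx*ny*nz
    write(10,'(a)') 'SCALARS rank int 1'
    write(10,'(a)') 'LOOKUP_TABLE default'
    do ijk = 1, nx*ny*nz
      write(10,'(i6)') owner(ijk)
    end do
    close(10)
  end if
  deallocate(local_rank, owner)
end subroutine write_partition_vtk

Loading partition.vtk into ParaView and coloring by "rank" makes the shape of each subdomain obvious.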
VecScatterBegin/End is taking much more time than these assembly events, and really a pretty unreasonable amount of time in general.

On Thu, Oct 4, 2012 at 2:16 PM, Wee-Beng Tay <zonexo@gmail.com> wrote:
On 4/10/2012 5:11 PM, Matthew Knepley wrote:

On Thu, Oct 4, 2012 at 11:01 AM, TAY wee-beng <zonexo@gmail.com> wrote:

On 4/10/2012 3:40 AM, Matthew Knepley wrote:

On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <zonexo@gmail.com> wrote:
Hi Jed,

I believe they are real cores. Anyway, I have attached the log summary for the 12/24/48 cores. I re-ran a smaller case because the large problem can't run with 12 cores.
Okay, look at VecScatterBegin/End for 24 and 48 cores (I am guessing you have four 16-core chips, but please figure this out). The messages are logged in ScatterBegin, and the time is logged in ScatterEnd. From 24 to 48 cores the time is cut in half. If you were only communicating the boundary, this is completely backwards, so you are communicating a fair fraction of ALL the values in a subdomain. Figure out why your partition is so screwed up and this will go away.
What do you mean by "If you were only communicating the boundary, this is completely backwards, so you are communicating a fair fraction of ALL the values in a subdomain"?
If you have 48 partitions instead of 24, you have a larger interface, so AssemblyEnd() should take slightly longer. However, your AssemblyEnd() takes HALF the time, which means it is communicating many fewer values, which means you are not sending interface values, you are sending interior values, since the interior shrinks when you have more partitions.

What this probably means is that your assembly routines are screwed up and are sending data all over the place.
Ok, I got it now. Looking at the AssemblyEnd times:

12 procs

MatAssemblyEnd       145 1.0 1.6342e+01 1.8 0.00e+00 0.0 4.4e+01 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       388 1.0 1.4472e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

24 procs

MatAssemblyEnd       145 1.0 1.1618e+01 2.4 0.00e+00 0.0 9.2e+01 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       388 1.0 2.3527e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

48 procs

MatAssemblyEnd       145 1.0 7.4327e+00 2.4 0.00e+00 0.0 1.9e+02 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0
VecAssemblyEnd       388 1.0 2.8818e-03 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

The VecAssemblyEnd time increases with the number of procs; does that mean there is nothing wrong with it? On the other hand, the MatAssemblyEnd time decreases with more procs. So that's where the problem lies, is that so?

I'm still scanning my code but haven't found the error yet. It seems strange because I insert the matrix and vector exactly the same way for x, y, z. The u, v, w arrays are also allocated with the same indices. Shouldn't the error be the same for x, y and z too?

Trying to get some hints as to where else I need to look in my code...

Tks
I partition my domain in the z direction, as shown in the attached pic. The circled region is where the airfoils are. I'm using an immersed boundary method (IBM) code, so the grid is all Cartesian.

I created my Z matrix using:

call MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)

where ijk_sta / ijk_end are the starting/ending global indices of the rows. The 7 is because a star stencil is used in 3D.

I create my RHS vector using:

call VecCreateMPI(MPI_COMM_WORLD,ijk_end-ijk_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)
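One rough sanity check (a sketch, assuming the usual PETSc Fortran includes; not something already in my code) would be to compare the ownership ranges of the matrix and the RHS vector, since a mismatch there would silently turn seemingly local insertions into off-process entries:

! Sketch: compare the ownership ranges of A_semi_z and b_rhs_semi_z.
PetscInt :: rstart_A, rend_A, rstart_b, rend_b
PetscErrorCode :: ierr

call MatGetOwnershipRange(A_semi_z, rstart_A, rend_A, ierr)
call VecGetOwnershipRange(b_rhs_semi_z, rstart_b, rend_b, ierr)
if (rstart_A /= rstart_b .or. rend_A /= rend_b) then
  print *, 'layout mismatch: mat ', rstart_A, rend_A, ' vec ', rstart_b, rend_b
end if

If the two ranges differ on any rank, that alone would explain values being shipped between processes during assembly.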
The values for the matrix and vector were calculated before PETSc logging starts, so they don't come into play.

The x and y matrices and vectors are set up in a similar fashion. I still can't figure out why solving the z momentum eqn takes so much time. Which portion should I focus on?

Tks!
Yours sincerely,
TAY wee-beng

On 3/10/2012 5:59 PM, Jed Brown wrote:
There is an inordinate amount of time being spent in VecScatterEnd(). That sometimes indicates a very bad partition. Also, are your "48 cores" real physical cores or just "logical cores" (they look like cores to the operating system, are usually advertised as "threads" by the vendor, and are nothing like cores in reality)? That can cause a huge load imbalance and very confusing results as over-subscribed threads compete for shared resources. Step it back to 24 threads and 12 threads, and send log_summary for each.

On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <zonexo@gmail.com> wrote:
On 2/10/2012 2:43 PM, Jed Brown wrote:

On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng <zonexo@gmail.com> wrote:
Hi,

I have combined the momentum linear eqns involving x, y, z into one large matrix. The Poisson eqn is solved using HYPRE's Struct format, so it's not included. I run the code for 50 timesteps (hence 50 KSPSolves) using 96 procs. The log_summary is given below. I have some questions:

1. After combining the matrices, I should have only 1 PETSc matrix. Why does it say there are 4 matrices, 12 vectors, etc.?
They are part of preconditioning. Are you sure you're using Hypre for this? It looks like you are using bjacobi/ilu.
2. I'm looking at the stages which take the longest time. It seems that MatAssemblyBegin, VecNorm, VecAssemblyBegin, and VecScatterEnd have very high ratios. The ratios of some others are also not too good (~1.6-2). So are these stages the reason why my code is not scaling well? What can I do to improve it?
3/4 of the solve time is evenly balanced between MatMult, MatSolve, MatLUFactorNumeric, and VecNorm+VecDot.

The high VecAssembly time might be due to generating a lot of entries off-process?

In any case, this looks like an _extremely_ slow network; perhaps it's misconfigured?
My cluster is configured with 48 procs per node. I re-ran the case using only 48 procs, so there's no need to pass over a 'slow' interconnect. I'm now also using GAMG and BCGS for the Poisson and momentum eqns respectively. I have also separated the x, y, z components of the momentum eqn into 3 separate linear eqns to debug the problem.

Results show that the stage "momentum_z" is taking a lot of time. I wonder if it has to do with the fact that I am partitioning my grids in the z direction. VecScatterEnd and MatMult are taking a lot of time. The ratios of VecNormalize, VecScatterEnd, VecNorm and VecAssemblyBegin are also not good.

I wonder why a lot of entries are generated off-process.

I create my RHS vector using:

call VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)

where ijk_xyz_sta and ijk_xyz_end are obtained from

call MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)

I then insert the values into the vector using:

call VecSetValues(b_rhs_semi_z, ijk_xyz_end - ijk_xyz_sta, (/ijk_xyz_sta : ijk_xyz_end - 1/), q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end), INSERT_VALUES, ierr)

What should I do to correct the problem?

Thanks
Btw, I insert the matrix using:

do ijk = ijk_xyz_sta+1, ijk_xyz_end

    II = ijk - 1    ! Fortran shift to 0-based

    call MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)

end do

where ijk_xyz_sta/ijk_xyz_end are the starting/ending indices, int_semi_xyz(ijk,1:7) stores the 7 global column indices, and semi_mat_xyz holds the corresponding values.

and I insert vectors using:

call VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)
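One rough diagnostic sketch (variable names as above, assuming the usual PETSc Fortran includes) is to count how many of the rows I set actually fall outside the range this process owns; anything nonzero has to be shipped to another process during MatAssemblyBegin/End:

! Sketch: count rows inserted from this process that it does not own.
PetscInt :: rstart, rend, n_offproc
PetscErrorCode :: ierr
integer :: ijk

call MatGetOwnershipRange(A_semi_xyz, rstart, rend, ierr)
n_offproc = 0
do ijk = ijk_xyz_sta + 1, ijk_xyz_end
  if (ijk - 1 < rstart .or. ijk - 1 >= rend) n_offproc = n_offproc + 1
end do
print *, 'rows set off-process on this rank:', n_offproc

The same kind of check on the indices passed to VecSetValues would show whether the vector insertion is generating off-process entries.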
Thanks!
Yours sincerely,
TAY wee-beng

On 30/9/2012 11:30 PM, Jed Brown wrote:

You can measure the time spent in Hypre via PCApply and PCSetUp, but you can't get finer-grained integrated profiling because it was not set up that way.

On Sep 30, 2012 3:26 PM, "TAY wee-beng" <zonexo@gmail.com> wrote:
On 27/9/2012 1:44 PM, Matthew Knepley wrote:

On Thu, Sep 27, 2012 at 3:49 AM, TAY wee-beng <zonexo@gmail.com> wrote:
Hi,

I'm doing a log summary for my 3D CFD code. I have some questions:

1. If I'm solving 3 linear equations using KSP, is the result given in the log summary the total of the 3 linear eqns' performance? How can I get the performance for each individual eqn?
Use logging stages: http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html
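For example, a minimal sketch (the stage names, KSP objects and vectors here are placeholders for whatever the code already uses):

! Sketch: give each momentum solve its own logging stage so -log_summary
! reports them separately.
PetscLogStage :: stage_x, stage_y, stage_z
PetscErrorCode :: ierr

call PetscLogStageRegister('momentum_x', stage_x, ierr)
call PetscLogStageRegister('momentum_y', stage_y, ierr)
call PetscLogStageRegister('momentum_z', stage_z, ierr)

call PetscLogStagePush(stage_x, ierr)
call KSPSolve(ksp_x, b_x, x_x, ierr)
call PetscLogStagePop(ierr)

call PetscLogStagePush(stage_y, ierr)
call KSPSolve(ksp_y, b_y, x_y, ierr)
call PetscLogStagePop(ierr)

call PetscLogStagePush(stage_z, ierr)
call KSPSolve(ksp_z, b_z, x_z, ierr)
call PetscLogStagePop(ierr)

Everything executed between a push and the matching pop is then summarized under that stage in the log output.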
2. If I run my code for 10 time steps, does the log summary give the total or the average performance/ratio?

Total.
3. Besides PETSc, I'm also using HYPRE's native geometric MG (Struct) to solve my Cartesian-grid CFD Poisson eqn. Is there any way I can use PETSc's log summary to get HYPRE's performance? If I use BoomerAMG through PETSc, can I get its performance?
If you mean flops, only if you count them yourself and tell PETSc using http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html
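Roughly along these lines (a sketch only; the event name and the flop estimate are placeholders you would have to supply yourself):

! Sketch: wrap the external solve in a user-defined event and credit it with
! a hand-counted flop total so it appears in -log_summary.
PetscLogEvent :: hypre_event
PetscLogDouble :: my_flops
PetscErrorCode :: ierr

call PetscLogEventRegister('HypreStructSolve', 0, hypre_event, ierr)

call PetscLogEventBegin(hypre_event, ierr)
! ... call the HYPRE Struct solver here ...
my_flops = 1.0d6                      ! placeholder: your own estimate
call PetscLogFlops(my_flops, ierr)
call PetscLogEventEnd(hypre_event, ierr)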
This is the disadvantage of using packages that do not properly monitor things :)

   Matt
So you mean that if I use BoomerAMG through PETSc, there is no proper way of evaluating its performance besides using PetscLogFlops?
--
Yours sincerely,

TAY wee-beng
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener