On Thu, Oct 4, 2012 at 11:01 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 4/10/2012 3:40 AM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi Jed,<br>
<br>
I believe they are real cores. Anyway, I have attached the
log summary for the 12/24/48 cores. I re-ran a smaller
case because the large problem can't run with 12 cores.<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Okay, look at VecScatterBegin/End for 24 and 48 cores (I am
guessing you have 4 16-core chips, but please figure this
out).</div>
<div>The messages are logged in ScatterBegin, and the time is
logged in ScatterEnd. From 24 to 48 cores the time is cut in
half.</div>
<div>If you were only communicating the boundary, this is
completely backwards, so you are communicating a fair fraction
of ALL</div>
<div>the values in a subdomain. Figure out why your partition is
so screwed up and this will go away.</div>
</div>
</blockquote>
<br>
What do you mean by "If you were only communicating the boundary,
this is completely backwards, so you are communicating a fair
fraction of ALL the values in a subdomain"?<br></div></blockquote><div><br></div><div>If you have 48 partitions instead of 24, you have a larger interface, so AssemblyEnd() should take</div><div>slightly longer. However, your AssemblyEnd() takes HALF the time, which means it is communicating</div>
<div>far fewer values, which means you are not sending interface values, you are sending interior values,</div><div>since the interior shrinks when you have more partitions.</div><div><br></div><div>What this probably means is that your assembly routines are screwed up and are sending data all over the place.</div>
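<div><br></div><div>A rough illustration with a hypothetical 96x96x96 grid (the size is made up, not taken from your run): split into z-slabs, each interface plane holds 96x96 = 9,216 values per neighbor, independent of the number of partitions, while a whole slab owns 96^3/24 = 36,864 points on 24 processes and 18,432 on 48. Interface traffic should therefore stay roughly constant (or grow slightly) as you add partitions; only interior-sized traffic halves the way your timings do.</div>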
<div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
I partition my domain in the z direction, as shown in the attached
pic. The circled region is where the airfoils are. I'm using an
immersed boundary method (IBM) code so the grid is all Cartesian.<br>
<br>
I created my z-momentum matrix using:<br>
<br>
call
MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)<br>
<br>
where ijk_sta / ijk_end are the starting/ending global indices of
the rows owned by this process.<br>
<br>
The 7 is the preallocation per row, since a 7-point star stencil is used in 3D.<br>
<br>
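For reference, the same call with comments spelling out what each argument means:<br>
<br>
<pre>! arguments, in order: communicator,
!   local rows, local columns      (ijk_end-ijk_sta on this rank),
!   global rows, global columns    (PETSC_DECIDE: computed from the local sizes),
!   d_nz, d_nnz                    (up to 7 nonzeros per row in the diagonal block),
!   o_nz, o_nnz                    (up to 7 nonzeros per row in the off-diagonal block)
call MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)</pre>
<br>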
I create my RHS vector using:<br>
<br>
<i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_end-ijk_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
<br>
<div>The values for the matrix and vector were calculated before
the PETSc logging starts, so they don't come into play.<br>
<br>
The x and y matrices and vectors are set up in a similar fashion. I
still can't figure out why solving the z-momentum eqn takes so much
time. Which portion should I focus on?<br>
<br>
Tks!<br>
<br>
</div>
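<br>
One small diagnostic worth trying (a sketch, assuming the variable names from the fragments above; rstart, rend and noff are made-up locals, and the usual PETSc Fortran includes/declarations are already in place): have each rank count how many of the global rows it sets fall outside its own ownership range. A nonzero count on any rank would confirm that assembly is generating off-process entries.<br>
<br>
<pre>      PetscInt rstart, rend, noff, ijk, II
      PetscErrorCode ierr

      ! rows [rstart, rend) are owned by this rank
      call MatGetOwnershipRange(A_semi_z, rstart, rend, ierr)

      noff = 0
      do ijk = ijk_sta + 1, ijk_end
         II = ijk - 1                   ! shift to 0-based global row index
         if (II < rstart .or. II >= rend) noff = noff + 1
      end do
      print *, 'rows set outside local ownership range: ', noff</pre>
<br>
The same check with VecGetOwnershipRange() applies to the VecSetValues() call for the RHS vector.<br>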
<blockquote type="cite">
<div class="gmail_quote">
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<pre cols="72">Yours sincerely,
TAY wee-beng</pre>
<div>
<div> On 3/10/2012 5:59 PM, Jed Brown wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">There is an inordinate amount of
time being spent in VecScatterEnd(). That sometimes
indicates a very bad partition. Also, are your "48
cores" real physical cores or just "logical cores"
(look like cores to the operating system, usually
advertised as "threads" by the vendor, nothing like
cores in reality)? That can cause a huge load
imbalance and very confusing results as
over-subscribed threads compete for shared resources.
Step it back to 24 threads and 12 threads, send
log_summary for each.<br>
<br>
<div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08
AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
</div>
<blockquote type="cite">On Tue, Oct 2, 2012 at
8:35 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi,<br>
<br>
I have combined the momentum linear
eqns involving x,y,z into 1 large
matrix. The Poisson eqn is solved
using HYPRE's Struct format so it's
not included. I ran the code for 50
timesteps (hence 50 KSPSolve calls) using
96 procs. The log_summary is given
below. I have some questions:<br>
<br>
1. After combining the matrices, I
should have only 1 PETSc matrix. Why
does it say there are 4 matrices, 12
vectors, etc.? <br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>They are part of preconditioning. Are
you sure you're using Hypre for this? It
looks like you are using bjacobi/ilu.</div>
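<div><br></div><div>(One quick way to check is to run with the -ksp_view option, which prints the KSP and PC that were actually set up, so you can see whether Hypre or the default bjacobi/ilu is being applied.)</div>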
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
2. I'm looking at the stages which
take the longest time. It seems that
MatAssemblyBegin, VecNorm,
VecAssemblyBegin, VecScatterEnd have
very high ratios. The ratios of some
others are also not too good (~ 1.6
- 2). So are these stages the reason
why my code is not scaling well?
What can I do to improve it?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>3/4 of the solve time is evenly
balanced between MatMult, MatSolve,
MatLUFactorNumeric, and VecNorm+VecDot.</div>
<div><br>
</div>
<div>The high VecAssembly time might be
due to generating a lot of entries
off-process?</div>
<div><br>
</div>
<div>In any case, this looks like an
_extremely_ slow network, perhaps it's
misconfigured?</div>
</div>
</blockquote>
<br>
</div>
My cluster is configured with 48 procs per node.
I re-ran the case using only 48 procs, so
there's no need to pass over a 'slow'
interconnect. I'm now also using GAMG and BCGS
for the Poisson and momentum eqns respectively. I
have also separated the x,y,z components of the
momentum eqn into 3 separate linear eqns to debug
the problem. <br>
<br>
Results show that stage "momentum_z" is taking a
lot of time. I wonder if it has to do with the
fact that I am partitioning my grids in the z
direction. VecScatterEnd and MatMult are taking a
lot of time. The ratios of VecNormalize, VecScatterEnd,
VecNorm and VecAssemblyBegin are also not
good.<br>
<br>
I wonder why a lot of entries are generated
off-process.<br>
<br>
I create my RHS vector using:<br>
<br>
<i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
<br>
where ijk_xyz_sta and ijk_xyz_end are obtained
from<br>
<br>
<i>call
MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
<br>
I then insert the values into the vector using:<br>
<br>
<pre>call VecSetValues(b_rhs_semi_z, ijk_xyz_end - ijk_xyz_sta, (/ijk_xyz_sta : ijk_xyz_end - 1/), q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end), INSERT_VALUES, ierr)</pre>
<br>
What should I do to correct the problem?<br>
<br>
Thanks
<div>
<div><br>
<br>
<blockquote type="cite">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
Btw, I insert matrix using:<br>
<br>
<pre>do ijk = ijk_xyz_sta+1, ijk_xyz_end

   II = ijk - 1      ! Fortran shift to 0-based

   call MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)

end do</pre>
<br>
where ijk_xyz_sta/ijk_xyz_end are
the starting/ending indices<br>
<br>
int_semi_xyz(ijk,1:7) stores the 7
global column indices<br>
<br>
semi_mat_xyz has the corresponding
values.<br>
<br>
and I insert vectors using:<br>
<br>
<pre>call VecSetValues(b_rhs_semi_xyz, ijk_xyz_end_mz-ijk_xyz_sta_mz, (/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/), q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz), INSERT_VALUES, ierr)</pre>
<br>
Thanks!<br>
<br>
<i><br>
</i><br>
<pre cols="72">Yours sincerely,
TAY wee-beng</pre>
<div>
<div> On 30/9/2012 11:30 PM, Jed
Brown wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">
<p>You can measure the time
spent in Hypre via PCApply
and PCSetUp, but you can't
get finer grained integrated
profiling because it was not
set up that way.</p>
<div class="gmail_quote">On
Sep 30, 2012 3:26 PM, "TAY
wee-beng" <<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 27/9/2012 1:44
PM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">On
Thu, Sep 27, 2012 at
3:49 AM, TAY wee-beng
<span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br>
I'm doing a log
summary for my 3d
cfd code. I have
some questions:<br>
<br>
1. if I'm solving
3 linear equations
using ksp, is the
result given in
the log summary
the total of the 3
linear eqns'
performance? How
can I get the
performance for
each individual
eqn?<br>
</blockquote>
<div><br>
</div>
<div>Use logging
stages: <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>
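<div><br></div><div>A minimal Fortran sketch of what that looks like (the stage name and the KSP/Vec variables are placeholders, not your actual code; the usual PETSc includes are assumed):<br>
<pre>      PetscLogStage stage_mom_x
      PetscErrorCode ierr

      call PetscLogStageRegister('momentum_x', stage_mom_x, ierr)

      call PetscLogStagePush(stage_mom_x, ierr)
      call KSPSolve(ksp_mom_x, b_x, x_x, ierr)   ! everything here is logged under this stage
      call PetscLogStagePop(ierr)</pre>
-log_summary then reports each registered stage in its own section.</div>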
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
2. If I run my
code for 10 time
steps, does the
log summary give
the total or avg
performance/ratio?<br>
</blockquote>
<div><br>
</div>
<div>Total.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
3. Besides PETSc,
I'm also using
HYPRE's native
geometric MG
(Struct) to solve
my Cartesian-grid
CFD Poisson
eqn. Is there any
way I can use
PETSc's log
summary to get
HYPRE's
performance? If I
use boomerAMG thru
PETSc, can I get
its performance?</blockquote>
<div><br>
</div>
<div>If you mean
flops, only if you
count them
yourself and tell
PETSc using <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>
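<div><br></div><div>A minimal sketch of that, assuming you supply your own estimate of the work done inside the Hypre solve (nnz and its below are placeholders for your nonzero and iteration counts):<br>
<pre>      PetscInt nnz, its              ! placeholders: matrix nonzeros and solver iterations
      PetscLogDouble nflops
      PetscErrorCode ierr

      ! rough estimate: ~2 flops per matrix nonzero per iteration
      nflops = 2.0d0 * nnz * its
      call PetscLogFlops(nflops, ierr)</pre></div>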
<div><br>
</div>
<div>This is the
disadvantage of
using packages
that do not
properly monitor
things :)</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
</div>
</blockquote>
So you mean if I use
boomerAMG thru PETSc,
there is no proper way
of evaluating its
performance, besides
using PetscLogFlops?<br>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span><font color="#888888"><br>
-- <br>
Yours
sincerely,<br>
<br>
TAY wee-beng<br>
<br>
</font></span></blockquote>
</div>
<br>
<br clear="all"><span class="HOEnZb"><font color="#888888">
<div><br>
</div>
-- <br>
What most
experimenters take for
granted before they
begin their
experiments is
infinitely more
interesting than any
results to which their
experiments lead.<br>
-- Norbert Wiener<br>
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
<br>
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
<br>
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
<br>
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
<br>
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
<br>
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
<br>
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to
which their experiments lead.<br>
-- Norbert Wiener<br>
</font></span></blockquote>
<br>
</div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>