There is an inordinate amount of time being spent in VecScatterEnd(). That sometimes indicates a very bad partition. Also, are your "48 cores" real physical cores or just "logical cores" (they look like cores to the operating system and are usually advertised as "threads" by the vendor, but are nothing like cores in reality)? That can cause a huge load imbalance and very confusing results, as over-subscribed threads compete for shared resources. Step it back to 24 threads and then 12 threads, and send the -log_summary output for each.<br>
<br><div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div class="im">
    <div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
    </div>
    <blockquote type="cite">On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
      wrote:<br>
      <div class="gmail_quote">
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000">
            <div>Hi,<br>
              <br>
              I have combined the momentum linear eqns for x, y, z
              into 1 large matrix. The Poisson eqn is solved using HYPRE's
              Struct interface, so it's not included. I ran the code for 50
              timesteps (hence 50 KSPSolve calls) using 96 procs. The
              log_summary is given below. I have some questions:<br>
              <br>
              1. After combining the matrices, I should have only 1 PETSc
              matrix. Why does it say there are 4 matrices, 12 vectors,
              etc.?<br>
            </div>
          </div>
        </blockquote>
        <div><br>
        </div>
        <div>They are part of preconditioning. Are you sure you're using
          Hypre for this? It looks like you are using bjacobi/ilu.</div>
        <div> </div>
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000">
            <div> <br>
              2. I'm looking at the events which take the longest time.
              It seems that MatAssemblyBegin, VecNorm, VecAssemblyBegin and
              VecScatterEnd have very high max/min time ratios across
              processes. The ratios of some others are also not very good
              (~1.6 - 2). Are these events the reason why my code is not
              scaling well? What can I do to improve it?<br>
            </div>
          </div>
        </blockquote>
        <div><br>
        </div>
        <div>3/4 of the solve time is evenly balanced between MatMult,
          MatSolve, MatLUFactorNumeric, and VecNorm+VecDot.</div>
        <div><br>
        </div>
        <div>The high VecAssembly time might be due to generating a lot
          of entries off-process?</div>
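A quick way to check this is to verify, on each rank, that every global index passed to VecSetValues (or the row indices passed to MatSetValues) lies inside that rank's ownership range [rstart, rend) as returned by VecGetOwnershipRange or MatGetOwnershipRange; entries outside it are stashed and shipped to their owners during assembly. The plain-C sketch below has no PETSc dependency, and the helper name count_off_process is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many 0-based global indices fall outside the half-open
   ownership range [rstart, rend) of the calling rank. In PETSc such
   entries are stashed by VecSetValues/MatSetValues and communicated
   during the AssemblyBegin/AssemblyEnd pair. */
static size_t count_off_process(const long *idx, size_t n, long rstart, long rend)
{
    assert(rstart <= rend);
    size_t off = 0;
    for (size_t i = 0; i < n; ++i) {
        if (idx[i] < rstart || idx[i] >= rend) {
            ++off;  /* owned by another rank */
        }
    }
    return off;
}
```

A nonzero count on any rank points at assembly traffic. If every rank reports zero, time in VecAssemblyBegin may instead reflect ranks waiting for each other at the collective communication inside the assembly, i.e. load imbalance from earlier work.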
        <div><br>
        </div>
        <div>In any case, this looks like an _extremely_ slow network,
          perhaps it's misconfigured?</div>
      </div>
    </blockquote>
    <br></div>
    My cluster is configured with 48 procs per node. I re-ran the case
    using only 48 procs, so no communication has to pass over a 'slow'
    interconnect. I'm now also using GAMG for the Poisson eqn and BCGS
    for the momentum eqn. I have also separated the x, y, z components
    of the momentum eqn into 3 separate linear eqns to debug the problem.
    <br>
    <br>
    Results show that stage "momentum_z" is taking a lot of time. I
    wonder if it has to do with the fact that I am partitioning my grid
    in the z direction. VecScatterEnd and MatMult are taking a lot of
    time, and the ratios of VecNormalize, VecScatterEnd, VecNorm and
    VecAssemblyBegin are also not good.<br>
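The effect of a z-only partition on VecScatter traffic can be estimated with a rough count. The plain-C sketch below (hypothetical helper name, no PETSc calls) computes, for a 7-point stencil on a structured grid split into px x py x pz equal blocks, how many ghost values a rank with neighbours on both sides of every split direction must receive per MatMult, relative to the number of values it owns. These ghost values are what VecScatterBegin/End move. The grid sizes used below are assumptions; the real dimensions are not given in the thread.

```c
#include <assert.h>

/* Ghost-to-owned ratio for an interior rank of an nx x ny x nz grid split
   into px x py x pz equal blocks, assuming a 7-point stencil (one ghost
   layer on each face shared with a neighbour). A pure z-slab partition is
   the special case px = py = 1. */
static double ghost_ratio(long nx, long ny, long nz, long px, long py, long pz)
{
    assert(nx % px == 0 && ny % py == 0 && nz % pz == 0);
    const long lx = nx / px, ly = ny / py, lz = nz / pz;
    const long owned = lx * ly * lz;
    long ghosts = 0;
    if (px > 1) ghosts += 2 * ly * lz;  /* two x-faces */
    if (py > 1) ghosts += 2 * lx * lz;  /* two y-faces */
    if (pz > 1) ghosts += 2 * lx * ly;  /* two z-faces */
    return (double)ghosts / (double)owned;
}
```

For a hypothetical 200 x 200 x 96 grid on 48 ranks, z-slabs are only 2 planes thick and ghost_ratio(200, 200, 96, 1, 1, 48) is 1.0, i.e. each rank receives as many ghost values as it owns, whereas a 4 x 4 x 3 block split gives about 0.14. If the grid were managed by a PETSc DMDA, DMDACreate3d with PETSC_DECIDE for the process counts chooses such a 3D split automatically.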
    <br>
    I wonder why a lot of entries are generated off-process.<br>
    <br>
    I create my RHS vector using:<br>
    <br>
    <i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
    <br>
    where ijk_xyz_sta and ijk_xyz_end are obtained from<br>
    <br>
    <i>call MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
    <br>
    I then insert the values into the vector using:<br>
    <br>
    <i>call VecSetValues(b_rhs_semi_z , ijk_xyz_end - ijk_xyz_sta ,
      (/ijk_xyz_sta : ijk_xyz_end - 1/) , q_semi_vect_z(ijk_xyz_sta + 1
      : ijk_xyz_end) , INSERT_VALUES , ierr)</i><br>
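To see exactly which global entries that call touches, the same index bookkeeping can be mirrored in plain C. This is a hypothetical sketch, not PETSc code; it assumes, as the call above implies, that q_semi_vect_z is indexed by 1-based global position, so Fortran element ijk corresponds to C element ijk - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of the Fortran VecSetValues call: fill idx with the 0-based global
   indices rstart .. rend-1 and vals with the matching entries of q, where
   q is a 0-based global array (Fortran q_semi_vect_z(g+1) == q[g]).
   Returns the number of entries, rend - rstart. */
static size_t fill_owned_block(long rstart, long rend, const double *q,
                               long *idx, double *vals)
{
    assert(rstart <= rend);
    const size_t n = (size_t)(rend - rstart);
    for (size_t i = 0; i < n; ++i) {
        idx[i]  = rstart + (long)i;     /* 0-based global row index */
        vals[i] = q[rstart + (long)i];  /* Fortran q(rstart + 1 + i) */
    }
    return n;
}
```

Every index written falls in [ijk_xyz_sta, ijk_xyz_end), the row range MatGetOwnershipRange returns for this rank. Because VecCreateMPI was given that same local size on every rank, the vector's ownership range should match, so this call by itself should not generate off-process entries.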
    <br>
    What should I do to correct the problem?<br>
    <br>
    Thanks<div><div class="h5"><br>
    <br>
    <blockquote type="cite">
      <div class="gmail_quote">
        <div> </div>
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000">
            <div> <br>
              Btw, I insert the matrix entries using:<br>
              <br>
              <i>do ijk=ijk_xyz_sta+1,ijk_xyz_end<br>
              <br>
                  II = ijk - 1    !Fortran shift to 0-based<br>
              <br>
                  call MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)<br>
              <br>
              end do</i><br>
              <br>
              where ijk_xyz_sta/ijk_xyz_end are the start/end indices,<br>
              <br>
              int_semi_xyz(ijk,1:7) stores the global indices of the 7 columns,<br>
              <br>
              semi_mat_xyz has the corresponding values.<br>
              <br>
              and I insert the vector entries using:<br>
              <br>
              call
VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)<br>
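For the MatSetValues loop above, the columns that can fall off-process are easy to characterise on a structured grid. The plain-C sketch below assumes a 0-based natural ordering row = i + nx*(j + ny*k) and an interior grid point; the ordering actually used to build int_semi_xyz is not shown in the thread, and the helper names are hypothetical. Columns outside the owned row range belong to the off-diagonal block of an MPIAIJ matrix, and the matching vector values are the ghost values VecScatter gathers during MatMult.

```c
#include <assert.h>

/* 0-based natural (lexicographic) ordering: row = i + nx*(j + ny*k). */
static long grid_index(long i, long j, long k, long nx, long ny)
{
    return i + nx * (j + ny * k);
}

/* For interior point (i,j,k) of an nx x ny x nz grid, fill cols[] with the
   7-point stencil columns (centre, -x, +x, -y, +y, -z, +z) and return how
   many lie outside the owned row range [rstart, rend). */
static int stencil_off_process(long i, long j, long k,
                               long nx, long ny, long nz,
                               long rstart, long rend, long cols[7])
{
    assert(i > 0 && i < nx - 1 && j > 0 && j < ny - 1 && k > 0 && k < nz - 1);
    const long row = grid_index(i, j, k, nx, ny);
    const long plane = nx * ny;                    /* stride between z-planes */
    cols[0] = row;
    cols[1] = row - 1;     cols[2] = row + 1;      /* x-neighbours */
    cols[3] = row - nx;    cols[4] = row + nx;     /* y-neighbours */
    cols[5] = row - plane; cols[6] = row + plane;  /* z-neighbours */
    int off = 0;
    for (int c = 0; c < 7; ++c) {
        if (cols[c] < rstart || cols[c] >= rend) {
            ++off;  /* off-diagonal block: needs a ghost value in MatMult */
        }
    }
    return off;
}
```

With nx = ny = 4 and nz = 8 split into two 4-plane slabs (rank 0 owns rows 0 to 63), point (1,1,3) sits on rank 0's last plane and its +z neighbour, column 69, is off-process, while point (1,1,2) has no off-process columns. Under a z-slab partition only the two z-neighbours can cross a rank boundary, so the rows themselves stay local (no assembly stash), but every MatMult still exchanges whole nx*ny planes.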
              <br>
              Thanks!<br>
              <br>
              <pre cols="72">Yours sincerely,

TAY wee-beng</pre>
              <div>
                <div> On 30/9/2012 11:30 PM, Jed Brown wrote:<br>
                </div>
              </div>
            </div>
            <div>
              <div>
                <blockquote type="cite">
                  <p>You can measure the time spent in Hypre via PCApply
                    and PCSetUp, but you can't get finer grained
                    integrated profiling because it was not set up that
                    way.</p>
                  <div class="gmail_quote">On Sep 30, 2012 3:26 PM, "TAY
                    wee-beng" <<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>

                    wrote:<br type="attribution">
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div bgcolor="#FFFFFF" text="#000000">
                        <div>On 27/9/2012 1:44 PM, Matthew Knepley
                          wrote:<br>
                        </div>
                        <blockquote type="cite">On Thu, Sep 27, 2012 at
                          3:49 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
                          wrote:<br>
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi,<br>
                              <br>
                              I'm running -log_summary on my 3D CFD
                              code. I have some questions:<br>
                              <br>
                              1. If I'm solving 3 linear equations using
                              KSP, is the result given in the log
                              summary the total for all 3 linear eqns?
                              How can I get the performance of each
                              individual eqn?<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Use logging stages: <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>

                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 2. If I run
                              my code for 10 time steps, does the log
                              summary give the total or the average
                              performance/ratio?<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Total.</div>
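Since the numbers are totals over the run, per-step figures come from dividing by the number of time steps (or by an event's Count column for a per-call average). The max/min Ratio column needs no conversion, because dividing both the maximum and the minimum by the same step count leaves their quotient unchanged. A trivial C sketch, with hypothetical function names:

```c
#include <assert.h>

/* Per-step average of a total reported by -log_summary. */
static double per_step(double total, long nsteps)
{
    assert(nsteps > 0);
    return total / (double)nsteps;
}

/* Max/min ratio across processes; identical whether computed from
   totals or from per-step values, since both are scaled equally. */
static double max_min_ratio(double max_total, double min_total)
{
    assert(min_total > 0.0);
    return max_total / min_total;
}
```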
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 3. Besides
                              PETSc, I'm also using HYPRE's native
                              geometric MG (Struct) to solve the Poisson
                              eqn of my Cartesian-grid CFD code. Is there
                              any way I can use PETSc's log summary to
                              get HYPRE's performance? If I use
                              BoomerAMG through PETSc, can I get its
                              performance?</blockquote>
                            <div><br>
                            </div>
                            <div>If you mean flops, only if you count
                              them yourself and tell PETSc using <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>

                            <div><br>
                            </div>
                            <div>This is the disadvantage of using
                              packages that do not properly monitor
                              things :)</div>
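As an illustration of counting flops yourself, here is a hand-derived count for one application of a 7-point stencil (7 multiplies and 6 adds per interior point, boundary rows ignored), in plain C. The function name and the stencil model are assumptions; a full multigrid cycle inside Hypre would need its own, much more detailed count. A number like this could be passed to PetscLogFlops() between PetscLogEventBegin() and PetscLogEventEnd() of a user-registered event wrapped around the external solve.

```c
#include <assert.h>

/* Flops for one 7-point stencil application on an nx x ny x nz grid,
   counting only interior points: 7 multiplies + 6 adds = 13 flops each. */
static double stencil_flops(long nx, long ny, long nz)
{
    assert(nx >= 2 && ny >= 2 && nz >= 2);
    const double interior = (double)(nx - 2) * (double)(ny - 2) * (double)(nz - 2);
    return 13.0 * interior;
}
```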
                            <div><br>
                            </div>
                            <div>    Matt</div>
                            <div> </div>
                          </div>
                        </blockquote>
                        So you mean that if I use BoomerAMG through PETSc,
                        there is no proper way of evaluating its
                        performance besides using PetscLogFlops?<br>
                        <blockquote type="cite">
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <span><font color="#888888"><br>
                                  -- <br>
                                  Yours sincerely,<br>
                                  <br>
                                  TAY wee-beng<br>
                                  <br>
                                </font></span></blockquote>
                          </div>
                          <br>
                          <br clear="all">
                          <div><br>
                          </div>
                          -- <br>
                          What most experimenters take for granted
                          before they begin their experiments is
                          infinitely more interesting than any results
                          to which their experiments lead.<br>
                          -- Norbert Wiener<br>
                        </blockquote>
                        <br>
                      </div>
                    </blockquote>
                  </div>
                </blockquote>
                <br>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br>