There is an inordinate amount of time being spent in VecScatterEnd(). That sometimes indicates a very bad partition. Also, are your "48 cores" real physical cores or just "logical cores" (they look like cores to the operating system and are usually advertised as "threads" by the vendor, but are nothing like real cores)? Over-subscribed threads competing for shared resources can cause a huge load imbalance and very confusing results. Step it back to 24 processes and then 12, and send the log_summary for each.<br>
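A quick way to check on Linux whether those "48 cores" are physical cores or hyperthreads (a sketch; `lscpu` availability depends on the distro):

```shell
# Logical CPUs the OS sees (this count includes hyperthreads):
nproc
# Physical topology: sockets x cores-per-socket x threads-per-core.
# If "Thread(s) per core" > 1, some of the 48 are logical, not physical.
command -v lscpu >/dev/null && lscpu | grep -E '^(Socket|Core|Thread)' || true
```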
<br><div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
<div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
</div>
<blockquote type="cite">On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi,<br>
<br>
I have combined the momentum linear eqns involving x, y, z
into 1 large matrix. The Poisson eqn is solved using HYPRE's
Struct format, so it's not included. I run the code for 50
timesteps (hence 50 KSPSolve calls) using 96 procs. The
log_summary is given below. I have some questions:<br>
<br>
1. After combining the matrix, I should have only 1 PETSc
matrix. Why does it say there are 4 matrices, 12 vectors,
etc.? <br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>They are part of preconditioning. Are you sure you're using
Hypre for this? It looks like you are using bjacobi/ilu.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
2. I'm looking at the events which take the longest time.
It seems that MatAssemblyBegin, VecNorm, VecAssemblyBegin, and
VecScatterEnd have very high ratios. The ratios of some
others are also not too good (~1.6 - 2). So are these
events the reason why my code is not scaling well? What
can I do to improve it?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>3/4 of the solve time is evenly balanced between MatMult,
MatSolve, MatLUFactorNumeric, and VecNorm+VecDot.</div>
<div><br>
</div>
<div>The high VecAssembly time might be due to generating a lot
of entries off-process?</div>
<div><br>
</div>
<div>In any case, this looks like an _extremely_ slow network,
perhaps it's misconfigured?</div>
</div>
</blockquote>
<br></div>
My cluster is configured with 48 procs per node. I re-ran the case
using only 48 procs, so there's no need to pass over a 'slow'
interconnect. I'm now also using GAMG and BCGS for the Poisson and
momentum eqns respectively. I have also separated the x, y, z components
of the momentum eqn into 3 separate linear eqns to debug the problem.
<br>
<br>
Results show that stage "momentum_z" is taking a lot of time. I
wonder if it has to do with the fact that I am partitioning my grid
in the z direction. VecScatterEnd and MatMult are taking a lot of time.
The ratios of VecNormalize, VecScatterEnd, VecNorm, and VecAssemblyBegin
are also not good.<br>
<br>
I wonder why a lot of entries are generated off-process.<br>
<br>
I create my RHS vector using:<br>
<br>
<i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
<br>
where ijk_xyz_sta and ijk_xyz_end are obtained from<br>
<br>
<i>call MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
<br>
I then insert the values into the vector using:<br>
<br>
<i>call VecSetValues(b_rhs_semi_z , ijk_xyz_end - ijk_xyz_sta ,
(/ (ijk, ijk = ijk_xyz_sta, ijk_xyz_end - 1) /) ,
q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end) , INSERT_VALUES , ierr)</i><br>
<br>
What should I do to correct the problem?<br>
<br>
Thanks<div><div class="h5"><br>
<br>
<blockquote type="cite">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
Btw, I insert matrix using:<br>
<br>
<i>do ijk = ijk_xyz_sta + 1, ijk_xyz_end<br>
<br>
   II = ijk - 1   ! Fortran shift to 0-based indexing<br>
<br>
   call MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)<br>
<br>
end do</i><br>
<br>
where ijk_xyz_sta/ijk_xyz_end are the starting/ending indices<br>
<br>
int_semi_xyz(ijk,1:7) stores the 7 global column indices<br>
<br>
semi_mat_xyz has the corresponding values.<br>
<br>
and I insert vectors using:<br>
<br>
call
VecSetValues(b_rhs_semi_xyz, ijk_xyz_end_mz-ijk_xyz_sta_mz, (/ (ijk, ijk=ijk_xyz_sta_mz, ijk_xyz_end_mz-1) /), q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz), INSERT_VALUES, ierr)<br>
<br>
Thanks!<br>
<br>
<i><br>
</i><br>
<pre cols="72">Yours sincerely,
TAY wee-beng</pre>
<div>
<div> On 30/9/2012 11:30 PM, Jed Brown wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">
<p>You can measure the time spent in Hypre via PCApply
and PCSetUp, but you can't get finer-grained
integrated profiling because Hypre was not
instrumented that way.</p>
<div class="gmail_quote">On Sep 30, 2012 3:26 PM, "TAY
wee-beng" <<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 27/9/2012 1:44 PM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">On Thu, Sep 27, 2012 at
3:49 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi,<br>
<br>
I'm doing a log summary for my 3D CFD
code. I have some questions:<br>
<br>
1. If I'm solving 3 linear equations using
KSP, is the result given in the log
summary the total of the 3 linear eqns'
performance? How can I get the performance
for each individual eqn?<br>
</blockquote>
<div><br>
</div>
<div>Use logging stages: <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 2. If I run
my code for 10 time steps, does the log
summary give the total or the average
performance/ratio?<br>
</blockquote>
<div><br>
</div>
<div>Total.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 3. Besides
PETSc, I'm also using HYPRE's native
geometric MG (Struct) to solve my
Cartesian-grid CFD Poisson eqn. Is there
any way I can use PETSc's log summary to
get HYPRE's performance? If I use
boomerAMG through PETSc, can I get its
performance?</blockquote>
<div><br>
</div>
<div>If you mean flops, only if you count
them yourself and tell PETSc using <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>
<div><br>
</div>
<div>This is the disadvantage of using
packages that do not properly monitor
things :)</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
</div>
</blockquote>
So you mean that if I use boomerAMG through PETSc, there
is no proper way of evaluating its performance,
besides using PetscLogFlops?<br>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <span><font color="#888888"><br>
-- <br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
</font></span></blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most experimenters take for granted
before they begin their experiments is
infinitely more interesting than any results
to which their experiments lead.<br>
-- Norbert Wiener<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br>