On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <div>Hi Jed,<br>
      <br>
      I believe they are real cores. Anyway, I have attached the log
      summaries for the 12-, 24-, and 48-core runs. I re-ran a smaller case
      because the large problem can't run on 12 cores.<br>
      <pre cols="72"></pre></div></div></blockquote><div><br></div><div>Okay, look at VecScatterBegin/End for 24 and 48 cores (I am guessing you have 4 16-core chips, but please figure this out).</div><div>The messages are logged in ScatterBegin, and the time is logged in ScatterEnd. From 24 to 48 cores the time is cut in half.</div>
<div>If you were only communicating the boundary, this is completely backwards, so you are communicating a fair fraction of ALL</div><div>the values in a subdomain. Figure out why your partition is so screwed up and this will go away.</div>
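<div>A rough sketch of the reasoning above, in plain Python rather than PETSc. The 96x96x96 grid, the 7-point stencil, the z-slab partition, and the function names are illustrative assumptions, not taken from the logs. If only boundary planes were exchanged, the ghost data per rank would stay the same when the rank count doubles, while the owned data is halved. A scatter time that halves from 24 to 48 ranks is therefore more consistent with communication that scales with the owned data than with the boundary.</div>
<pre>
```python
def owned_values(nx, ny, nz, nranks):
    """Grid points owned by one rank when nz divides evenly into z-slabs."""
    assert nz % nranks == 0, "sketch assumes equal slabs"
    return nx * ny * (nz // nranks)


def halo_values(nx, ny):
    """Ghost points an interior rank receives for a 7-point stencil:
    one full xy-plane from each of its two z-neighbours."""
    return 2 * nx * ny


if __name__ == "__main__":
    nx = ny = nz = 96  # hypothetical grid, chosen so 24 and 48 divide nz
    for nranks in (24, 48):
        owned = owned_values(nx, ny, nz, nranks)
        halo = halo_values(nx, ny)
        print(nranks, "ranks: owned", owned, "halo", halo)
```
</pre>
<div>In this sketch the halo count does not depend on the number of ranks at all, so boundary-only traffic alone would not explain a halved ScatterEnd time.</div>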
<div><br></div><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div><pre cols="72">Yours sincerely,

TAY wee-beng</pre><div><div class="h5">
      On 3/10/2012 5:59 PM, Jed Brown wrote:<br>
    </div></div></div><div><div class="h5">
    <blockquote type="cite">There is an inordinate amount of time being spent in
      VecScatterEnd(). That sometimes indicates a very bad partition.
      Also, are your "48 cores" real physical cores or just "logical
      cores" (look like cores to the operating system, usually
      advertised as "threads" by the vendor, nothing like cores in
      reality)? That can cause a huge load imbalance and very confusing
      results as over-subscribed threads compete for shared resources.
      Step it back to 24 threads and 12 threads, send log_summary for
      each.<br>
      <br>
      <div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08 AM, TAY
        wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
        wrote:<br>
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000">
            <div>
              <div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
              </div>
              <blockquote type="cite">On Tue, Oct 2, 2012 at 8:35 AM,
                TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
                wrote:<br>
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div bgcolor="#FFFFFF" text="#000000">
                      <div>Hi,<br>
                        <br>
                        I have combined the momentum linear eqns
                        for x, y, z into 1 large matrix. The Poisson
                        eqn is solved using HYPRE's Struct format, so it's
                        not included. I ran the code for 50 timesteps
                        (hence 50 KSPSolve calls) using 96 procs. The
                        log_summary is given below. I have some
                        questions:<br>
                        <br>
                        1. After combining the matrix, I should have
                        only 1 PETSc matrix. Why does it say there are
                        4 matrices, 12 vectors, etc.? <br>
                      </div>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                  <div>They are part of preconditioning. Are you sure
                    you're using Hypre for this? It looks like you are
                    using bjacobi/ilu.</div>
                  <div> </div>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div bgcolor="#FFFFFF" text="#000000">
                      <div> <br>
                        2. I'm looking at the stages which take the
                        longest time. It seems that MatAssemblyBegin,
                        VecNorm, VecAssemblyBegin, VecScatterEnd have
                        very high ratios. The ratios of some others are
                        also not too good (~ 1.6 - 2). So are these
                        stages the reason why my code is not scaling
                        well? What can I do to improve it?<br>
                      </div>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                  <div>3/4 of the solve time is evenly balanced between
                    MatMult, MatSolve, MatLUFactorNumeric, and
                    VecNorm+VecDot.</div>
                  <div><br>
                  </div>
                  <div>The high VecAssembly time might be due to
                    generating a lot of entries off-process?</div>
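                  <div>A minimal sketch of what "off-process" means here, in plain Python rather than PETSc calls (the function name and the example range are hypothetical). An entry is off-process when its global index lies outside the half-open ownership range [start, end) of the rank that sets it, which is the range that MatGetOwnershipRange or VecGetOwnershipRange reports. Such entries have to be sent to their owner during assembly.</div>
<pre>
```python
def count_off_process(indices, start, end):
    """Count global indices outside the half-open ownership range [start, end)."""
    owned = range(start, end)
    return sum(1 for i in indices if i not in owned)


if __name__ == "__main__":
    start, end = 1000, 2000  # hypothetical ownership range of one rank
    # Setting exactly the owned rows generates no off-process entries.
    print(count_off_process(range(start, end), start, end))
    # Indices shifted by one (e.g. a 1-based/0-based mix-up) spill over.
    print(count_off_process(range(start + 1, end + 1), start, end))
```
</pre>
                  <div>If a check like this reports zero on every rank, assembly should not need to move any values between ranks, and the cost is more likely coming from somewhere else.</div>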
                  <div><br>
                  </div>
                  <div>In any case, this looks like an _extremely_ slow
                    network, perhaps it's misconfigured?</div>
                </div>
              </blockquote>
              <br>
            </div>
            My cluster is configured with 48 procs per node. I re-ran
            the case using only 48 procs, so there's no need to pass
            over a 'slow' interconnect. I'm now also using GAMG and BCGS
            for the Poisson and momentum eqns respectively. I have also
            separated the x, y, z components of the momentum eqn into 3
            separate linear eqns to debug the problem. <br>
            <br>
            Results show that stage "momentum_z" is taking a lot of
            time. I wonder if it has to do with the fact that I am
            partitioning my grids in the z direction. VecScatterEnd
            and MatMult are taking a lot of time. The ratios for
            VecNormalize, VecScatterEnd, VecNorm, and VecAssemblyBegin
            are also not good.<br>
            <br>
            I wonder why a lot of entries are generated off-process.<br>
            <br>
            I create my RHS vector using:<br>
            <br>
            <i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
            <br>
            where ijk_xyz_sta and ijk_xyz_end are obtained from<br>
            <br>
            <i>call
              MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
            <br>
            I then insert the values into the vector using:<br>
            <br>
            <i>call VecSetValues(b_rhs_semi_z , ijk_xyz_end -
              ijk_xyz_sta , (/ijk_xyz_sta : ijk_xyz_end - 1/) ,
              q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end) ,
              INSERT_VALUES , ierr)</i><br>
            <br>
            What should I do to correct the problem?<br>
            <br>
            Thanks
            <div>
              <div><br>
                <br>
                <blockquote type="cite">
                  <div class="gmail_quote">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div bgcolor="#FFFFFF" text="#000000">
                        <div> <br>
                          Btw, I insert matrix using:<br>
                          <br>
                          <i>do ijk=ijk_xyz_sta+1,ijk_xyz_end</i><i><br>
                          </i><i><br>
                          </i><i>    II = ijk - 1</i><i>    !Fortran
                            shift to 0-based</i><i><br>
                          </i><i>    </i><i><br>
                          </i><i>    call
MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)</i><i><br>
                          </i><i><br>
                          </i><i>end do</i><br>
                          <br>
                          where ijk_xyz_sta/ijk_xyz_end are the
                          starting/end indices<br>
                          <br>
                          int_semi_xyz(ijk,1:7) stores the 7 global
                          column indices<br>
                          <br>
                          semi_mat_xyz has the corresponding values.<br>
                          <br>
                          and I insert vectors using:<br>
                          <br>
                          call
VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)<br>
                          <br>
                          Thanks!<br>
                          <br>
                          <i><br>
                          </i><br>
                          <pre cols="72">Yours sincerely,

TAY wee-beng</pre>
                          <div>
                            <div> On 30/9/2012 11:30 PM, Jed Brown
                              wrote:<br>
                            </div>
                          </div>
                        </div>
                        <div>
                          <div>
                            <blockquote type="cite">
                              <p>You can measure the time spent in Hypre
                                via PCApply and PCSetUp, but you can't
                                get finer grained integrated profiling
                                because it was not set up that way.</p>
                              <div class="gmail_quote">On Sep 30, 2012
                                3:26 PM, "TAY wee-beng" <<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>


                                wrote:<br type="attribution">
                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                  <div bgcolor="#FFFFFF" text="#000000">
                                    <div>On 27/9/2012 1:44 PM, Matthew
                                      Knepley wrote:<br>
                                    </div>
                                    <blockquote type="cite">On Thu, Sep
                                      27, 2012 at 3:49 AM, TAY wee-beng
                                      <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
                                      wrote:<br>
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi,<br>
                                          <br>
                                          I'm doing a log summary for my
                                          3D CFD code. I have some
                                          questions:<br>
                                          <br>
                                          1. if I'm solving 3 linear
                                          equations using ksp, is the
                                          result given in the log
                                          summary the total of the 3
                                          linear eqns' performance? How
                                          can I get the performance for
                                          each individual eqn?<br>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>Use logging stages: <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>

                                        <div> </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 2. If
                                          I run my code for 10 time
                                          steps, does the log summary
                                          give the total or avg
                                          performance/ratio?<br>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>Total.</div>
                                        <div> </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 3.
                                          Besides PETSc, I'm also using
                                          HYPRE's native geometric MG
                                          (Struct) to solve the Poisson
                                          eqn of my Cartesian-grid CFD
                                          code. Is there any way I can
                                          use PETSc's log summary to get
                                          HYPRE's performance? If I use
                                          BoomerAMG through PETSc, can I
                                          get its performance?</blockquote>
                                        <div><br>
                                        </div>
                                        <div>If you mean flops, only if
                                          you count them yourself and
                                          tell PETSc using <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>

                                        <div><br>
                                        </div>
                                        <div>This is the disadvantage of
                                          using packages that do not
                                          properly monitor things :)</div>
                                        <div><br>
                                        </div>
                                        <div>    Matt</div>
                                        <div> </div>
                                      </div>
                                    </blockquote>
                                    So you mean that if I use BoomerAMG
                                    through PETSc, there is no proper way
                                    of evaluating its performance besides
                                    using PetscLogFlops?<br>
                                    <blockquote type="cite">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <span><font color="#888888"><br>
                                              -- <br>
                                              Yours sincerely,<br>
                                              <br>
                                              TAY wee-beng<br>
                                              <br>
                                            </font></span></blockquote>
                                      </div>
                                      <br>
                                      <br clear="all">
                                      <div><br>
                                      </div>
                                      -- <br>
                                      What most experimenters take for
                                      granted before they begin their
                                      experiments is infinitely more
                                      interesting than any results to
                                      which their experiments lead.<br>
                                      -- Norbert Wiener<br>
                                    </blockquote>
                                    <br>
                                  </div>
                                </blockquote>
                              </div>
                            </blockquote>
                            <br>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </blockquote>
                <br>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br><br clear="all"><div><br></div>