There is an inordinate amount of time being spent in VecScatterEnd(). That sometimes indicates a very bad partition. Also, are your "48 cores" real physical cores or just "logical cores" (they look like cores to the operating system and are usually advertised as "threads" by the vendor, but are nothing like real cores)? Over-subscribed threads competing for shared resources can cause a huge load imbalance and very confusing results. Step it back to 24 processes and then 12, and send the log_summary for each.<br>
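A quick way to check on Linux whether those "48 cores" are physical cores or hyperthreads (a sketch; `lscpu` availability depends on the distro):

```shell
# Logical CPUs the OS sees (this count includes hyperthreads):
nproc
# Physical topology: sockets x cores-per-socket x threads-per-core.
# If "Thread(s) per core" > 1, some of the 48 are logical, not physical.
command -v lscpu >/dev/null && lscpu | grep -E '^(Socket|Core|Thread)' || true
```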
<br><div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
<div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
</div>
<blockquote type="cite">On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi,<br>
<br>
I have combined the momentum linear eqns involving x, y, z
into 1 large matrix. The Poisson eqn is solved using HYPRE's
Struct format, so it's not included. I run the code for 50
timesteps (hence 50 KSPSolve calls) using 96 procs. The
log_summary is given below. I have some questions:<br>
<br>
1. After combining the matrix, I should have only 1 PETSc
matrix. Why does it say there are 4 matrices, 12 vectors,
etc.? <br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>They are part of preconditioning. Are you sure you're using
Hypre for this? It looks like you are using bjacobi/ilu.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
2. I'm looking at the events which take the longest time.
It seems that MatAssemblyBegin, VecNorm, VecAssemblyBegin, and
VecScatterEnd have very high ratios. The ratios of some
others are also not too good (~1.6 - 2). So are these
events the reason why my code is not scaling well? What
can I do to improve it?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>3/4 of the solve time is evenly balanced between MatMult,
MatSolve, MatLUFactorNumeric, and VecNorm+VecDot.</div>
<div><br>
</div>
<div>The high VecAssembly time might be due to generating a lot
of entries off-process?</div>
<div><br>
</div>
<div>In any case, this looks like an _extremely_ slow network,
perhaps it's misconfigured?</div>
</div>
</blockquote>
<br></div>
My cluster is configured with 48 procs per node. I re-ran the case
using only 48 procs, so there's no need to pass over a 'slow'
interconnect. I'm now also using GAMG and BCGS for the Poisson and
momentum eqns respectively. I have also separated the x, y, z components
of the momentum eqn into 3 separate linear eqns to debug the problem.
<br>
<br>
Results show that stage "momentum_z" is taking a lot of time. I
wonder if it has to do with the fact that I am partitioning my grid
in the z direction. VecScatterEnd and MatMult are taking a lot of time.
The ratios of VecNormalize, VecScatterEnd, VecNorm, and VecAssemblyBegin
are also not good.<br>
<br>
I wonder why a lot of entries are generated off-process.<br>
<br>
I create my RHS vector using:<br>
<br>
<i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
<br>
where ijk_xyz_sta and ijk_xyz_end are obtained from<br>
<br>
<i>call MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
<br>
I then insert the values into the vector using:<br>
<br>
<i>call VecSetValues(b_rhs_semi_z , ijk_xyz_end - ijk_xyz_sta ,
(/ (ijk, ijk = ijk_xyz_sta, ijk_xyz_end - 1) /) ,
q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end) , INSERT_VALUES , ierr)</i><br>
<br>
What should I do to correct the problem?<br>
<br>
Thanks<div><div class="h5"><br>
<br>
<blockquote type="cite">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
Btw, I insert matrix using:<br>
<br>
<i>do ijk = ijk_xyz_sta + 1, ijk_xyz_end<br>
<br>
   II = ijk - 1   ! Fortran shift to 0-based indexing<br>
<br>
   call MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)<br>
<br>
end do</i><br>
<br>
where ijk_xyz_sta/ijk_xyz_end are the starting/ending indices<br>
<br>
int_semi_xyz(ijk,1:7) stores the 7 global column indices<br>
<br>
semi_mat_xyz has the corresponding values.<br>
<br>
and I insert vectors using:<br>
<br>
call
VecSetValues(b_rhs_semi_xyz, ijk_xyz_end_mz-ijk_xyz_sta_mz, (/ (ijk, ijk=ijk_xyz_sta_mz, ijk_xyz_end_mz-1) /), q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz), INSERT_VALUES, ierr)<br>
<br>
Thanks!<br>
<br>
<i><br>
</i><br>
<pre cols="72">Yours sincerely,
TAY wee-beng</pre>
<div>
<div> On 30/9/2012 11:30 PM, Jed Brown wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">
<p>You can measure the time spent in Hypre via PCApply
and PCSetUp, but you can't get finer-grained
integrated profiling because Hypre was not
instrumented that way.</p>
<div class="gmail_quote">On Sep 30, 2012 3:26 PM, "TAY
wee-beng" <<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 27/9/2012 1:44 PM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">On Thu, Sep 27, 2012 at
3:49 AM, TAY wee-beng <span dir="ltr"><<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi,<br>
<br>
I'm doing a log summary for my 3D CFD
code. I have some questions:<br>
<br>
1. If I'm solving 3 linear equations using
KSP, is the result given in the log
summary the total of the 3 linear eqns'
performance? How can I get the performance
for each individual eqn?<br>
</blockquote>
<div><br>
</div>
<div>Use logging stages: <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 2. If I run
my code for 10 time steps, does the log
summary give the total or the average
performance/ratio?<br>
</blockquote>
<div><br>
</div>
<div>Total.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> 3. Besides
PETSc, I'm also using HYPRE's native
geometric MG (Struct) to solve my
Cartesian-grid CFD Poisson eqn. Is there
any way I can use PETSc's log summary to
get HYPRE's performance? If I use
boomerAMG through PETSc, can I get its
performance?</blockquote>
<div><br>
</div>
<div>If you mean flops, only if you count
them yourself and tell PETSc using <a href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html" target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>
<div><br>
</div>
<div>This is the disadvantage of using
packages that do not properly monitor
things :)</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
</div>
</blockquote>
So you mean that if I use boomerAMG through PETSc, there
is no proper way of evaluating its performance,
besides using PetscLogFlops?<br>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <span><font color="#888888"><br>
-- <br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
</font></span></blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most experimenters take for granted
before they begin their experiments is
infinitely more interesting than any results
to which their experiments lead.<br>
-- Norbert Wiener<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br>