<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 4/10/2012 3:40 AM, Matthew Knepley
wrote:<br>
</div>
<blockquote
cite="mid:CAMYG4Gm0CmONQ_3dnP_m4wiaV=K=uO5s3BWp6sRuLrCu_7ScfA@mail.gmail.com"
type="cite">On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi Jed,<br>
<br>
I believe they are real cores. Anyway, I have attached the
log summaries for the 12/24/48-core runs. I re-ran a smaller
case because the large problem can't run on 12 cores.<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Okay, look at VecScatterBegin/End for 24 and 48 cores (I am
guessing you have 4 16-core chips, but please figure this
out).</div>
<div>The messages are logged in ScatterBegin, and the time is
logged in ScatterEnd. From 24 to 48 cores the time is cut in
half.</div>
<div>If you were only communicating the boundary, this is
completely backwards, so you are communicating a fair fraction
of ALL</div>
<div>the values in a subdomain. Figure out why your partition is
so screwed up and this will go away.</div>
</div>
</blockquote>
<br>
What do you mean by "If you were only communicating the boundary,
this is completely backwards, so you are communicating a fair
fraction of ALL the values in a subdomain"?<br>
<br>
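As a rough sketch (assuming an nx*ny*nz grid split only in the z
direction across P ranks; the dimensions are not from this thread), the
two cases compare as:<br>
<pre>
values in a subdomain        ~ nx*ny*nz / P
boundary values exchanged    ~ 2*nx*ny per rank  (one x-y plane per
                               neighbour, independent of P)
</pre>
If only the boundary planes were communicated, the scatter time should
stay roughly constant from 24 to 48 cores; a time that is cut in half
instead tracks the subdomain volume, i.e. a sizable fraction of all the
local values is being moved.<br>
<br>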
I partition my domain in the z direction, as shown in the attached
picture. The circled region is where the airfoils are. I'm using an
immersed boundary method (IBM) code, so the grid is purely Cartesian.<br>
<br>
I create my z matrix using:<br>
<br>
<i>call
MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)</i><br>
<br>
where ijk_sta / ijk_end are the starting/ending global row indices
on this process.<br>
<br>
The 7 is because a 7-point star stencil is used in 3D.<br>
<br>
I create my RHS vector using:<br>
<br>
<i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_end-ijk_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
<br>
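As a first check (a sketch only, reusing the names above), the layout
PETSc actually assigned can be compared with the assumed ijk_sta /
ijk_end on every rank:<br>
<pre>
! sketch: verify the assumed slab matches the ownership ranges PETSc assigned
PetscInt       row_sta, row_end, vec_sta, vec_end
PetscMPIInt    rank
PetscErrorCode ierr

call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
call MatGetOwnershipRange(A_semi_z, row_sta, row_end, ierr)
call VecGetOwnershipRange(b_rhs_semi_z, vec_sta, vec_end, ierr)
if (row_sta /= ijk_sta .or. row_end /= ijk_end .or. &
    vec_sta /= ijk_sta .or. vec_end /= ijk_end) then
   write(*,*) 'rank', rank, ': assumed', ijk_sta, ijk_end, &
              ', mat owns', row_sta, row_end, ', vec owns', vec_sta, vec_end
end if
</pre>
Any rank that prints here is setting values on rows owned by another
rank.<br>
<br>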
<div>The values for the matrix and vector are calculated before the
PETSc logging stage begins, so they don't come into play.<br>
<br>
The x and y matrices and vectors are created in the same fashion. I
still can't figure out why solving the z momentum eqn takes so much
time. Which portion should I focus on?<br>
<br>
Tks!<br>
<br>
</div>
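<div>Since the VecScatter inside MatMult moves exactly the matrix
columns owned by other ranks, a hedged way to quantify that
communication for the z matrix (only A_semi_z is taken from the
snippets above) is to count those columns after assembly:<br>
<pre>
! sketch: count nonzero columns of locally owned rows that other ranks own;
! for a z-slab partition of a 7-point stencil this should be about 2*nx*ny
PetscInt       row, row_sta, row_end, ncols, j, n_off
PetscInt       cols(7)
PetscScalar    vals(7)
PetscErrorCode ierr

call MatGetOwnershipRange(A_semi_z, row_sta, row_end, ierr)
n_off = 0
do row = row_sta, row_end - 1
   call MatGetRow(A_semi_z, row, ncols, cols, vals, ierr)
   do j = 1, ncols
      if (cols(j) &lt; row_sta .or. cols(j) &gt;= row_end) n_off = n_off + 1
   end do
   call MatRestoreRow(A_semi_z, row, ncols, cols, vals, ierr)
end do
write(*,*) 'columns owned by other ranks:', n_off
</pre>
A count far above roughly 2*nx*ny per rank would point to the
mismatched partition described above.<br>
<br>
</div>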
<blockquote
cite="mid:CAMYG4Gm0CmONQ_3dnP_m4wiaV=K=uO5s3BWp6sRuLrCu_7ScfA@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<pre cols="72">Yours sincerely,
TAY wee-beng</pre>
<div>
<div class="h5"> On 3/10/2012 5:59 PM, Jed Brown wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite">There is an inordinate amount of
time being spent in VecScatterEnd(). That sometimes
indicates a very bad partition. Also, are your "48
cores" real physical cores or just "logical cores"
(look like cores to the operating system, usually
advertised as "threads" by the vendor, nothing like
cores in reality)? That can cause a huge load
imbalance and very confusing results as
over-subscribed threads compete for shared resources.
Step it back to 24 threads and 12 threads, send
log_summary for each.<br>
<br>
<div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08
AM, TAY wee-beng <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
</div>
<blockquote type="cite">On Tue, Oct 2, 2012 at
8:35 AM, TAY wee-beng <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi,<br>
<br>
I have combined the momentum linear
eqns involving x, y, z into 1 large
matrix. The Poisson eqn is solved
using HYPRE's Struct format so it's
not included. I run the code for 50
timesteps (hence 50 KSPSolve calls)
using 96 procs. The log_summary is
given below. I have some questions:<br>
<br>
1. After combining the eqns, I
should have only 1 PETSc matrix. Why
does it say there are 4 matrices, 12
vectors, etc.?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>They are part of preconditioning. Are
you sure you're using Hypre for this? It
looks like you are using bjacobi/ilu.</div>
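<div><br>
</div>
<div>If Hypre is what was intended there, a minimal
Fortran sketch of selecting it explicitly (the KSP
name 'ksp' is a placeholder); -ksp_view will then
confirm which PC is actually used:</div>
<pre>
! sketch: request hypre for this solver's preconditioner
PC             pc
PetscErrorCode ierr

call KSPGetPC(ksp, pc, ierr)       ! 'ksp' is a placeholder
call PCSetType(pc, 'hypre', ierr)
! choose the variant at run time, e.g. -pc_hypre_type boomeramg
</pre>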
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
2. I'm looking at the events which
take the longest time. It seems that
MatAssemblyBegin, VecNorm,
VecAssemblyBegin and VecScatterEnd
have very high ratios. The ratios of
some others are also not too good
(~1.6 - 2). So are these events the
reason why my code is not scaling
well? What can I do to improve it?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>3/4 of the solve time is evenly
balanced between MatMult, MatSolve,
MatLUFactorNumeric, and VecNorm+VecDot.</div>
<div><br>
</div>
<div>The high VecAssembly time might be
due to generating a lot of entries
off-process?</div>
<div><br>
</div>
<div>In any case, this looks like an
_extremely_ slow network, perhaps it's
misconfigured?</div>
</div>
</blockquote>
<br>
</div>
My cluster is configured with 48 procs per node.
I re-ran the case using only 48 procs, so
there's no need to pass over a 'slow'
interconnect. I'm now also using GAMG and BCGS
for the Poisson and momentum eqns respectively.
I have also separated the x, y, z components of
the momentum eqn into 3 separate linear eqns to
debug the problem.<br>
<br>
Results show that the stage "momentum_z" is
taking a lot of time. I wonder if it has to do
with the fact that I am partitioning my grid in
the z direction. VecScatterEnd and MatMult are
taking a lot of time, and the ratios of
VecNormalize, VecScatterEnd, VecNorm and
VecAssemblyBegin are also not good.<br>
<br>
I also wonder why so many entries are generated
off-process.<br>
<br>
I create my RHS vector using:<br>
<br>
<i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
<br>
where ijk_xyz_sta and ijk_xyz_end are obtained
from<br>
<br>
<i>call
MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
<br>
I then insert the values into the vector using:<br>
<br>
<i>call VecSetValues(b_rhs_semi_z , ijk_xyz_end
- ijk_xyz_sta , (/ijk_xyz_sta : ijk_xyz_end -
1/) , q_semi_vect_z(ijk_xyz_sta + 1 :
ijk_xyz_end) , INSERT_VALUES , ierr)</i><br>
<br>
What should I do to correct the problem?<br>
<br>
Thanks
<div>
<div><br>
<br>
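<div>(A hedged check for the off-process question above,
reusing b_rhs_semi_z and ijk_xyz_sta/ijk_xyz_end from the
snippets just shown: count the indices handed to
VecSetValues that this rank does not own, since those are
what VecAssemblyBegin/End must ship between ranks.)</div>
<pre>
! sketch: count entries passed to VecSetValues that fall outside
! this rank's ownership range
PetscInt       low, high, ijk, n_off
PetscErrorCode ierr

call VecGetOwnershipRange(b_rhs_semi_z, low, high, ierr)
n_off = 0
do ijk = ijk_xyz_sta, ijk_xyz_end - 1
   if (ijk &lt; low .or. ijk &gt;= high) n_off = n_off + 1
end do
write(*,*) 'off-process vector entries:', n_off
</pre>
<div>If this prints 0 on every rank, off-process insertion
is not the cause of the VecAssembly time.</div>
<div><br>
</div>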
<blockquote type="cite">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
Btw, I insert matrix using:<br>
<br>
<i>do
ijk=ijk_xyz_sta+1,ijk_xyz_end</i><i><br>
</i><i><br>
</i><i> II = ijk - 1</i><i>
!Fortran shift to 0-based</i><i><br>
</i><i> </i><i><br>
</i><i> call
MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)</i><i><br>
</i><i><br>
</i><i>end do</i><br>
<br>
where ijk_xyz_sta/ijk_xyz_end are
the starting/ending indices,<br>
<br>
int_semi_xyz(ijk,1:7) stores the 7
global column indices, and<br>
<br>
semi_mat_xyz holds the corresponding
values.<br>
<br>
and I insert vectors using:<br>
<br>
call
VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)<br>
<br>
Thanks!<br>
<br>
<i><br>
</i><br>
<pre cols="72">Yours sincerely,
TAY wee-beng</pre>
<div>
<div> On 30/9/2012 11:30 PM, Jed
Brown wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">
<p>You can measure the time
spent in Hypre via PCApply
and PCSetUp, but you can't
get finer grained integrated
profiling because it was not
set up that way.</p>
<div class="gmail_quote">On
Sep 30, 2012 3:26 PM, "TAY
wee-beng" <<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>>
wrote:<br type="attribution">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000">
<div>On 27/9/2012 1:44
PM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">On
Thu, Sep 27, 2012 at
3:49 AM, TAY wee-beng
<span dir="ltr"><<a
moz-do-not-send="true" href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div
class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0 0
0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
Hi,<br>
<br>
I'm doing a log
summary for my 3D
CFD code. I have
some questions:<br>
<br>
1. If I'm solving
3 linear equations
using KSP, is the
result given in
the log summary
the total of the 3
linear eqns'
performance? How
can I get the
performance of
each individual
eqn?<br>
</blockquote>
<div><br>
</div>
<div>Use logging
stages: <a
moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html"
target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>
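<div><br>
</div>
<div>A minimal Fortran sketch of that suggestion (the
KSP/Vec names ksp_x, b_x, x_x, etc. are placeholders,
not from this thread):</div>
<pre>
! sketch: one logging stage per momentum solve so -log_summary
! reports each equation separately
PetscLogStage  stage_x, stage_y, stage_z
PetscErrorCode ierr

call PetscLogStageRegister('momentum_x', stage_x, ierr)
call PetscLogStageRegister('momentum_y', stage_y, ierr)
call PetscLogStageRegister('momentum_z', stage_z, ierr)

call PetscLogStagePush(stage_x, ierr)
call KSPSolve(ksp_x, b_x, x_x, ierr)   ! placeholder objects
call PetscLogStagePop(ierr)

call PetscLogStagePush(stage_y, ierr)
call KSPSolve(ksp_y, b_y, x_y, ierr)
call PetscLogStagePop(ierr)

call PetscLogStagePush(stage_z, ierr)
call KSPSolve(ksp_z, b_z, x_z, ierr)
call PetscLogStagePop(ierr)
</pre>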
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0
0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
2. If I run my
code for 10 time
steps, does the
log summary give
the total or the
average
performance/ratio?<br>
</blockquote>
<div><br>
</div>
<div>Total.</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0
0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
3. Besides PETSc,
I'm also using
HYPRE's native
geometric MG
(Struct) to solve
the Poisson eqn on
my Cartesian CFD
grid. Is there any
way I can use
PETSc's log
summary to get
HYPRE's
performance? If I
use BoomerAMG
through PETSc, can
I get its performance?</blockquote>
<div><br>
</div>
<div>If you mean
flops, only if you
count them
yourself and tell
PETSc using <a
moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html"
target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>
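<div><br>
</div>
<div>A minimal Fortran sketch of that, following the
user-event pattern from the PETSc manual; the flop
value is a placeholder you would have to estimate
yourself:</div>
<pre>
! sketch: wrap the external solve in a user event and
! report a self-counted flop estimate
PetscLogEvent  HYPRE_SOLVE
PetscClassId   classid
PetscLogDouble flops
PetscErrorCode ierr

call PetscClassIdRegister('Hypre Struct', classid, ierr)
call PetscLogEventRegister('HypreStructSolve', classid, HYPRE_SOLVE, ierr)

call PetscLogEventBegin(HYPRE_SOLVE, ierr)
! ... call the HYPRE Struct solver here ...
flops = 1.0d6                  ! placeholder estimate
call PetscLogFlops(flops, ierr)
call PetscLogEventEnd(HYPRE_SOLVE, ierr)
</pre>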
<div><br>
</div>
<div>This is the
disadvantage of
using packages
that do not
properly monitor
things :)</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
</div>
</blockquote>
So you mean if I use
BoomerAMG through PETSc,
there is no proper way
of evaluating its
performance besides
using PetscLogFlops?<br>
<blockquote type="cite">
<div
class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0 0
0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<span><font
color="#888888"><br>
-- <br>
Yours
sincerely,<br>
<br>
TAY wee-beng<br>
<br>
</font></span></blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most
experimenters take for
granted before they
begin their
experiments is
infinitely more
interesting than any
results to which their
experiments lead.<br>
-- Norbert Wiener<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to
which their experiments lead.<br>
-- Norbert Wiener<br>
</blockquote>
<br>
</body>
</html>