Can you send a picture of what your domain looks like and what shape the part owned by a given processor looks like? Best would be to write out the mesh with a variable marking the rank owning each vertex, then do a color plot in ParaView (or whatever you use) to show the partition.
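For a structured Cartesian grid, a rough sketch of one way to do this is below. All names are placeholders (nx, ny, nz are the global grid dimensions; ijk_sta/ijk_end are the 0-based owned index range), and it gathers the whole field onto rank 0, so it is only meant for modest grid sizes:

! Sketch: write a "rank" field over the structured grid so ParaView can
! color the partition. Assumes a natural (i,j,k) ordering of the nx*ny*nz
! points and that this process owns rows ijk_sta .. ijk_end-1 (0-based).
subroutine write_partition_vtk(nx, ny, nz, ijk_sta, ijk_end)
  implicit none
  include 'mpif.h'
  integer, intent(in) :: nx, ny, nz, ijk_sta, ijk_end
  integer :: rank, ierr, ijk
  integer, allocatable :: local_rank(:), owner(:)

  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Each process marks the points it owns with its own rank.
  allocate(local_rank(nx*ny*nz), owner(nx*ny*nz))
  local_rank = 0
  local_rank(ijk_sta+1:ijk_end) = rank

  ! Every point is owned by exactly one process, so a MAX reduction onto
  ! rank 0 recovers the owner of each point (rank 0's own points stay 0).
  call MPI_Reduce(local_rank, owner, nx*ny*nz, MPI_INTEGER, MPI_MAX, 0, &
                  MPI_COMM_WORLD, ierr)

  if (rank == 0) then
    open(10, file='partition.vtk', status='replace')
    write(10,'(a)') '# vtk DataFile Version 3.0'
    write(10,'(a)') 'partition'
    write(10,'(a)') 'ASCII'
    write(10,'(a)') 'DATASET STRUCTURED_POINTS'
    write(10,'(a,3i8)') 'DIMENSIONS', nx, ny, nz
    write(10,'(a)') 'ORIGIN 0 0 0'
    write(10,'(a)') 'SPACING 1 1 1'
    write(10,'(a,i12)') 'POINT_DATA', nx*ny*nz
    write(10,'(a)') 'SCALARS rank int 1'
    write(10,'(a)') 'LOOKUP_TABLE default'
    do ijk = 1, nx*ny*nz
      write(10,'(i6)') owner(ijk)
    end do
    close(10)
  end if
  deallocate(local_rank, owner)
end subroutine write_partition_vtk

Loading partition.vtk into ParaView and coloring by "rank" makes the shape of each subdomain obvious.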
VecScatterBegin/End is taking much more time than these assembly events, and really a pretty unreasonable amount of time in general.

On Thu, Oct 4, 2012 at 2:16 PM, Wee-Beng Tay <zonexo@gmail.com> wrote:
On 4/10/2012 5:11 PM, Matthew Knepley wrote:

On Thu, Oct 4, 2012 at 11:01 AM, TAY wee-beng <zonexo@gmail.com> wrote:

On 4/10/2012 3:40 AM, Matthew Knepley wrote:

On Wed, Oct 3, 2012 at 4:05 PM, TAY wee-beng <zonexo@gmail.com> wrote:
Hi Jed,

I believe they are real cores. Anyway, I have attached the log summary for the 12/24/48 cores. I re-ran a smaller case because the large problem can't run with 12 cores.
Okay, look at VecScatterBegin/End for 24 and 48 cores (I am guessing you have four 16-core chips, but please figure this out). The messages are logged in ScatterBegin, and the time is logged in ScatterEnd. From 24 to 48 cores the time is cut in half. If you were only communicating the boundary, this is completely backwards, so you are communicating a fair fraction of ALL the values in a subdomain. Figure out why your partition is so screwed up and this will go away.
What do you mean by "If you were only communicating the boundary, this is completely backwards, so you are communicating a fair fraction of ALL the values in a subdomain"?
If you have 48 partitions instead of 24, you have a larger interface, so AssemblyEnd() should take slightly longer. However, your AssemblyEnd() takes HALF the time, which means it is communicating many fewer values, which means you are not sending interface values, you are sending interior values, since the interior shrinks when you have more partitions.

What this probably means is that your assembly routines are screwed up and are sending data all over the place.
Ok, I got it now. Looking at the AssemblyEnd times:

12 procs

MatAssemblyEnd       145 1.0 1.6342e+01 1.8 0.00e+00 0.0 4.4e+01 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       388 1.0 1.4472e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

24 procs

MatAssemblyEnd       145 1.0 1.1618e+01 2.4 0.00e+00 0.0 9.2e+01 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       388 1.0 2.3527e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

48 procs

MatAssemblyEnd       145 1.0 7.4327e+00 2.4 0.00e+00 0.0 1.9e+02 6.0e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0
VecAssemblyEnd       388 1.0 2.8818e-03 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

The VecAssemblyEnd time increases with the number of procs; does that mean there is nothing wrong with it? On the other hand, the MatAssemblyEnd time decreases with more procs. So that's where the problem lies, is that so?

I'm still scanning my code but haven't found the error yet. It seems strange because I insert the matrix and vector exactly the same way for x, y, z. The u, v, w arrays are also allocated with the same indices. Shouldn't the error be the same for x, y and z too?

Trying to get some hints as to where else I need to look in my code...

Tks
I partition my domain in the z direction, as shown in the attached pic. The circled region is where the airfoils are. I'm using an immersed boundary method (IBM) code, so the grid is all Cartesian.

I created my Z matrix using:

call MatCreateAIJ(MPI_COMM_WORLD,ijk_end-ijk_sta,ijk_end-ijk_sta,PETSC_DECIDE,PETSC_DECIDE,7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,A_semi_z,ierr)

where ijk_sta / ijk_end are the starting/ending global indices of the rows. The 7 is because a star stencil is used in 3D.

I create my RHS vector using:

call VecCreateMPI(MPI_COMM_WORLD,ijk_end-ijk_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)
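One rough sanity check (a sketch, assuming the usual PETSc Fortran includes; not something already in my code) would be to compare the ownership ranges of the matrix and the RHS vector, since a mismatch there would silently turn seemingly local insertions into off-process entries:

! Sketch: compare the ownership ranges of A_semi_z and b_rhs_semi_z.
PetscInt :: rstart_A, rend_A, rstart_b, rend_b
PetscErrorCode :: ierr

call MatGetOwnershipRange(A_semi_z, rstart_A, rend_A, ierr)
call VecGetOwnershipRange(b_rhs_semi_z, rstart_b, rend_b, ierr)
if (rstart_A /= rstart_b .or. rend_A /= rend_b) then
  print *, 'layout mismatch: mat ', rstart_A, rend_A, ' vec ', rstart_b, rend_b
end if

If the two ranges differ on any rank, that alone would explain values being shipped between processes during assembly.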
The values for the matrix and vector were calculated before PETSc logging starts, so they don't come into play.

The x and y matrices and vectors are set up in a similar fashion. I still can't figure out why solving the z momentum eqn takes so much time. Which portion should I focus on?

Tks!
Yours sincerely,
TAY wee-beng

On 3/10/2012 5:59 PM, Jed Brown wrote:
There is an inordinate amount of time being spent in VecScatterEnd(). That sometimes indicates a very bad partition. Also, are your "48 cores" real physical cores or just "logical cores" (they look like cores to the operating system, are usually advertised as "threads" by the vendor, and are nothing like cores in reality)? That can cause a huge load imbalance and very confusing results as over-subscribed threads compete for shared resources. Step it back to 24 threads and 12 threads, and send log_summary for each.

On Wed, Oct 3, 2012 at 8:08 AM, TAY wee-beng <zonexo@gmail.com> wrote:
On 2/10/2012 2:43 PM, Jed Brown wrote:

On Tue, Oct 2, 2012 at 8:35 AM, TAY wee-beng <zonexo@gmail.com> wrote:
Hi,

I have combined the momentum linear eqns involving x, y, z into one large matrix. The Poisson eqn is solved using HYPRE's Struct format, so it's not included. I run the code for 50 timesteps (hence 50 KSPSolves) using 96 procs. The log_summary is given below. I have some questions:

1. After combining the matrices, I should have only 1 PETSc matrix. Why does it say there are 4 matrices, 12 vectors, etc.?
They are part of preconditioning. Are you sure you're using Hypre for this? It looks like you are using bjacobi/ilu.
2. I'm looking at the stages which take the longest time. It seems that MatAssemblyBegin, VecNorm, VecAssemblyBegin, and VecScatterEnd have very high ratios. The ratios of some others are also not too good (~1.6-2). So are these stages the reason why my code is not scaling well? What can I do to improve it?
3/4 of the solve time is evenly balanced between MatMult, MatSolve, MatLUFactorNumeric, and VecNorm+VecDot.

The high VecAssembly time might be due to generating a lot of entries off-process?

In any case, this looks like an _extremely_ slow network; perhaps it's misconfigured?
My cluster is configured with 48 procs per node. I re-ran the case using only 48 procs, so there's no need to pass over a 'slow' interconnect. I'm now also using GAMG and BCGS for the Poisson and momentum eqns respectively. I have also separated the x, y, z components of the momentum eqn into 3 separate linear eqns to debug the problem.

Results show that the stage "momentum_z" is taking a lot of time. I wonder if it has to do with the fact that I am partitioning my grids in the z direction. VecScatterEnd and MatMult are taking a lot of time. The ratios of VecNormalize, VecScatterEnd, VecNorm and VecAssemblyBegin are also not good.

I wonder why a lot of entries are generated off-process.

I create my RHS vector using:

call VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)

where ijk_xyz_sta and ijk_xyz_end are obtained from

call MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)

I then insert the values into the vector using:

call VecSetValues(b_rhs_semi_z, ijk_xyz_end - ijk_xyz_sta, (/ijk_xyz_sta : ijk_xyz_end - 1/), q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end), INSERT_VALUES, ierr)

What should I do to correct the problem?

Thanks
Btw, I insert the matrix using:

do ijk = ijk_xyz_sta+1, ijk_xyz_end

    II = ijk - 1    ! Fortran shift to 0-based

    call MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)

end do

where ijk_xyz_sta/ijk_xyz_end are the starting/ending indices, int_semi_xyz(ijk,1:7) stores the 7 global column indices, and semi_mat_xyz holds the corresponding values.

and I insert vectors using:

call VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)
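One rough diagnostic sketch (variable names as above, assuming the usual PETSc Fortran includes) is to count how many of the rows I set actually fall outside the range this process owns; anything nonzero has to be shipped to another process during MatAssemblyBegin/End:

! Sketch: count rows inserted from this process that it does not own.
PetscInt :: rstart, rend, n_offproc
PetscErrorCode :: ierr
integer :: ijk

call MatGetOwnershipRange(A_semi_xyz, rstart, rend, ierr)
n_offproc = 0
do ijk = ijk_xyz_sta + 1, ijk_xyz_end
  if (ijk - 1 < rstart .or. ijk - 1 >= rend) n_offproc = n_offproc + 1
end do
print *, 'rows set off-process on this rank:', n_offproc

The same kind of check on the indices passed to VecSetValues would show whether the vector insertion is generating off-process entries.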
Thanks!
Yours sincerely,
TAY wee-beng

On 30/9/2012 11:30 PM, Jed Brown wrote:

You can measure the time spent in Hypre via PCApply and PCSetUp, but you can't get finer-grained integrated profiling because it was not set up that way.

On Sep 30, 2012 3:26 PM, "TAY wee-beng" <zonexo@gmail.com> wrote:
On 27/9/2012 1:44 PM, Matthew Knepley wrote:

On Thu, Sep 27, 2012 at 3:49 AM, TAY wee-beng <zonexo@gmail.com> wrote:
Hi,

I'm doing a log summary for my 3D CFD code. I have some questions:

1. If I'm solving 3 linear equations using KSP, is the result given in the log summary the total of the 3 linear eqns' performance? How can I get the performance for each individual eqn?
Use logging stages: http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html
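For example, a minimal sketch (the stage names, KSP objects and vectors here are placeholders for whatever the code already uses):

! Sketch: give each momentum solve its own logging stage so -log_summary
! reports them separately.
PetscLogStage :: stage_x, stage_y, stage_z
PetscErrorCode :: ierr

call PetscLogStageRegister('momentum_x', stage_x, ierr)
call PetscLogStageRegister('momentum_y', stage_y, ierr)
call PetscLogStageRegister('momentum_z', stage_z, ierr)

call PetscLogStagePush(stage_x, ierr)
call KSPSolve(ksp_x, b_x, x_x, ierr)
call PetscLogStagePop(ierr)

call PetscLogStagePush(stage_y, ierr)
call KSPSolve(ksp_y, b_y, x_y, ierr)
call PetscLogStagePop(ierr)

call PetscLogStagePush(stage_z, ierr)
call KSPSolve(ksp_z, b_z, x_z, ierr)
call PetscLogStagePop(ierr)

Everything executed between a push and the matching pop is then summarized under that stage in the log output.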
2. If I run my code for 10 time steps, does the log summary give the total or the average performance/ratio?

Total.
3. Besides PETSc, I'm also using HYPRE's native geometric MG (Struct) to solve my Cartesian-grid CFD Poisson eqn. Is there any way I can use PETSc's log summary to get HYPRE's performance? If I use BoomerAMG through PETSc, can I get its performance?
If you mean flops, only if you count them yourself and tell PETSc using http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html
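Roughly along these lines (a sketch only; the event name and the flop estimate are placeholders you would have to supply yourself):

! Sketch: wrap the external solve in a user-defined event and credit it with
! a hand-counted flop total so it appears in -log_summary.
PetscLogEvent :: hypre_event
PetscLogDouble :: my_flops
PetscErrorCode :: ierr

call PetscLogEventRegister('HypreStructSolve', 0, hypre_event, ierr)

call PetscLogEventBegin(hypre_event, ierr)
! ... call the HYPRE Struct solver here ...
my_flops = 1.0d6                      ! placeholder: your own estimate
call PetscLogFlops(my_flops, ierr)
call PetscLogEventEnd(hypre_event, ierr)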
This is the disadvantage of using packages that do not properly monitor things :)

   Matt
So you mean that if I use BoomerAMG through PETSc, there is no proper way of evaluating its performance besides using PetscLogFlops?
--
Yours sincerely,

TAY wee-beng
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener