<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Jed,<br>
<br>
I believe they are real cores. Anyway, I have attached the log
summary for the 12/24/48 cores. I re-run a smaller case because
the large problem can't run with 12cores.<br>
<pre class="moz-signature" cols="72">Yours sincerely,
TAY wee-beng</pre>
On 3/10/2012 5:59 PM, Jed Brown wrote:<br>
</div>
<blockquote
cite="mid:CAM9tzSm8sUXb3FtXwXT4VX9tputhAyL--iZ3n0VnOkOdsxYJfg@mail.gmail.com"
type="cite">There is an inordinate amount of time being spent in
VecScatterEnd(). That sometimes indicates a very bad partition.
Also, are your "48 cores" real physical cores or just "logical
cores" (look like cores to the operating system, usually
advertised as "threads" by the vendor, nothing like cores in
reality)? That can cause a huge load imbalance and very confusing
results as over-subscribed threads compete for shared resources.
Step it back to 24 threads and 12 threads, send log_summary for
each.<br>
<br>
<div class="gmail_quote">On Wed, Oct 3, 2012 at 8:08 AM, TAY
wee-beng <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<div>On 2/10/2012 2:43 PM, Jed Brown wrote:<br>
</div>
<blockquote type="cite">On Tue, Oct 2, 2012 at 8:35 AM,
TAY wee-beng <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi,<br>
<br>
I have combined the momentum linear eqns
involving x,y,z into 1 large matrix. The Poisson
eqn is solved using HYPRE strcut format so it's
not included. I run the code for 50 timesteps
(hence 50 kspsolve) using 96 procs. The
log_summary is given below. I have some
questions:<br>
<br>
1. After combining the matrix, I should have
only 1 PETSc matrix. Why does it says there are
4 matrix, 12 vector etc? <br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>They are part of preconditioning. Are you sure
you're using Hypre for this? It looks like you are
using bjacobi/ilu.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
2. I'm looking at the stages which take the
longest time. It seems that MatAssemblyBegin,
VecNorm, VecAssemblyBegin, VecScatterEnd have
very high ratios. The ratios of some others are
also not too good (~ 1.6 - 2). So are these
stages the reason why my code is not scaling
well? What can I do to improve it?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>3/4 of the solve time is evenly balanced between
MatMult, MatSolve, MatLUFactorNumeric, and
VecNorm+VecDot.</div>
<div><br>
</div>
<div>The high VecAssembly time might be due to
generating a lot of entries off-process?</div>
<div><br>
</div>
<div>In any case, this looks like an _extremely_ slow
network, perhaps it's misconfigured?</div>
</div>
</blockquote>
<br>
</div>
My cluster is configured with 48 procs per node. I re-run
the case, using only 48 procs, thus there's no need to pass
over a 'slow' interconnect. I'm now also using GAMG and BCGS
for the poisson and momentum eqn respectively. I have also
separated the x,y,z component of the momentum eqn to 3
separate linear eqns to debug the problem. <br>
<br>
Results show that stage "momentum_z" is taking a lot of
time. I wonder if it has to do with the fact that I am
partitioning my grids in the z direction. VecScatterEnd,
MatMult are taking a lot of time. VecNormalize,
VecScatterEnd, VecNorm, VecAssemblyBegin 's ratio are also
not good.<br>
<br>
I wonder why a lot of entries are generated off-process.<br>
<br>
I create my RHS vector using:<br>
<br>
<i>call
VecCreateMPI(MPI_COMM_WORLD,ijk_xyz_end-ijk_xyz_sta,PETSC_DECIDE,b_rhs_semi_z,ierr)</i><br>
<br>
where ijk_xyz_sta and ijk_xyz_end are obtained from<br>
<br>
<i>call
MatGetOwnershipRange(A_semi_z,ijk_xyz_sta,ijk_xyz_end,ierr)</i><br>
<br>
I then insert the values into the vector using:<br>
<br>
<i>call VecSetValues(b_rhs_semi_z , ijk_xyz_end -
ijk_xyz_sta , (/ijk_xyz_sta : ijk_xyz_end - 1/) ,
q_semi_vect_z(ijk_xyz_sta + 1 : ijk_xyz_end) ,
INSERT_VALUES , ierr)</i><br>
<br>
What should I do to correct the problem?<br>
<br>
Thanks
<div>
<div class="h5"><br>
<br>
<blockquote type="cite">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
Btw, I insert matrix using:<br>
<br>
<i>do ijk=ijk_xyz_sta+1,ijk_xyz_end</i><i><br>
</i><i><br>
</i><i> II = ijk - 1</i><i> !Fortran
shift to 0-based</i><i><br>
</i><i> </i><i><br>
</i><i> call
MatSetValues(A_semi_xyz,1,II,7,int_semi_xyz(ijk,1:7),semi_mat_xyz(ijk,1:7),INSERT_VALUES,ierr)</i><i><br>
</i><i><br>
</i><i>end do</i><br>
<br>
where ijk_xyz_sta/ijk_xyz_end are the
starting/end index<br>
<br>
int_semi_xyz(ijk,1:7) stores the 7 column
global indices<br>
<br>
semi_mat_xyz has the corresponding values.<br>
<br>
and I insert vectors using:<br>
<br>
call
VecSetValues(b_rhs_semi_xyz,ijk_xyz_end_mz-ijk_xyz_sta_mz,(/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),INSERT_VALUES,ierr)<br>
<br>
Thanks!<br>
<br>
<i><br>
</i><br>
<pre cols="72">Yours sincerely,
TAY wee-beng</pre>
<div>
<div> On 30/9/2012 11:30 PM, Jed Brown
wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">
<p>You can measure the time spent in Hypre
via PCApply and PCSetUp, but you can't
get finer grained integrated profiling
because it was not set up that way.</p>
<div class="gmail_quote">On Sep 30, 2012
3:26 PM, "TAY wee-beng" <<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 27/9/2012 1:44 PM, Matthew
Knepley wrote:<br>
</div>
<blockquote type="cite">On Thu, Sep
27, 2012 at 3:49 AM, TAY wee-beng
<span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> Hi,<br>
<br>
I'm doing a log summary for my
3d cfd code. I have some
questions:<br>
<br>
1. if I'm solving 3 linear
equations using ksp, is the
result given in the log
summary the total of the 3
linear eqns' performance? How
can I get the performance for
each individual eqn?<br>
</blockquote>
<div><br>
</div>
<div>Use logging stages: <a
moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html"
target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html</a></div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> 2. If
I run my code for 10 time
steps, does the log summary
gives the total or avg
performance/ratio?<br>
</blockquote>
<div><br>
</div>
<div>Total.</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> 3.
Besides PETSc, I'm also using
HYPRE's native geometric MG
(Struct) to solve my
Cartesian's grid CFD poisson
eqn. Is there any way I can
use PETSc's log summary to get
HYPRE's performance? If I use
boomerAMG thru PETSc, can I
get its performance?</blockquote>
<div><br>
</div>
<div>If you mean flops, only if
you count them yourself and
tell PETSc using <a
moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html"
target="_blank">http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html</a></div>
<div><br>
</div>
<div>This is the disadvantage of
using packages that do not
properly monitor things :)</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
</div>
</blockquote>
So u mean if I use boomerAMG thru
PETSc, there is no proper way of
evaluating its performance, beside
using PetscLogFlops?<br>
<blockquote type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> <span><font
color="#888888"><br>
-- <br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
</font></span></blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most experimenters take for
granted before they begin their
experiments is infinitely more
interesting than any results to
which their experiments lead.<br>
-- Norbert Wiener<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>