<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 3/11/2015 8:52 PM, Matthew Knepley
wrote:<br>
</div>
<blockquote
cite="mid:CAMYG4GkiiAtJaJUm-uotK1BsaTsO1V=_eQfGPCqHLybbEUcLTw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Nov 3, 2015 at 6:49 AM, TAY
wee-beng <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
I tried and have attached the log.<br>
<br>
Ya, my Poisson eqn has Neumann boundary conditions. Do I
need to specify a null space, e.g. with
KSPSetNullSpace or MatNullSpaceCreate?</blockquote>
<div><br>
</div>
<div>Yes, you need to attach the constant null space to the
matrix.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
</div>
</div>
</div>
</blockquote>
OK, so can you point me to a suitable example so that I know
specifically which one to use?<br>
<br>
Thanks.<br>
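<br>
In the meantime, is it roughly something like the sketch below (in C)? This is only my guess from
reading the MatNullSpaceCreate / MatSetNullSpace man pages, and "A_poisson" is just a placeholder
for my assembled Poisson matrix, so please correct it if it is wrong:<br>
<pre>
PetscErrorCode ierr;
MatNullSpace   nullsp;

/* constant null space: has_cnst = PETSC_TRUE, no extra basis vectors */
ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &amp;nullsp);CHKERRQ(ierr);
ierr = MatSetNullSpace(A_poisson, nullsp);CHKERRQ(ierr);  /* attach it to the Poisson matrix */
ierr = MatNullSpaceDestroy(&amp;nullsp);CHKERRQ(ierr);
</pre>
I assume this should go after the matrix is assembled and before the KSPSolve for the Poisson
equation?<br>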
<blockquote
cite="mid:CAMYG4GkiiAtJaJUm-uotK1BsaTsO1V=_eQfGPCqHLybbEUcLTw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="im HOEnZb"><br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
</span>
<div class="HOEnZb">
<div class="h5">
On 3/11/2015 12:45 PM, Barry Smith wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
On Nov 2, 2015, at 10:37 PM, TAY wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
Hi,<br>
<br>
I tried :<br>
<br>
1. -poisson_pc_gamg_agg_nsmooths 1
-poisson_pc_type gamg<br>
<br>
2. -poisson_pc_type gamg<br>
</blockquote>
Run with -poisson_ksp_monitor_true_residual
-poisson_ksp_converged_reason<br>
Does your Poisson equation have Neumann boundary conditions?
Do you have any zeros on the diagonal of the matrix
(you shouldn't)?<br>
<br>
There may be something wrong with your Poisson
discretization that was also messing up hypre.<br>
<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
Both options give:<br>
<br>
1 0.00150000 0.00000000
0.00000000 1.00000000 NaN
NaN NaN<br>
M Diverged but why?, time = 2<br>
reason = -9<br>
<br>
How can I check what's wrong?<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 3/11/2015 3:18 AM, Barry Smith wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
hypre is just not scaling well here. I do
not know why. Since hypre is a black box for us,
there is no way to determine why the
scaling is poor.<br>
<br>
If you make the same two runs with -pc_type
gamg, there will be a lot more information in the
log summary about which routines are scaling
well or poorly.<br>
<br>
Barry<br>
<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
On Nov 2, 2015, at 3:17 AM, TAY wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
Hi,<br>
<br>
I have attached the 2 files.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 2:55 PM, Barry Smith wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Run the (158/2)x(266/2)x(150/2) grid on 8
processes and then the (158)x(266)x(150) grid on 64
processes, and send the two -log_summary
results<br>
<br>
Barry<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
On Nov 2, 2015, at 12:19 AM, TAY
wee-beng<<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>>
wrote:<br>
<br>
Hi,<br>
<br>
I have attached the new results.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 12:27 PM, Barry Smith wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Run without the -momentum_ksp_view
-poisson_ksp_view and send the new
results<br>
<br>
<br>
You can see from the log summary that
the PCSetUp is taking a much smaller
percentage of the time, meaning that it
is reusing the preconditioner and not
rebuilding it each time.<br>
<br>
Barry<br>
<br>
Something makes no sense with the
output: it gives<br>
<br>
KSPSolve 199 1.0 2.3298e+03
1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02
90100 66100 24 90100 66100 24 165<br>
<br>
90% of the time is in the solve, but
there is no significant amount of time
in the other events of the code, which is
just not possible. I hope it is due to
your IO.<br>
<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
On Nov 1, 2015, at 10:02 PM, TAY
wee-beng<<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>>
wrote:<br>
<br>
Hi,<br>
<br>
I have attached the new run with 100
time steps for 48 and 96 cores.<br>
<br>
Only the Poisson eqn's RHS changes;
the LHS doesn't. So if I want to reuse
the preconditioner, what must I do? Or
what must I not do?<br>
<br>
Why does the number of processes
increase so much? Is there something
wrong with my coding? It seems to be
the case for my new run too.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 9:49 AM, Barry Smith
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
If you are doing many time steps
with the same linear solver then you
MUST do your weak scaling studies
with MANY time steps, since the setup
time of AMG only takes place in the
first time step. So run both 48 and
96 processes with the same large
number of time steps.<br>
<br>
Barry<br>
<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
On Nov 1, 2015, at 7:35 PM, TAY
wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
Hi,<br>
<br>
Sorry, I forgot and used the old
a.out. I have attached the new log
for 48 cores (log48), together with
the 96-core log (log96).<br>
<br>
Why does the number of processes
increase so much? Is there
something wrong with my coding?<br>
<br>
Only the Poisson eqn's RHS
changes; the LHS doesn't. So if I
want to reuse the preconditioner,
what must I do? Or what must I not
do?<br>
<br>
Lastly, I only simulated 2 time
steps previously. Now I have run for
10 time steps (log48_10). Is it
building the preconditioner at
every time step?<br>
<br>
Also, what about the momentum eqn? Is
it working well?<br>
<br>
I will try the gamg later too.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 12:30 AM, Barry Smith
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
You used gmres with 48
processes but richardson with
96. You need to be careful and
make sure you don't change the
solvers when you change the
number of processors, since you
can get very different,
inconsistent results.<br>
<br>
Anyway, all the time is
being spent in the BoomerAMG
algebraic multigrid setup, and it
is scaling badly. When you
double the problem size and
number of processes, it went from
3.2445e+01 to 4.3599e+02
seconds.<br>
<br>
PCSetUp 3 1.0
3.2445e+01 1.0 9.58e+06 2.0
0.0e+00 0.0e+00 4.0e+00 62 8
0 0 4 62 8 0 0 5 11<br>
<br>
PCSetUp 3 1.0
4.3599e+02 1.0 9.58e+06 2.0
0.0e+00 0.0e+00 4.0e+00 85 18
0 0 6 85 18 0 0 6 2<br>
<br>
Now, is the Poisson problem
changing at each timestep, or can
you use the same preconditioner
built with BoomerAMG for all the
time steps? Algebraic multigrid
has a large setup time that
often doesn't matter if you have
many time steps, but if you have
to rebuild it each timestep it
may be too large.<br>
<br>
You might also try -pc_type
gamg and see how PETSc's
algebraic multigrid scales for
your problem/machine.<br>
<br>
Barry<br>
<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
On Nov 1, 2015, at 7:30 AM,
TAY wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
<br>
On 1/11/2015 10:00 AM, Barry
Smith wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
On Oct 31, 2015, at 8:43
PM, TAY wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
<br>
On 1/11/2015 12:47 AM,
Matthew Knepley wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
On Sat, Oct 31, 2015 at
11:34 AM, TAY
wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
Hi,<br>
<br>
I understand that, as
mentioned in the FAQ, due
to the limitations
in memory, the scaling
is not linear. So I am
trying to write a
proposal to use a
supercomputer.<br>
Its specs are:<br>
Compute nodes: 82,944
nodes (SPARC64 VIIIfx;
16GB of memory per node)<br>
<br>
8 cores / processor<br>
Interconnect: Tofu
(6-dimensional
mesh/torus)<br>
Each cabinet contains 96
computing nodes.<br>
One of the requirements
is to give the
performance of my
current code with my
current set of data, and
there is a formula to
calculate the estimated
parallel efficiency when
using the new large set
of data.<br>
There are 2 ways to give
performance:<br>
1. Strong scaling, which
is defined as how the
elapsed time varies with
the number of processors
for a fixed problem size.<br>
2. Weak scaling, which
is defined as how the
elapsed time varies with
the number of processors
for a fixed problem size
per processor.<br>
I ran my cases with 48
and 96 cores on my
current cluster, giving
140 and 90 mins
respectively. This is
classified as strong
scaling.<br>
Cluster specs:<br>
CPU: AMD 6234 2.4GHz<br>
8 cores / processor
(CPU)<br>
6 CPUs / node<br>
So 48 cores / node<br>
Not sure about the memory
/ node<br>
<br>
The parallel efficiency
‘En’ for a given degree
of parallelism ‘n’
indicates how efficiently
the program is accelerated
by parallel processing.
‘En’ is given by the
following formulae.
Although their
derivation processes
differ between
strong and weak scaling,
the derived formulae are
the same.<br>
From the estimated
time, my parallel
efficiency using
Amdahl's law on the
current old cluster was
52.7%.<br>
So are my results
acceptable?<br>
For the large data set,
if using 2205 nodes
(2205 x 8 cores), my
expected parallel
efficiency is only 0.5%.
The proposal recommends
a value of > 50%.<br>
The problem with this
analysis is that the
estimated serial
fraction from Amdahl's
Law changes as a
function of problem size,
so you cannot take the
strong scaling from one
problem and apply it to
another without a model
of this dependence.<br>
<br>
Weak scaling does model
changes with problem
size, so I would measure
weak scaling on your
current cluster, and
extrapolate to the big
machine. I realize that
this does not make sense
for many scientific
applications, but
neither does requiring a
certain parallel
efficiency.<br>
</blockquote>
OK, I checked the results for
my weak scaling, and the
expected parallel efficiency
is even worse. From
the formula used, it's
obvious it's doing some
sort of exponential
decrease in the
extrapolation. So
unless I can achieve a
near > 90% speed-up
when I double the cores
and problem size for my
current 48/96 cores
setup, extrapolating
from about 96 nodes to
10,000 nodes will give a
much lower expected
parallel efficiency for
the new case.<br>
<br>
However, it's mentioned in
the FAQ that due to memory
requirements, it's
impossible to get > 90%
speed-up when I double the
cores and problem size (i.e.
a linear increase in
performance), which means
that I can't get > 90%
speed-up when I double the
cores and problem size for
my current 48/96 cores
setup. Is that so?<br>
</blockquote>
What is the output of
-ksp_view -log_summary on
the problem and then on the
problem doubled in size and
number of processors?<br>
<br>
Barry<br>
</blockquote>
Hi,<br>
<br>
I have attached the output<br>
<br>
48 cores: log48<br>
96 cores: log96<br>
<br>
There are 2 solvers: the
momentum linear eqn uses bcgs,
while the Poisson eqn uses
hypre BoomerAMG.<br>
<br>
The problem size doubled from
158x266x150 to 158x266x300.<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
So is it fair to say that
the main problem does not
lie in my programming
skills, but rather in the way
the linear equations are
solved?<br>
<br>
Thanks.<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
Thanks,<br>
<br>
Matt<br>
Is this type of scaling
(>50%) possible in PETSc
when using
17640 (2205 x 8) cores?<br>
By the way, I do not have
access to the system.<br>
<br>
<br>
<br>
<br>
<br>
<br>
-- <br>
What most experimenters
take for granted before
they begin their
experiments is
infinitely more
interesting than any
results to which their
experiments lead.<br>
-- Norbert Wiener<br>
</blockquote>
</blockquote>
</blockquote>
<log48.txt><log96.txt><br>
</blockquote>
</blockquote>
<log48_10.txt><log48.txt><log96.txt><br>
</blockquote>
</blockquote>
<log96_100.txt><log48_100.txt><br>
</blockquote>
</blockquote>
<log96_100_2.txt><log48_100_2.txt><br>
</blockquote>
</blockquote>
<log64_100.txt><log8_100.txt><br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature">What most experimenters take for
granted before they begin their experiments is infinitely
more interesting than any results to which their experiments
lead.<br>
-- Norbert Wiener</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>