<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 3/11/2015 9:01 PM, Matthew Knepley
wrote:<br>
</div>
<blockquote
cite="mid:CAMYG4GnKGhFuSokeczdFFaBhMWhhgkES03OAUgFjO2ETJb4LHA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Nov 3, 2015 at 6:58 AM, TAY
wee-beng <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
<div>On 3/11/2015 8:52 PM, Matthew Knepley wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Nov 3, 2015 at
6:49 AM, TAY wee-beng <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi,<br>
<br>
I tried and have attached the log.<br>
<br>
Ya, my Poisson eqn has Neumann boundary
condition. Do I need to specify some null
space stuff? Like KSPSetNullSpace or
MatNullSpaceCreate?</blockquote>
<div><br>
</div>
<div>Yes, you need to attach the constant null
space to the matrix.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
</div>
</div>
</div>
</blockquote>
Ok so can you point me to a suitable example so that I
know which one to use specifically?<br>
</div>
</blockquote>
<div><br>
</div>
<div><a moz-do-not-send="true"
href="https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761">https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761</a><br>
</div>
<div><br>
</div>
<div> Matt</div>
</div>
</div>
</div>
</blockquote>
Hi,<br>
<br>
Actually, I realised that my Poisson eqn has both Neumann and Dirichlet BCs. The Dirichlet BC is applied at the output grids by specifying
pressure = 0. So do I still need the null space?<br>
<br>
My Poisson eqn's LHS is fixed, but the RHS changes at every timestep.<br>
<br>
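Since the LHS does not change, I am thinking of telling PETSc to keep the preconditioner between timesteps with something like the line below (just a sketch; ksp_poisson is only my guess at a name for the Poisson KSP handle, and I assume a PETSc version >= 3.5 where this routine exists):<br>
<br>
call KSPSetReusePreconditioner(ksp_poisson,PETSC_TRUE,ierr) ! keep the existing PC even though the RHS changes<br>
<br>
My understanding is that if the matrix is never re-assembled the preconditioner is not rebuilt anyway, so this may be redundant.<br>
<br>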
If I need to use a null space, how do I know whether the null space contains the constant vector and what the number of vectors should be? I
followed the example given and added:<br>
<br>
call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL_OBJECT,nullsp,ierr)<br>
<br>
call MatSetNullSpace(A,nullsp,ierr)<br>
<br>
call MatNullSpaceDestroy(nullsp,ierr)<br>
<br>
Is that all?<br>
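<br>
Also, since my RHS changes every timestep, I am guessing I should make the RHS consistent with the null space before each solve. A minimal sketch of what I have in mind (assuming the assembled Poisson matrix is A, my RHS vector is called b_rhs here, and the PETSc 3.6 Fortran bindings, where PETSC_NULL_OBJECT stands in for the unused vector-array argument):<br>
<br>
call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL_OBJECT,nullsp,ierr) ! constant null space, no user-supplied vectors<br>
<br>
call MatSetNullSpace(A,nullsp,ierr) ! attach it to the Poisson matrix<br>
<br>
call MatNullSpaceRemove(nullsp,b_rhs,ierr) ! project the constant out of the RHS each timestep<br>
<br>
call MatNullSpaceDestroy(nullsp,ierr)<br>
<br>
Please correct me if the MatNullSpaceRemove call does not belong there.<br>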
<br>
Before this, I was using HYPRE's geometric multigrid solver, and the matrix / vector setup in that subroutine was written against HYPRE's interface. It worked
pretty well and was fast.<br>
<br>
However, it's a black box and it's hard to diagnose problems.<br>
<br>
I have always had a PETSc subroutine to solve my Poisson eqn, but I used KSPBCGS or KSPGMRES with HYPRE's BoomerAMG as the PC. It worked but
was slow.<br>
<br>
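For reference, that combination corresponds roughly to the following runtime options (a sketch, assuming the poisson_ options prefix that appears elsewhere in this thread):<br>
<br>
-poisson_ksp_type bcgs -poisson_pc_type hypre -poisson_pc_hypre_type boomeramg<br>
<br>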
Matt: Thanks, I will see how it goes using the nullspace and may try
"<i style="color:rgb(0,0,0);white-space:pre-wrap">-mg_coarse_pc_type svd</i>"
later.<br>
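(If I understand the option prefixes correctly, with my poisson_ prefix and gamg that would be spelled as below; please correct me if the prefix composition is different.)<br>
<br>
-poisson_pc_type gamg -poisson_mg_coarse_pc_type svd<br>
<br>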
<blockquote
cite="mid:CAMYG4GnKGhFuSokeczdFFaBhMWhhgkES03OAUgFjO2ETJb4LHA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Thanks.<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span><br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
</span>
<div>
<div> On 3/11/2015 12:45 PM, Barry Smith
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Nov 2, 2015, at 10:37 PM, TAY
wee-beng<<a moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank">zonexo@gmail.com</a>>
wrote:<br>
<br>
Hi,<br>
<br>
I tried :<br>
<br>
1. -poisson_pc_gamg_agg_nsmooths 1
-poisson_pc_type gamg<br>
<br>
2. -poisson_pc_type gamg<br>
</blockquote>
Run with
-poisson_ksp_monitor_true_residual
-poisson_ksp_monitor_converged_reason<br>
Does your Poisson have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't)?<br>
<br>
There may be something wrong with your Poisson discretization that was also messing up hypre.<br>
<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Both options give:<br>
<br>
1 0.00150000 0.00000000
0.00000000 1.00000000
NaN NaN NaN<br>
M Diverged but why?, time =
2<br>
reason = -9<br>
<br>
How can I check what's wrong?<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 3/11/2015 3:18 AM, Barry Smith
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way to determine why the scaling is poor.<br>
<br>
If you make the same two runs
with -pc_type gamg there will be a
lot more information in the log
summary about in what routines it is
scaling well or poorly.<br>
<br>
Barry<br>
<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Nov 2, 2015, at 3:17 AM, TAY
wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
Hi,<br>
<br>
I have attached the 2 files.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 2:55 PM, Barry Smith
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Run (158/2)x(266/2)x(150/2)
grid on 8 processes and then
(158)x(266)x(150) on 64
processors and send the two
-log_summary results<br>
<br>
Barry<br>
<br>
<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Nov 2, 2015, at 12:19 AM,
TAY wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
Hi,<br>
<br>
I have attached the new
results.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 12:27 PM, Barry
Smith wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Run without the
-momentum_ksp_view
-poisson_ksp_view and send
the new results<br>
<br>
<br>
You can see from the log
summary that the PCSetUp is
taking a much smaller
percentage of the time
meaning that it is reusing
the preconditioner and not
rebuilding it each time.<br>
<br>
Barry<br>
<br>
Something makes no sense
with the output: it gives<br>
<br>
KSPSolve 199 1.0
2.3298e+03 1.0 5.20e+09 1.8
3.8e+04 9.9e+05 5.0e+02
90100 66100 24 90100 66100
24 165<br>
<br>
90% of the time is in the
solve but there is no
significant amount of time
in other events of the code
which is just not possible.
I hope it is due to your IO.<br>
<br>
<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Nov 1, 2015, at 10:02
PM, TAY wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
Hi,<br>
<br>
I have attached the new
run with 100 time steps
for 48 and 96 cores.<br>
<br>
Only the Poisson eqn's
RHS changes, the LHS
doesn't. So if I want to
reuse the preconditioner,
what must I do? Or what
must I not do?<br>
<br>
Why does the number of
processes increase so
much? Is there something
wrong with my coding?
Seems to be so too for my
new run.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 9:49 AM,
Barry Smith wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
If you are doing many
time steps with the same
linear solver then you
MUST do your weak
scaling studies with
MANY time steps since
the setup time of AMG
only takes place in the
first timestep. So run
both 48 and 96 processes
with the same large
number of time steps.<br>
<br>
Barry<br>
<br>
<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Nov 1, 2015, at
7:35 PM, TAY
wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
Hi,<br>
<br>
Sorry I forgot and use
the old a.out. I have
attached the new log
for 48cores (log48),
together with the
96cores log (log96).<br>
<br>
Why does the number of
processes increase so
much? Is there
something wrong with
my coding?<br>
<br>
Only the Poisson eqn's RHS changes, the
LHS doesn't. So if I
want to reuse the
preconditioner, what
must I do? Or what
must I not do?<br>
<br>
Lastly, I only
simulated 2 time steps
previously. Now I run
for 10 timesteps
(log48_10). Is it
building the
preconditioner at
every timestep?<br>
<br>
Also, what about
momentum eqn? Is it
working well?<br>
<br>
I will try the gamg
later too.<br>
<br>
Thank you<br>
<br>
Yours sincerely,<br>
<br>
TAY wee-beng<br>
<br>
On 2/11/2015 12:30 AM,
Barry Smith wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
You used gmres
with 48 processes
but richardson with
96. You need to be
careful and make
sure you don't
change the solvers
when you change the
number of processors
since you can get
very different
inconsistent results<br>
<br>
Anyways all the
time is being spent
in the BoomerAMG
algebraic multigrid
setup and it is scaling badly. When
you double the
problem size and
number of processes
it went from
3.2445e+01 to
4.3599e+02 seconds.<br>
<br>
PCSetUp
3 1.0 3.2445e+01
1.0 9.58e+06 2.0
0.0e+00 0.0e+00
4.0e+00 62 8 0 0
4 62 8 0 0 5
11<br>
<br>
PCSetUp
3 1.0 4.3599e+02
1.0 9.58e+06 2.0
0.0e+00 0.0e+00
4.0e+00 85 18 0 0
6 85 18 0 0 6
2<br>
<br>
Now is the
Poisson problem
changing at each
timestep or can you
use the same
preconditioner built
with BoomerAMG for
all the time steps?
Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it may be too large.<br>
<br>
You might also
try -pc_type gamg
and see how PETSc's
algebraic multigrid
scales for your
problem/machine.<br>
<br>
Barry<br>
<br>
<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Nov 1, 2015, at
7:30 AM, TAY
wee-beng<<a
moz-do-not-send="true"
href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
<br>
On 1/11/2015 10:00
AM, Barry Smith
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Oct 31,
2015, at 8:43
PM, TAY
wee-beng<<a
moz-do-not-send="true" href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
<br>
<br>
On 1/11/2015
12:47 AM,
Matthew
Knepley wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On Sat, Oct
31, 2015 at
11:34 AM, TAY
wee-beng<<a
moz-do-not-send="true" href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
wrote:<br>
Hi,<br>
<br>
I understand
that as
mentioned in
the faq, due
to the
limitations in
memory, the
scaling is not
linear. So, I
am trying to
write a
proposal to
use a
supercomputer.<br>
Its specs are:<br>
Compute nodes:
82,944 nodes
(SPARC64
VIIIfx; 16GB
of memory per
node)<br>
<br>
8 cores /
processor<br>
Interconnect: Tofu (6-dimensional mesh/torus)<br>
Each cabinet
contains 96
computing
nodes,<br>
One of the requirements is to give the
performance of
my current
code with my
current set of
data, and
there is a
formula to
calculate the
estimated
parallel
efficiency
when using the
new large set
of data<br>
There are 2
ways to give
performance:<br>
1. Strong
scaling, which
is defined as
how the
elapsed time
varies with
the number of
processors for
a fixed<br>
problem.<br>
2. Weak
scaling, which
is defined as
how the
elapsed time
varies with
the number of
processors for
a<br>
fixed problem
size per
processor.<br>
I ran my cases
with 48 and 96
cores with my
current
cluster,
giving 140 and
90 mins
respectively.
This is
classified as
strong
scaling.<br>
Cluster specs:<br>
CPU: AMD 6234
2.4GHz<br>
8 cores /
processor
(CPU)<br>
6 CPU / node<br>
So 48 cores / node<br>
Not sure about the memory / node<br>
<br>
The parallel
efficiency
‘En’ for a
given degree
of parallelism
‘n’ indicates
how much the
program is<br>
efficiently
accelerated by
parallel
processing.
‘En’ is given
by the
following
formulae.
Although their<br>
derivation
processes are
different
depending on
strong and
weak scaling,
derived
formulae are
the<br>
same.<br>
From the
estimated
time, my
parallel
efficiency
using
Amdahl's law
on the current
old cluster
was 52.7%.<br>
So are my results acceptable?<br>
For the large
data set, if
using 2205
nodes
(2205X8cores),
my expected
parallel
efficiency is
only 0.5%. The
proposal recommends a value of > 50%.<br>
The problem
with this
analysis is
that the
estimated
serial
fraction from
Amdahl's Law
changes as a
function<br>
of problem
size, so you
cannot take
the strong
scaling from
one problem
and apply it
to another
without a<br>
model of this
dependence.<br>
<br>
Weak scaling
does model
changes with
problem size,
so I would
measure weak
scaling on
your current<br>
cluster, and
extrapolate to
the big
machine. I
realize that
this does not
make sense for
many
scientific<br>
applications,
but neither
does requiring
a certain
parallel
efficiency.<br>
</blockquote>
Ok, I checked the results for my weak scaling; the expected parallel efficiency is even worse. From the formula used, it's obvious it decreases with some sort of exponential extrapolation. So
unless I can
achieve a near
> 90% speed
up when I
double the
cores and
problem size
for my current
48/96 cores
setup,
extrapolating
from about 96
nodes to
10,000 nodes
will give a
much lower
expected
parallel
efficiency for
the new case.<br>
<br>
However, it's
mentioned in
the FAQ that
due to memory
requirement,
it's
impossible to
get >90%
speed when I
double the
cores and
problem size
(ie linear
increase in
performance),
which means
that I can't
get >90%
speed up when
I double the
cores and
problem size
for my current
48/96 cores
setup. Is that
so?<br>
</blockquote>
What is the
output of
-ksp_view
-log_summary on
the problem and
then on the
problem doubled
in size and
number of
processors?<br>
<br>
Barry<br>
</blockquote>
Hi,<br>
<br>
I have attached
the output<br>
<br>
48 cores: log48<br>
96 cores: log96<br>
<br>
There are 2
solvers - The
momentum linear
eqn uses bcgs,
while the Poisson
eqn uses hypre
BoomerAMG.<br>
<br>
Problem size
doubled from
158x266x150 to
158x266x300.<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
So is it fair
to say that
the main
problem does
not lie in my
programming
skills, but
rather in the way
the linear
equations are
solved?<br>
<br>
Thanks.<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Thanks,<br>
<br>
Matt<br>
Is it possible
for this type
of scaling in
PETSc
(>50%),
when using
17640 (2205X8)
cores?<br>
Btw, I do not
have access to
the system.<br>
<br>
<br>
<br>
Sent using
CloudMagic
Email<br>
<br>
<br>
<br>
-- <br>
What most
experimenters
take for
granted before
they begin
their
experiments is
infinitely
more
interesting
than any
results to
which their
experiments
lead.<br>
-- Norbert
Wiener<br>
</blockquote>
</blockquote>
</blockquote>
<log48.txt><log96.txt><br>
</blockquote>
</blockquote>
<log48_10.txt><log48.txt><log96.txt><br>
</blockquote>
</blockquote>
<log96_100.txt><log48_100.txt><br>
</blockquote>
</blockquote>
<log96_100_2.txt><log48_100_2.txt><br>
</blockquote>
</blockquote>
<log64_100.txt><log8_100.txt><br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<span class=""><font color="#888888">
<div><br>
</div>
-- <br>
<div>What most experimenters take for granted
before they begin their experiments is
infinitely more interesting than any results
to which their experiments lead.<br>
-- Norbert Wiener</div>
</font></span></div>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature">What most experimenters take for
granted before they begin their experiments is infinitely
more interesting than any results to which their experiments
lead.<br>
-- Norbert Wiener</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>