<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 10/04/2016 01:20 PM, Matthew Knepley
wrote:<br>
</div>
<blockquote
cite="mid:CAMYG4GkBQtSpUtgdGixwAD86JgZUY0ZCWy=uS6aD9i7Dnqc6Fg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Oct 4, 2016 at 3:09 PM, frank
<span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_-8767381834923078048moz-cite-prefix">Hi
Dave,<br>
<br>
Thank you for the reply.<br>
What do you mean by the "nested calls to KSPSolve"?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>KSPSolve is called again after redistributing the
computation.</div>
</div>
</div>
</div>
</blockquote>
<br>
I am still confused. There is only one KSPSolve in my code. <br>
Do you mean KSPSolve is called again on the sub-communicator? If
that's the case, even if I put two identical KSPSolve calls in the code,
the sub-communicator is still going to call KSPSolve, right?<br>
<br>
<blockquote
cite="mid:CAMYG4GkBQtSpUtgdGixwAD86JgZUY0ZCWy=uS6aD9i7Dnqc6Fg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_-8767381834923078048moz-cite-prefix"> I
tried to call KSPSolve twice, but the second solve
converged in 0 iterations. KSPSolve seems to remember
the solution. How can I force both solves to start from
the same initial guess?<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Did you zero the solution vector between solves?
VecSet(x, 0.0);</div>
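<div><br>
</div>
<div>A minimal sketch of that (assuming the KSP ksp, the vectors b and x, and PetscErrorCode ierr already exist; the names are only illustrative):</div>
<pre>
/* Reset the solution so the second solve starts from the same (zero)
   initial guess instead of the already converged answer. */
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* first solve, includes setup */
ierr = VecSet(x, 0.0);CHKERRQ(ierr);        /* zero the solution vector    */
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* second solve, same initial guess */
</pre>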
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_-8767381834923078048moz-cite-prefix">
Thank you.<span class="HOEnZb"><font color="#888888"><br>
<br>
Frank</font></span>
<div>
<div class="h5"><br>
<br>
<br>
On 10/04/2016 12:56 PM, Dave May wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite"><br>
<br>
On Tuesday, 4 October 2016, frank <<a
moz-do-not-send="true"
href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Hi,</p>
This question is a follow-up of the thread
"Question about memory usage in Multigrid
preconditioner".<br>
I used to have the "Out of Memory (OOM)"
problem when using the CG+Telescope MG solver
with 32768 cores. Adding the options "-matrap 0
-matptap_scalable" did solve that
problem. <br>
<br>
Then I tested the scalability by solving a 3D
Poisson equation for 1 step. I used one
sub-communicator in all the tests. The
differences between the petsc options in those
tests are: (1) the
pc_telescope_reduction_factor; (2) the number of
multigrid levels in the up/down solver. The
function "ksp_solve" is timed. It is quite
slow and doesn't scale at all. <br>
<br>
Test1: 512^3 grid points<br>
<pre>
Cores    telescope_reduction_factor    MG levels for up/down solver    Time for KSPSolve (s)
  512      8                             4 / 3                           6.2466
 4096     64                             5 / 3                           0.9361
32768     64                             4 / 3                           4.8914
</pre>
Test2: 1024^3 grid points<br>
<pre>
Cores    telescope_reduction_factor    MG levels for up/down solver    Time for KSPSolve (s)
 4096     64                             5 / 4                           3.4139
 8192    128                             5 / 4                           2.4196
16384     32                             5 / 3                           5.4150
32768     64                             5 / 3                           5.6067
65536    128                             5 / 3                           6.5219
</pre>
</div>
</blockquote>
<div><br>
</div>
<div>You have to be very careful how you interpret
these numbers. Your solver contains nested calls
to KSPSolve, and unfortunately as a result the
numbers you report include setup time. This will
remain true even if you call KSPSetUp on the
outermost KSP. </div>
<div><br>
</div>
<div>Your email concerns scalability of the solver
application, so let's focus on that issue.</div>
<div><br>
</div>
<div>The only way to clearly separate setup from
solve time is to perform two identical solves.
The second solve will not require any setup. You
should monitor the second solve via a new
PetscLogStage.</div>
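<div><br>
</div>
<div>A rough sketch of what that could look like (ksp, b, x and ierr are assumed to exist already; the stage name is arbitrary):</div>
<pre>
PetscLogStage stage;
ierr = PetscLogStageRegister("SecondSolve", &stage);CHKERRQ(ierr);
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);      /* first solve: pays all setup costs */
ierr = VecSet(x, 0.0);CHKERRQ(ierr);           /* reset the initial guess           */
ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);      /* second solve: setup-free          */
ierr = PetscLogStagePop();CHKERRQ(ierr);
</pre>
<div>-log_view (-log_summary) then reports the "SecondSolve" stage separately, and that is the timing to use for the scaling study.</div>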
<div><br>
</div>
<div>This was what I did in the telescope paper.
It was the only way to understand the setup cost
(and scaling) compared with the solve time (and scaling).</div>
<div><br>
</div>
<div>Thanks</div>
<div> Dave</div>
<div>
<div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> I
guess I didn't set the MG levels properly.
What would be an efficient way to arrange
the MG levels?<br>
Also, which preconditioner should I use at the coarse
mesh of the 2nd communicator
to improve the performance? <br>
<br>
I attached the test code and the petsc
options file for the 1024^3 cube with
32768 cores. <br>
<br>
Thank you.<br>
<br>
Regards,<br>
Frank<br>
<br>
<br>
<br>
<br>
<br>
<br>
<div>On 09/15/2016 03:35 AM, Dave May
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Hi all,<br>
<br>
</div>
<div>The only unexpected
memory usage I can see is
associated with the call to
MatPtAP().<br>
</div>
<div>Here is something you can
try immediately.<br>
</div>
</div>
Run your code with the
additional options<br>
-matrap 0 -matptap_scalable<br>
<br>
</div>
<div>I didn't realize this before,
but the default behaviour of
MatPtAP in parallel is actually
to explicitly form the
transpose of P (e.g. assemble R
= P^T) and then compute R.A.P. <br>
You don't want to do this. The
option -matrap 0 resolves this
issue.<br>
</div>
<div><br>
</div>
<div>The implementation of P^T.A.P
has two variants. <br>
The scalable implementation
(with respect to memory usage)
is selected via the second
option -matptap_scalable.</div>
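<div><br>
</div>
<div>In an options file that is simply the two lines below (a sketch; all the other solver options stay as they are):</div>
<pre>
# avoid explicitly assembling R = P^T inside MatPtAP
-matrap 0
# select the memory-scalable P^T A P implementation
-matptap_scalable
</pre>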
<div><br>
</div>
<div>Try it out - I see a
significant memory reduction
using these options for
particular mesh sizes /
partitions.<br>
</div>
<div><br>
</div>
I've attached a cleaned up version
of the code you sent me.<br>
</div>
There were a number of memory leaks
and other issues.<br>
</div>
<div>The main points are:<br>
</div>
* You should call
DMDAVecGetArrayF90() before
VecAssembly{Begin,End}<br>
* You should call PetscFinalize(),
otherwise the option -log_summary
(-log_view) will not display anything
once the program has completed.<br>
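<div><br>
</div>
<div>For the second point, a minimal C-style skeleton (the Fortran code follows the same pattern with its PetscInitialize/PetscFinalize calls):</div>
<pre>
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* ... create objects, assemble, solve ... */
  ierr = PetscFinalize();   /* without this, -log_summary / -log_view print nothing */
  return ierr;
}
</pre>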
<div>
<div>
<div><br>
<br>
</div>
<div>Thanks,<br>
</div>
<div> Dave<br>
</div>
<div>
<div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 15
September 2016 at 08:03, Hengjie
Wang <span dir="ltr"><<a
moz-do-not-send="true">hengjiew@uci.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000"> Hi Dave,<br>
<br>
Sorry, I should have put more
comments in to explain the code. <br>
The number of processes in each
dimension is the same: Px =
Py = Pz = P. So is the domain size.<br>
So if you want to run the
code for 512^3 grid points on
16^3 cores, you need to set "-N
512 -P 16" on the command line.<br>
I added more comments and also fixed
an error in the attached code. (The
error only affects the
accuracy of the solution, not the
memory usage.) <br>
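<div><br>
</div>
<div>For example (the executable and options-file names below are only placeholders), a 512^3 grid on 16^3 = 4096 cores would be launched as:</div>
<pre>
mpiexec -n 4096 ./test_ksp -N 512 -P 16 -options_file petsc_options.txt
</pre>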
<div><br>
Thank you.<span><font
color="#888888"><br>
Frank</font></span>
<div>
<div><br>
<br>
On 9/14/2016 9:05 PM, Dave
May wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite"><br>
<br>
On Thursday, 15 September
2016, Dave May <<a
moz-do-not-send="true">dave.mayhem23@gmail.com</a>>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex"><br>
<br>
On Thursday, 15
September 2016, frank
<<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000"> Hi,
<br>
<br>
I wrote a simple
code to reproduce
the error. I hope
this can help to
diagnose the
problem.<br>
The code just solves
a 3D Poisson
equation. </div>
</blockquote>
<div><br>
</div>
<div>Why is the stencil
width a runtime
parameter?? And why is
the default value 2?
For 7-pnt FD Laplace,
you only need
a stencil width of 1. </div>
<div><br>
</div>
<div>Was this choice
made to mimic
something in the
real application code?</div>
</blockquote>
<div><br>
</div>
Please ignore - I
misunderstood your usage
of the param set by -P
<div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
bgcolor="#FFFFFF"
text="#000000"><br>
I ran the code on
a 1024^3 mesh. The
process partition
is 32 * 32 * 32.
That's when I
reproduce the OOM
error. Each core
has about 2G
memory.<br>
I also ran the
code on a 512^3
mesh with 16 * 16
* 16 processes.
The KSP solver
works fine. <br>
I attached the
code,
ksp_view_pre's
output and my
petsc options file.<br>
<br>
Thank you.<br>
Frank<br>
<div><br>
On 09/09/2016
06:38 PM,
Hengjie Wang
wrote:<br>
</div>
<blockquote
type="cite">Hi
Barry,
<div><br>
</div>
<div>I checked.
On the
supercomputer,
I had the
option
"-ksp_view_pre"
but it is not
in the file I sent
you. I am
sorry for the
confusion.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Frank<span></span><br>
<br>
On Friday,
September 9,
2016, Barry
Smith <<a
moz-do-not-send="true">bsmith@mcs.anl.gov</a>>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
> On Sep 9,
2016, at 3:11
PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>> wrote:<br>
><br>
> Hi Barry,<br>
><br>
> I think
the first KSP
view output is
from
-ksp_view_pre.
Before I
submitted the
test, I was
not sure
whether there
would be OOM
error or not.
So I added
both
-ksp_view_pre
and -ksp_view.<br>
<br>
But the
options file
you sent
specifically
does NOT list
the
-ksp_view_pre
so how could
it be from
that?<br>
<br>
Sorry to be
pedantic but
I've spent too
much time in
the past
trying to
debug from
incorrect
information
and want to
make sure that
the
information I
have is
correct before
thinking.
Please recheck
exactly what
happened.
Rerun with the
exact input
file you
emailed if
that is
needed.<br>
<br>
Barry<br>
<br>
><br>
> Frank<br>
><br>
><br>
> On
09/09/2016
12:38 PM,
Barry Smith
wrote:<br>
>> Why
does
ksp_view2.txt
have two KSP
views in it
while
ksp_view1.txt
has only one
KSPView in it?
Did you run
two different
solves in the
2 case but not
the one?<br>
>><br>
>>
Barry<br>
>><br>
>><br>
>><br>
>>>
On Sep 9,
2016, at 10:56
AM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>> wrote:<br>
>>><br>
>>>
Hi,<br>
>>><br>
>>> I
want to
continue
digging into
the memory
problem here.<br>
>>> I did find a workaround in the past, which is to use fewer cores per node so that each core has 8G memory. However, this is inefficient and expensive. I hope to locate the place that uses the most memory.<br>
>>><br>
>>>
Here is a
brief summary
of the tests I
did in past:<br>
>>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12<br>
>>> Maximum (over computational time) process memory:        total 7.0727e+08<br>
>>> Current process memory:                                  total 7.0727e+08<br>
>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11<br>
>>> Current space PetscMalloc()ed:                           total 1.8275e+09<br>
>>><br>
>>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24<br>
>>> Maximum (over computational time) process memory:        total 5.9431e+09<br>
>>> Current process memory:                                  total 5.9431e+09<br>
>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12<br>
>>> Current space PetscMalloc()ed:                           total 5.4844e+09<br>
>>><br>
>>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24<br>
>>> OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".<br>
>>><br>
>>> I attached the output of ksp_view (the third test's output is from ksp_view_pre), memory_view and also the petsc options.<br>
>>><br>
>>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M per process, using double precision. Considering some extra memory used to store integer indices, 2G memory should still be more than enough.<br>
>>><br>
>>>
Is there a way
to find out
which part of
KSPSolve uses
the most
memory?<br>
>>>
Thank you so
much.<br>
>>><br>
>>> BTW, there are 4 options that remain unused and I don't understand why they are omitted:<br>
>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly<br>
>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi<br>
>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1<br>
>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson<br>
>>><br>
>>><br>
>>>
Regards,<br>
>>>
Frank<br>
>>><br>
>>>
On 07/13/2016
05:47 PM, Dave
May wrote:<br>
>>>><br>
>>>> On 14 July 2016 at 01:07, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>> Hi Dave,<br>
>>>><br>
>>>> Sorry for the late reply.<br>
>>>> Thank you so much for your detailed reply.<br>
>>>><br>
>>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is:<br>
>>>> 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74 MB?<br>
>>>> Did I do something wrong here? Because this seems too small.<br>
>>>><br>
>>>> No - I totally f***ed it up. You are correct. That'll
teach me for
fumbling
around with my
iphone
calculator and
not using my
brain. (Note
that to
convert to MB
just divide by
1e6, not
1024^2 -
although I
apparently
cannot convert
between units
correctly....)<br>
>>>><br>
>>>> From the PETSc objects associated with the solver, it looks like it _should_ run
with 2GB per
MPI rank.
Sorry for my
mistake.
Possibilities
are: somewhere
in your usage
of PETSc
you've
introduced a
memory leak;
PETSc is doing
a huge over
allocation
(e.g. as per
our discussion
of MatPtAP);
or in your
application
code there are
other objects
you have
forgotten to
log the memory
for.<br>
>>>><br>
>>>><br>
>>>><br>
>>>> I am running this job on Bluewater<br>
>>>> I am using the 7-point FD stencil in 3D.<br>
>>>><br>
>>>> I thought so on both counts.<br>
>>>><br>
>>>> I apologize that I made a stupid mistake in computing the memory per core. With my settings, each core can access only 2G memory on average, instead of the 8G I mentioned in a previous email. I re-ran the job with 8G memory per core on average and there is no "Out Of Memory" error. I will do more tests to see if there is still some memory issue.<br>
>>>><br>
>>>> Ok. I'd still like to know where the memory was being
used since my
estimates were
off.<br>
>>>><br>
>>>><br>
>>>> Thanks,<br>
>>>> Dave<br>
>>>><br>
>>>> Regards,<br>
>>>> Frank<br>
>>>><br>
>>>><br>
>>>><br>
>>>> On 07/11/2016 01:18 PM, Dave May wrote:<br>
>>>>> Hi Frank,<br>
>>>>><br>
>>>>><br>
>>>>> On 11 July 2016 at 19:14, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>> Hi Dave,<br>
>>>>><br>
>>>>> I re-ran the test using bjacobi as the preconditioner on the coarse mesh of telescope. The grid is 3072*256*768 and the process mesh is 96*8*24. The petsc options file is attached.<br>
>>>>> I still got the "Out Of Memory" error. The error
occurred
before the
linear solver
finished one
step. So I
don't have the
full info from
ksp_view. The
info from
ksp_view_pre
is attached.<br>
>>>>><br>
>>>>> Okay - that is essentially useless (sorry)<br>
>>>>><br>
>>>>> It seems to me that the error occurred when the
decomposition
was going to
be changed.<br>
>>>>><br>
>>>>> Based on what information?<br>
>>>>> Running with -info would give us more clues, but
will create a
ton of output.<br>
>>>>> Please try running the case which failed with -info<br>
>>>>> I had another test with a grid of 1536*128*384 and
the same
process mesh
as above.
There was no
error. The
ksp_view info
is attached
for
comparison.<br>
>>>>> Thank you.<br>
>>>>><br>
>>>>><br>
>>>>> [3] Here is my crude estimate of your memory usage.<br>
>>>>> I'll target the biggest memory hogs only to get an
order of
magnitude
estimate<br>
>>>>><br>
>>>>> * The Fine grid operator contains 4223139840
non-zeros
--> 1.8 GB
per MPI rank
assuming
double
precision.<br>
>>>>> The indices for the AIJ could amount to another 0.3
GB (assuming
32 bit
integers)<br>
>>>>><br>
>>>>> * You use 5 levels of coarsening, so the other
operators
should
represent
(collectively)<br>
>>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per
MPI rank on
the
communicator
with 18432
ranks.<br>
>>>>> The coarse grid should consume ~ 0.5 MB per MPI
rank on the
communicator
with 18432
ranks.<br>
>>>>><br>
>>>>> * You use a reduction factor of 64, making the new
communicator
with 288 MPI
ranks.<br>
>>>>> PCTelescope will first gather a temporary matrix
associated
with your
coarse level
operator
assuming a
comm size of
288 living on
the comm with
size 18432.<br>
>>>>> This matrix will require approximately 0.5 * 64 =
32 MB per core
on the 288
ranks.<br>
>>>>> This matrix is then used to form a new MPIAIJ
matrix on the
subcomm, thus
require
another 32 MB
per rank.<br>
>>>>> The temporary matrix is now destroyed.<br>
>>>>><br>
>>>>> * Because a DMDA is detected, a permutation matrix
is assembled.<br>
>>>>> This requires 2 doubles per point in the DMDA.<br>
>>>>> Your coarse DMDA contains 92 x 16 x 48 points.<br>
>>>>> Thus the permutation matrix will require < 1 MB
per MPI rank
on the
sub-comm.<br>
>>>>><br>
>>>>> * Lastly, the matrix is permuted. This uses
MatPtAP(), but
the resulting
operator will
have the same
memory
footprint as
the unpermuted
matrix (32
MB). At any
stage in
PCTelescope,
only 2
operators of
size 32 MB are
held in memory
when the DMDA
is provided.<br>
>>>>><br>
>>>>> From my rough estimates, the worst case memory foot
print for any
given core,
given your
options is
approximately<br>
>>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB<br>
>>>>> This is way below 8 GB.<br>
>>>>><br>
>>>>> Note this estimate completely ignores:<br>
>>>>> (1) the memory required for the restriction
operator,<br>
>>>>> (2) the potential growth in the number of non-zeros
per row due to
Galerkin
coarsening (I
wished
-ksp_view_pre
reported the
output from
MatView so we
could see the
number of
non-zeros
required by
the coarse
level
operators)<br>
>>>>> (3) all temporary vectors required by the CG
solver, and
those required
by the
smoothers.<br>
>>>>> (4) internal memory allocated by MatPtAP<br>
>>>>> (5) memory associated with IS's used within
PCTelescope<br>
>>>>><br>
>>>>> So either I am completely off in my estimates, or
you have not
carefully
estimated the
memory usage
of your
application
code.
Hopefully
others might
examine/correct
my rough
estimates<br>
>>>>><br>
>>>>> Since I don't have your code I cannot access the
latter.<br>
>>>>> Since I don't have access to the same machine you
are running
on, I think we
need to take a
step back.<br>
>>>>><br>
>>>>> [1] What machine are you running on? Send me a URL
if its
available<br>
>>>>><br>
>>>>> [2] What discretization are you using? (I am
guessing a
scalar 7 point
FD stencil)<br>
>>>>> If it's a 7 point FD stencil, we should be able to
examine the
memory usage
of your solver
configuration
using a
standard,
light weight
existing PETSc
example, run
on your
machine at the
same scale.<br>
>>>>> This would hopefully enable us to correctly
evaluate the
actual memory
usage required
by the solver
configuration
you are using.<br>
>>>>><br>
>>>>> Thanks,<br>
>>>>> Dave<br>
>>>>><br>
>>>>><br>
>>>>> Frank<br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>> On 07/08/2016 10:38 PM, Dave May wrote:<br>
>>>>>><br>
>>>>>> On Saturday, 9 July 2016, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>> Hi Barry and Dave,<br>
>>>>>><br>
>>>>>> Thank both of you for the advice.<br>
>>>>>><br>
>>>>>> @Barry<br>
>>>>>> I made a mistake in the file names in last
email. I
attached the
correct files
this time.<br>
>>>>>> For all the three tests, 'Telescope' is used as
the coarse
preconditioner.<br>
>>>>>><br>
>>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12<br>
>>>>>> Part of the memory usage: Vector   125   124   3971904   0.<br>
>>>>>>                           Matrix   101   101   9462372   0<br>
>>>>>><br>
>>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24<br>
>>>>>> Part of the memory usage: Vector   125   124    681672   0.<br>
>>>>>>                           Matrix   101   101   1462180   0.<br>
>>>>>><br>
>>>>>> In theory, the memory usage in Test1 should be 8 times that of Test2. In my case, it is about 6 times.<br>
>>>>>><br>
>>>>>> == Test3: Grid: 3072*256*768, Process Mesh:
96*8*24.
Sub-domain per
process:
32*32*32<br>
>>>>>> Here I get the out of memory error.<br>
>>>>>><br>
>>>>>> I tried to use -mg_coarse jacobi. In this way,
I don't need
to set
-mg_coarse_ksp_type
and
-mg_coarse_pc_type
explicitly,
right?<br>
>>>>>> The linear solver didn't work in this case.
Petsc output
some errors.<br>
>>>>>><br>
>>>>>> @Dave<br>
>>>>>> In test3, I use only one instance of
'Telescope'.
On the coarse
mesh of
'Telescope', I
used LU as the
preconditioner
instead of
SVD.<br>
>>>>>> If I set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2.<br>
>>>>>> On the last coarse mesh of 'Telescope', there
is only one
grid point per
process.<br>
>>>>>> I still got the OOM error. The detailed petsc
option file is
attached.<br>
>>>>>><br>
>>>>>> Do you understand the expected memory usage for
the particular
parallel LU
implementation
you are using?
I don't
(seriously).
Replace LU
with bjacobi
and re-run
this test. My
point about
solver
debugging is
still valid.<br>
>>>>>><br>
>>>>>> And please send the result of KSPView so we can
see what is
actually used
in the
computations<br>
>>>>>><br>
>>>>>> Thanks<br>
>>>>>> Dave<br>
>>>>>><br>
>>>>>><br>
>>>>>> Thank you so much.<br>
>>>>>><br>
>>>>>> Frank<br>
>>>>>><br>
>>>>>><br>
>>>>>><br>
>>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:<br>
>>>>>> On Jul 6, 2016, at 4:19 PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>><br>
>>>>>> Hi Barry,<br>
>>>>>><br>
>>>>>> Thank you for you advice.<br>
>>>>>> I tried three tests. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24.<br>
>>>>>> The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is used as the preconditioner at the coarse mesh.<br>
>>>>>> The system gives me the "Out of Memory" error
before the
linear system
is completely
solved.<br>
>>>>>> The info from '-ksp_view_pre' is attached. It seems to me that the error occurs when it reaches the coarse mesh.<br>
>>>>>><br>
>>>>>> The 2nd test uses a grid of 1536*128*384 and
process mesh
is 96*8*24.
The 3rd
test
uses the same
grid but a
different
process mesh
48*4*12.<br>
>>>>>> Are you sure this is right? The total
matrix and
vector memory
usage goes
from 2nd test<br>
>>>>>> Vector   384   383    8,193,712   0.<br>
>>>>>> Matrix   103   103   11,508,688   0.<br>
>>>>>> to 3rd test<br>
>>>>>> Vector   384   383    1,590,520   0.<br>
>>>>>> Matrix   103   103    3,508,664   0.<br>
>>>>>> that is the memory usage got smaller but if you
have only
1/8th the
processes and
the same grid
it should have
gotten about 8
times bigger.
Did you maybe
cut the grid
by a factor of
8 also? If so
that still
doesn't
explain it
because the
memory usage
changed by a
factor of 5
something for
the vectors
and 3
something for
the matrices.<br>
>>>>>><br>
>>>>>><br>
>>>>>> The linear solver and petsc options in the 2nd and 3rd tests are the same as in the 1st test. The linear solver works fine in both tests.<br>
>>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-memory_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add something to my code so I can use '-memory_info'?<br>
>>>>>> Sorry, my mistake: the option is -memory_view<br>
>>>>>><br>
>>>>>> Can you run the one case with -memory_view
and -mg_coarse
jacobi
-ksp_max_it 1
(just so it
doesn't
iterate
forever) to
see how much
memory is used
without the
telescope?
Also run case
2 the same
way.<br>
>>>>>><br>
>>>>>> Barry<br>
>>>>>><br>
>>>>>><br>
>>>>>><br>
>>>>>> In both tests the memory usage is not large.<br>
>>>>>><br>
>>>>>> It seems to me that it might be the
'telescope'
preconditioner
that allocated
a lot of
memory and
caused the
error in the
1st test.<br>
>>>>>> Is there a way to show how much memory it allocated?<br>
>>>>>><br>
>>>>>> Frank<br>
>>>>>><br>
>>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:<br>
>>>>>> Frank,<br>
>>>>>><br>
>>>>>> You can run with -ksp_view_pre to have it
"view" the KSP
before the
solve so
hopefully it
gets that far.<br>
>>>>>><br>
>>>>>> Please run the problem that does fit with -memory_info; when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first
want to look
at these
numbers to see
if it is using
more memory
than you
expect. You
could also run
with say half
the grid
spacing to see
how the memory
usage scaled
with the
increase in
grid points.
Make the runs
also with
-log_view and
send all the
output from
these options.<br>
>>>>>><br>
>>>>>> Barry<br>
>>>>>><br>
>>>>>> On Jul 5, 2016, at 5:23 PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>><br>
>>>>>> Hi,<br>
>>>>>><br>
>>>>>> I am using the CG ksp solver and Multigrid
preconditioner
to solve a
linear system
in parallel.<br>
>>>>>> I chose to use the 'Telescope' as the
preconditioner
on the coarse
mesh for its
good
performance.<br>
>>>>>> The petsc options file is attached.<br>
>>>>>><br>
>>>>>> The domain is a 3d box.<br>
>>>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of the grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using.<br>
>>>>>> Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code (except the linear solver) do not use much memory. So I suspect there is something wrong with the linear solver.<br>
>>>>>> The error occurs before the linear system is
completely
solved so I
don't have the
info from ksp
view. I am not
able to
re-produce the
error with a
smaller
problem
either.<br>
>>>>>> In addition, I tried to use the block jacobi
as the
preconditioner
with the same
grid and same
decomposition.
The linear
solver runs
extremely slow
but there is
no memory
error.<br>
>>>>>><br>
>>>>>> How can I diagnose what exactly causes the error?<br>
>>>>>> Thank you so much.<br>
>>>>>><br>
>>>>>> Frank<br>
>>>>>> <petsc_options.txt><br>
>>>>>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt><br>
>>>>>><br>
>>>>><br>
>>>><br>
>>>
<ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt><br>
><br>
<br>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
<div> </div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</blockquote>
<div> </div>
<div> </div>
<div> </div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature" data-smartmail="gmail_signature">What
most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results
to which their experiments lead.<br>
-- Norbert Wiener</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>