<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hi, <br>
</p>
<p>On 10/04/2016 11:24 AM, Matthew Knepley wrote:<br>
</p>
<blockquote
cite="mid:CAMYG4Gn6A6dZn1vtJZTMog+fN5PBTUZK3XoBwCC0SWfaUbpQXg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Oct 4, 2016 at 1:13 PM, frank
<span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi,</p>
This question is a follow-up of the thread "Question about
memory usage in Multigrid preconditioner".<br>
I used to have the "Out of Memory (OOM)" problem when
using the CG+Telescope MG solver with 32768 cores.
Adding the "-matrap 0 -matptap_scalable" options did
solve that problem. <br>
<br>
Then I tested the scalability by solving a 3D Poisson equation
for one step. I used one sub-communicator in all the
tests. The differences between the PETSc options in those
tests are: (1) the pc_telescope_reduction_factor; (2) the
number of multigrid levels in the up/down solver. The
function "ksp_solve" is timed. It is kind of slow and
doesn't scale at all. <br>
</div>
</blockquote>
<div><br>
</div>
<div>1) The number of levels cannot be different in the
up/down smoothers. Why are you using a / ?</div>
</div>
</div>
</div>
</blockquote>
<br>
I didn't mean the "up/down smoothers". I meant the "-pc_mg_levels"
and "-mg_coarse_telescope_pc_mg_levels" options.
<blockquote
cite="mid:CAMYG4Gn6A6dZn1vtJZTMog+fN5PBTUZK3XoBwCC0SWfaUbpQXg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>2) We need to see what solver you actually constructed,
so give us the output of -ksp_view</div>
<div><br>
</div>
<div>3) For any performance questions, we need the output of
-log_view</div>
</div>
</div>
</div>
</blockquote>
<br>
I attached the -log_view output for all eight runs. <br>
Each file is named by the grid size and the number of cores. For example,
log_512_4096.txt is the -log_view output from the case using 512^3 grid points
and 4096 cores.<br>
<br>
I attached the -ksp_view output for only two of the runs, so the thread does not
get messy with too many files. The -ksp_view output for the other tests is quite
similar; the only difference is the number of MG levels.<br>
<blockquote
cite="mid:CAMYG4Gn6A6dZn1vtJZTMog+fN5PBTUZK3XoBwCC0SWfaUbpQXg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>4) It looks like you are fixing the number of levels as
you scale up. This makes the coarse problem much bigger,
and is not a scalable way to proceed.</div>
<div> Have you looked at the ratio of coarse grid time to
level time?</div>
</div>
</div>
</div>
</blockquote>
<br>
How can I find the ratio?<br>
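(My guess from the attached -log_view files: compare the per-level multigrid events, e.g.<br>
grep "MGSmooth Level" log_512_4096.txt<br>
and look at the time spent on Level 0 (the coarse grid) versus the finer levels. Is that what you mean? I am not sure "MGSmooth Level N" is the right event to look at.)<br>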
<blockquote
cite="mid:CAMYG4Gn6A6dZn1vtJZTMog+fN5PBTUZK3XoBwCC0SWfaUbpQXg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>5) Did you look at the options in this paper: <a
moz-do-not-send="true"
href="https://arxiv.org/abs/1604.07163">https://arxiv.org/abs/1604.07163</a></div>
</div>
</div>
</div>
</blockquote>
<br>
I am going to look at it now. <br>
<br>
Thank you.<br>
Frank<br>
<br>
<blockquote
cite="mid:CAMYG4Gn6A6dZn1vtJZTMog+fN5PBTUZK3XoBwCC0SWfaUbpQXg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Test1: 512^3 grid points<br>
Core# telescope_reduction_factor <wbr> MG
levels# for up/down solver Time for KSPSolve (s)<br>
512 8 <wbr>
4 / 3 <wbr>
6.2466<br>
4096 64 <wbr>
5 / 3 <wbr>
0.9361<br>
32768 64 <wbr>
4 / 3 <wbr>
4.8914<br>
<br>
Test2: 1024^3 grid points<br>
Core# telescope_reduction_factor <wbr> MG
levels# for up/down solver Time for KSPSolve (s)<br>
4096 64 <wbr>
5 / 4 <wbr>
3.4139<br>
8192 128 <wbr>
5 / 4 <wbr>
2.4196<br>
16384 32 <wbr>
5 / 3 <wbr>
5.4150<br>
32768 64 <wbr>
5 / 3 <wbr>
5.6067<br>
65536 128 <wbr>
5 / 3 <wbr>
6.5219<br>
<br>
I guess I didn't set the MG levels properly. What would
be an efficient way to arrange the MG levels?<br>
Also, which preconditioner should I use at the coarse mesh of the 2nd
communicator to improve the performance? <br>
<br>
I attached the test code and the petsc options file for
the 1024^3 cube with 32768 cores. <br>
<br>
Thank you.<br>
<br>
Regards,<br>
Frank<br>
<br>
<br>
<br>
<br>
<br>
<br>
<div class="gmail-m_5791256141221066626moz-cite-prefix">On
09/15/2016 03:35 AM, Dave May wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Hi all,<br>
<br>
</div>
<div>The only unexpected memory usage I
can see is associated with the call to
MatPtAP().<br>
</div>
<div>Here is something you can try
immediately.<br>
</div>
</div>
Run your code with the additional options<br>
-matrap 0 -matptap_scalable<br>
<br>
</div>
<div>I didn't realize this before, but the
default behaviour of MatPtAP in parallel is
actually to explicitly form the transpose
of P (i.e. assemble R = P^T) and then compute
R.A.P. <br>
You don't want to do this. The option -matrap
0 resolves this issue.<br>
</div>
<div><br>
</div>
<div>The implementation of P^T.A.P has two
variants. <br>
The scalable implementation (with respect to
memory usage) is selected via the second
option -matptap_scalable.</div>
<div><br>
</div>
<div>Try it out - I see a significant memory
reduction using these options for particular
mesh sizes / partitions.<br>
</div>
<div><br>
</div>
I've attached a cleaned up version of the code
you sent me.<br>
</div>
There were a number of memory leaks and other
issues.<br>
</div>
<div>The main points being<br>
</div>
* You should call DMDAVecGetArrayF90() before
VecAssembly{Begin,End}<br>
* You should call PetscFinalize(), otherwise the
option -log_summary (-log_view) will not display
anything once the program has completed.<br>
<div>
<div>
<div><br>
<br>
</div>
<div>Thanks,<br>
</div>
<div> Dave<br>
</div>
<div>
<div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 15 September 2016 at
08:03, Hengjie Wang <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Hi Dave,<br>
<br>
Sorry, I should have put more comments in to
explain the code. <br>
The number of processes in each dimension is the
same: Px = Py = Pz = P. So is the domain size.<br>
So if you want to run the code on a
512^3 grid with 16^3 cores, you need to
set "-N 512 -P 16" on the command line.<br>
I added more comments and also fixed an error in
the attached code. (The error only affects
the accuracy of the solution, not the memory
usage.) <br>
<div><br>
Thank you.<span
class="gmail-m_5791256141221066626HOEnZb"><font
color="#888888"><br>
Frank</font></span>
<div>
<div class="gmail-m_5791256141221066626h5"><br>
<br>
On 9/14/2016 9:05 PM, Dave May wrote:<br>
</div>
</div>
</div>
<div>
<div class="gmail-m_5791256141221066626h5">
<blockquote type="cite"><br>
<br>
On Thursday, 15 September 2016, Dave May
<<a moz-do-not-send="true"
href="mailto:dave.mayhem23@gmail.com"
target="_blank">dave.mayhem23@gmail.com</a>>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><br>
<br>
On Thursday, 15 September 2016, frank
<<a moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Hi, <br>
<br>
I wrote a simple code to
reproduce the error. I hope this
can help to diagnose the problem.<br>
The code just solves a 3D Poisson
equation. </div>
</blockquote>
<div><br>
</div>
<div>Why is the stencil width a
runtime parameter?? And why is the
default value 2? For 7-pnt FD
Laplace, you only need a stencil
width of 1. </div>
<div><br>
</div>
<div>Was this choice made to mimic
something in the real application
code?</div>
</blockquote>
<div><br>
</div>
Please ignore - I misunderstood your
usage of the param set by -P
<div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><br>
I run the code on a 1024^3 mesh.
The process partition is 32 * 32
* 32. That's when I re-produce
the OOM error. Each core has
about 2G memory.<br>
I also run the code on a 512^3
mesh with 16 * 16 * 16
processes. The ksp solver works
fine. <br>
I attached the code,
ksp_view_pre's output and my
petsc option file.<br>
<br>
Thank you.<br>
Frank<br>
<div><br>
On 09/09/2016 06:38 PM,
Hengjie Wang wrote:<br>
</div>
<blockquote type="cite">Hi
Barry,
<div><br>
</div>
<div>I checked. On the
supercomputer, I had the
option "-ksp_view_pre" but
it is not in file I sent
you. I am sorry for the
confusion.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Frank<span></span><br>
<br>
On Friday, September 9,
2016, Barry Smith <<a
moz-do-not-send="true">bsmith@mcs.anl.gov</a>>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex"><br>
> On Sep 9, 2016, at
3:11 PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
><br>
> Hi Barry,<br>
><br>
> I think the first KSP
view output is from
-ksp_view_pre. Before I
submitted the test, I was
not sure whether there
would be OOM error or not.
So I added both
-ksp_view_pre and
-ksp_view.<br>
<br>
But the options file you
sent specifically does NOT
list the -ksp_view_pre so
how could it be from that?<br>
<br>
Sorry to be pedantic
but I've spent too much
time in the past trying to
debug from incorrect
information and want to
make sure that the
information I have is
correct before thinking.
Please recheck exactly
what happened. Rerun with
the exact input file you
emailed if that is needed.<br>
<br>
Barry<br>
<br>
><br>
> Frank<br>
><br>
><br>
> On 09/09/2016 12:38
PM, Barry Smith wrote:<br>
>> Why does
ksp_view2.txt have two KSP
views in it while
ksp_view1.txt has only one
KSPView in it? Did you run
two different solves in
the 2 case but not the
one?<br>
>><br>
>> Barry<br>
>><br>
>><br>
>><br>
>>> On Sep 9,
2016, at 10:56 AM, frank
<<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>><br>
>>> Hi,<br>
>>><br>
>>> I want to
continue digging into the
memory problem here.<br>
>>> I did find a
work around in the past,
which is to use less cores
per node so that each core
has 8G memory. However
this is deficient and
expensive. I hope to
locate the place that uses
the most memory.<br>
>>><br>
>>> Here is a
brief summary of the tests
I did in past:<br>
>>>> Test1:
Mesh 1536*128*384 |
Process Mesh 48*4*12<br>
>>> Maximum (over
computational time)
process memory:
total 7.0727e+08<br>
>>> Current
process memory:
total
7.0727e+08<br>
>>> Maximum (over
computational time) space
PetscMalloc()ed: total
6.3908e+11<br>
>>> Current space
PetscMalloc()ed:
total
1.8275e+09<br>
>>><br>
>>>> Test2:
Mesh 1536*128*384 |
Process Mesh 96*8*24<br>
>>> Maximum (over
computational time)
process memory:
total 5.9431e+09<br>
>>> Current
process memory:
total
5.9431e+09<br>
>>> Maximum (over
computational time) space
PetscMalloc()ed: total
5.3202e+12<br>
>>> Current space
PetscMalloc()ed:
total
5.4844e+09<br>
>>><br>
>>>> Test3:
Mesh 3072*256*768 |
Process Mesh 96*8*24<br>
>>> OOM( Out
Of Memory ) killer of the
supercomputer terminated
the job during "KSPSolve".<br>
>>><br>
>>> I attached
the output of ksp_view(
the third test's output is
from ksp_view_pre ),
memory_view and also the
petsc options.<br>
>>><br>
>>> In all the
tests, each core can
access about 2G memory. In
test3, there are
4223139840 non-zeros in
the matrix. This will
consume about 1.74M, using
double precision.
Considering some extra
memory used to store
integer index, 2G memory
should still be way
enough.<br>
>>><br>
>>> Is there a
way to find out which part
of KSPSolve uses the most
memory?<br>
>>> Thank you so
much.<br>
>>><br>
>>> BTW, there
are 4 options remains
unused and I don't
understand why they are
omitted:<br>
>>>
-mg_coarse_telescope_mg_coarse_ksp_type
value: preonly<br>
>>>
-mg_coarse_telescope_mg_coarse_pc_type
value: bjacobi<br>
>>>
-mg_coarse_telescope_mg_levels_ksp_max_it
value: 1<br>
>>>
-mg_coarse_telescope_mg_levels_ksp_type
value: richardson<br>
>>><br>
>>><br>
>>> Regards,<br>
>>> Frank<br>
>>><br>
>>> On 07/13/2016
05:47 PM, Dave May wrote:<br>
>>>><br>
>>>> On 14
July 2016 at 01:07, frank
<<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>> Hi Dave,<br>
>>>><br>
>>>> Sorry for
the late reply.<br>
>>>> Thank you
so much for your detailed
reply.<br>
>>>><br>
>>>> I have a
question about the
estimation of the memory
usage. There are
4223139840 allocated
non-zeros and 18432 MPI
processes. Double
precision is used. So the
memory per process is:<br>
>>>>
4223139840 * 8bytes /
18432 / 1024 / 1024 =
1.74M ?<br>
>>>> Did I do
sth wrong here? Because
this seems too small.<br>
>>>><br>
>>>> No - I
totally f***ed it up. You
are correct. That'll teach
me for fumbling around
with my iphone calculator
and not using my brain.
(Note that to convert to
MB just divide by 1e6, not
1024^2 - although I
apparently cannot convert
between units
correctly....)<br>
>>>><br>
>>>> From the
PETSc objects associated
with the solver, It looks
like it _should_ run with
2GB per MPI rank. Sorry
for my mistake.
Possibilities are:
somewhere in your usage of
PETSc you've introduced a
memory leak; PETSc is
doing a huge over
allocation (e.g. as per
our discussion of
MatPtAP); or in your
application code there are
other objects you have
forgotten to log the
memory for.<br>
>>>><br>
>>>><br>
>>>><br>
>>>> I am
running this job on
Bluewater<br>
>>>> I am
using the 7 points FD
stencil in 3D.<br>
>>>><br>
>>>> I thought
so on both counts.<br>
>>>><br>
>>>> I
apologize that I made a
stupid mistake in
computing the memory per
core. My settings render
each core can access only
2G memory on average
instead of 8G which I
mentioned in previous
email. I re-run the job
with 8G memory per core on
average and there is no
"Out Of Memory" error. I
would do more test to see
if there is still some
memory issue.<br>
>>>><br>
>>>> Ok. I'd
still like to know where
the memory was being used
since my estimates were
off.<br>
>>>><br>
>>>><br>
>>>> Thanks,<br>
>>>> Dave<br>
>>>><br>
>>>> Regards,<br>
>>>> Frank<br>
>>>><br>
>>>><br>
>>>><br>
>>>> On
07/11/2016 01:18 PM, Dave
May wrote:<br>
>>>>> Hi
Frank,<br>
>>>>><br>
>>>>><br>
>>>>> On 11
July 2016 at 19:14, frank
<<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>> Hi
Dave,<br>
>>>>><br>
>>>>> I
re-run the test using
bjacobi as the
preconditioner on the
coarse mesh of telescope.
The Grid is 3072*256*768
and process mesh is
96*8*24. The petsc option
file is attached.<br>
>>>>> I
still got the "Out Of
Memory" error. The error
occurred before the linear
solver finished one step.
So I don't have the full
info from ksp_view. The
info from ksp_view_pre is
attached.<br>
>>>>><br>
>>>>> Okay
- that is essentially
useless (sorry)<br>
>>>>><br>
>>>>> It
seems to me that the error
occurred when the
decomposition was going to
be changed.<br>
>>>>><br>
>>>>> Based
on what information?<br>
>>>>>
Running with -info would
give us more clues, but
will create a ton of
output.<br>
>>>>>
Please try running the
case which failed with
-info<br>
>>>>> I
had another test with a
grid of 1536*128*384 and
the same process mesh as
above. There was no error.
The ksp_view info is
attached for comparison.<br>
>>>>> Thank
you.<br>
>>>>><br>
>>>>><br>
>>>>> [3]
Here is my crude estimate
of your memory usage.<br>
>>>>> I'll
target the biggest memory
hogs only to get an order
of magnitude estimate<br>
>>>>><br>
>>>>> * The
Fine grid operator
contains 4223139840
non-zeros --> 1.8 GB
per MPI rank assuming
double precision.<br>
>>>>> The
indices for the AIJ could
amount to another 0.3 GB
(assuming 32 bit integers)<br>
>>>>><br>
>>>>> * You
use 5 levels of
coarsening, so the other
operators should represent
(collectively)<br>
>>>>> 2.1 /
8 + 2.1/8^2 + 2.1/8^3 +
2.1/8^4 ~ 300 MB per MPI
rank on the communicator
with 18432 ranks.<br>
>>>>> The
coarse grid should consume
~ 0.5 MB per MPI rank on
the communicator with
18432 ranks.<br>
>>>>><br>
>>>>> * You
use a reduction factor of
64, making the new
communicator with 288 MPI
ranks.<br>
>>>>>
PCTelescope will first
gather a temporary matrix
associated with your
coarse level operator
assuming a comm size of
288 living on the comm
with size 18432.<br>
>>>>> This
matrix will require
approximately 0.5 * 64 =
32 MB per core on the 288
ranks.<br>
>>>>> This
matrix is then used to
form a new MPIAIJ matrix
on the subcomm, thus
require another 32 MB per
rank.<br>
>>>>> The
temporary matrix is now
destroyed.<br>
>>>>><br>
>>>>> *
Because a DMDA is
detected, a permutation
matrix is assembled.<br>
>>>>> This
requires 2 doubles per
point in the DMDA.<br>
>>>>> Your
coarse DMDA contains 92 x
16 x 48 points.<br>
>>>>> Thus
the permutation matrix
will require < 1 MB per
MPI rank on the sub-comm.<br>
>>>>><br>
>>>>> *
Lastly, the matrix is
permuted. This uses
MatPtAP(), but the
resulting operator will
have the same memory
footprint as the
unpermuted matrix (32 MB).
At any stage in
PCTelescope, only 2
operators of size 32 MB
are held in memory when
the DMDA is provided.<br>
>>>>><br>
>>>>> From
my rough estimates, the
worst case memory foot
print for any given core,
given your options is
approximately<br>
>>>>> 2100
MB + 300 MB + 32 MB + 32
MB + 1 MB = 2465 MB<br>
>>>>> This
is way below 8 GB.<br>
>>>>><br>
>>>>> Note
this estimate completely
ignores:<br>
>>>>> (1)
the memory required for
the restriction operator,<br>
>>>>> (2)
the potential growth in
the number of non-zeros
per row due to Galerkin
coarsening (I wished
-ksp_view_pre reported the
output from MatView so we
could see the number of
non-zeros required by the
coarse level operators)<br>
>>>>> (3)
all temporary vectors
required by the CG solver,
and those required by the
smoothers.<br>
>>>>> (4)
internal memory allocated
by MatPtAP<br>
>>>>> (5)
memory associated with
IS's used within
PCTelescope<br>
>>>>><br>
>>>>> So
either I am completely off
in my estimates, or you
have not carefully
estimated the memory usage
of your application code.
Hopefully others might
examine/correct my rough
estimates<br>
>>>>><br>
>>>>> Since
I don't have your code I
cannot access the latter.<br>
>>>>> Since
I don't have access to the
same machine you are
running on, I think we
need to take a step back.<br>
>>>>><br>
>>>>> [1]
What machine are you
running on? Send me a URL
if its available<br>
>>>>><br>
>>>>> [2]
What discretization are
you using? (I am guessing
a scalar 7 point FD
stencil)<br>
>>>>> If
it's a 7 point FD stencil,
we should be able to
examine the memory usage
of your solver
configuration using a
standard, light weight
existing PETSc example,
run on your machine at the
same scale.<br>
>>>>> This
would hopefully enable us
to correctly evaluate the
actual memory usage
required by the solver
configuration you are
using.<br>
>>>>><br>
>>>>>
Thanks,<br>
>>>>>
Dave<br>
>>>>><br>
>>>>><br>
>>>>> Frank<br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>> On
07/08/2016 10:38 PM, Dave
May wrote:<br>
>>>>>><br>
>>>>>>
On Saturday, 9 July 2016,
frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>>
Hi Barry and Dave,<br>
>>>>>><br>
>>>>>>
Thank both of you for the
advice.<br>
>>>>>><br>
>>>>>>
@Barry<br>
>>>>>> I
made a mistake in the file
names in last email. I
attached the correct files
this time.<br>
>>>>>>
For all the three tests,
'Telescope' is used as the
coarse preconditioner.<br>
>>>>>><br>
>>>>>>
== Test1: Grid:
1536*128*384, Process
Mesh: 48*4*12<br>
>>>>>>
Part of the memory usage:
Vector 125
124 3971904 0.<br>
>>>>>>
Matrix
101 101 9462372
0<br>
>>>>>><br>
>>>>>>
== Test2: Grid:
1536*128*384, Process
Mesh: 96*8*24<br>
>>>>>>
Part of the memory usage:
Vector 125
124 681672 0.<br>
>>>>>>
Matrix
101 101 1462180
0.<br>
>>>>>><br>
>>>>>>
In theory, the memory
usage in Test1 should be 8
times of Test2. In my
case, it is about 6 times.<br>
>>>>>><br>
>>>>>>
== Test3: Grid:
3072*256*768, Process
Mesh: 96*8*24. Sub-domain
per process: 32*32*32<br>
>>>>>>
Here I get the out of
memory error.<br>
>>>>>><br>
>>>>>> I
tried to use -mg_coarse
jacobi. In this way, I
don't need to set
-mg_coarse_ksp_type and
-mg_coarse_pc_type
explicitly, right?<br>
>>>>>>
The linear solver didn't
work in this case. Petsc
output some errors.<br>
>>>>>><br>
>>>>>>
@Dave<br>
>>>>>>
In test3, I use only one
instance of 'Telescope'.
On the coarse mesh of
'Telescope', I used LU as
the preconditioner instead
of SVD.<br>
>>>>>>
If my set the levels
correctly, then on the
last coarse mesh of MG
where it calls
'Telescope', the
sub-domain per process is
2*2*2.<br>
>>>>>>
On the last coarse mesh of
'Telescope', there is only
one grid point per
process.<br>
>>>>>> I
still got the OOM error.
The detailed petsc option
file is attached.<br>
>>>>>><br>
>>>>>>
Do you understand the
expected memory usage for
the particular parallel LU
implementation you are
using? I don't
(seriously). Replace LU
with bjacobi and re-run
this test. My point about
solver debugging is still
valid.<br>
>>>>>><br>
>>>>>>
And please send the result
of KSPView so we can see
what is actually used in
the computations<br>
>>>>>><br>
>>>>>>
Thanks<br>
>>>>>>
Dave<br>
>>>>>><br>
>>>>>><br>
>>>>>>
Thank you so much.<br>
>>>>>><br>
>>>>>>
Frank<br>
>>>>>><br>
>>>>>><br>
>>>>>><br>
>>>>>>
On 07/06/2016 02:51 PM,
Barry Smith wrote:<br>
>>>>>>
On Jul 6, 2016, at 4:19
PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>><br>
>>>>>>
Hi Barry,<br>
>>>>>><br>
>>>>>>
Thank you for you advice.<br>
>>>>>> I
tried three test. In the
1st test, the grid is
3072*256*768 and the
process mesh is 96*8*24.<br>
>>>>>>
The linear solver is 'cg'
the preconditioner is 'mg'
and 'telescope' is used as
the preconditioner at the
coarse mesh.<br>
>>>>>>
The system gives me the
"Out of Memory" error
before the linear system
is completely solved.<br>
>>>>>>
The info from
'-ksp_view_pre' is
attached. I seems to me
that the error occurs when
it reaches the coarse
mesh.<br>
>>>>>><br>
>>>>>>
The 2nd test uses a grid
of 1536*128*384 and
process mesh is 96*8*24.
The 3rd
test uses the same grid
but a different process
mesh 48*4*12.<br>
>>>>>>
Are you sure this is
right? The total matrix
and vector memory usage
goes from 2nd test<br>
>>>>>>
Vector
384 383
8,193,712 0.<br>
>>>>>>
Matrix
103 103
11,508,688 0.<br>
>>>>>>
to 3rd test<br>
>>>>>>
Vector 384
383
1,590,520 0.<br>
>>>>>>
Matrix
103 103
3,508,664 0.<br>
>>>>>>
that is the memory usage
got smaller but if you
have only 1/8th the
processes and the same
grid it should have gotten
about 8 times bigger. Did
you maybe cut the grid by
a factor of 8 also? If so
that still doesn't explain
it because the memory
usage changed by a factor
of 5 something for the
vectors and 3 something
for the matrices.<br>
>>>>>><br>
>>>>>><br>
>>>>>>
The linear solver and
petsc options in 2nd and
3rd tests are the same in
1st test. The linear
solver works fine in both
test.<br>
>>>>>> I
attached the memory usage
of the 2nd and 3rd tests.
The memory info is from
the option '-log_summary'.
I tried to use
'-momery_info' as you
suggested, but in my case
petsc treated it as an
unused option. It output
nothing about the memory.
Do I need to add sth to my
code so I can use
'-memory_info'?<br>
>>>>>>
Sorry, my mistake the
option is -memory_view<br>
>>>>>><br>
>>>>>>
Can you run the one case
with -memory_view and
-mg_coarse jacobi
-ksp_max_it 1 (just so it
doesn't iterate forever)
to see how much memory is
used without the
telescope? Also run case 2
the same way.<br>
>>>>>><br>
>>>>>>
Barry<br>
>>>>>><br>
>>>>>><br>
>>>>>><br>
>>>>>>
In both tests the memory
usage is not large.<br>
>>>>>><br>
>>>>>>
It seems to me that it
might be the 'telescope'
preconditioner that
allocated a lot of memory
and caused the error in
the 1st test.<br>
>>>>>>
Is there is a way to show
how much memory it
allocated?<br>
>>>>>><br>
>>>>>>
Frank<br>
>>>>>><br>
>>>>>>
On 07/05/2016 03:37 PM,
Barry Smith wrote:<br>
>>>>>>
Frank,<br>
>>>>>><br>
>>>>>>
You can run with
-ksp_view_pre to have it
"view" the KSP before the
solve so hopefully it gets
that far.<br>
>>>>>><br>
>>>>>>
Please run the
problem that does fit with
-memory_info when the
problem completes it will
show the "high water mark"
for PETSc allocated memory
and total memory used. We
first want to look at
these numbers to see if it
is using more memory than
you expect. You could also
run with say half the grid
spacing to see how the
memory usage scaled with
the increase in grid
points. Make the runs also
with -log_view and send
all the output from these
options.<br>
>>>>>><br>
>>>>>>
Barry<br>
>>>>>><br>
>>>>>>
On Jul 5, 2016, at 5:23
PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>><br>
>>>>>>
Hi,<br>
>>>>>><br>
>>>>>> I
am using the CG ksp solver
and Multigrid
preconditioner to solve a
linear system in parallel.<br>
>>>>>> I
chose to use the
'Telescope' as the
preconditioner on the
coarse mesh for its good
performance.<br>
>>>>>>
The petsc options file is
attached.<br>
>>>>>><br>
>>>>>>
The domain is a 3d box.<br>
>>>>>>
It works well when the
grid is 1536*128*384 and
the process mesh is
96*8*24. When I double the
size of grid and
keep the same
process mesh and petsc
options, I get an "out of
memory" error from the
super-cluster I am using.<br>
>>>>>>
Each process has access to
at least 8G memory, which
should be more than enough
for my application. I am
sure that all the other
parts of my code( except
the linear solver ) do not
use much memory. So I
doubt if there is
something wrong with the
linear solver.<br>
>>>>>>
The error occurs before
the linear system is
completely solved so I
don't have the info from
ksp view. I am not able to
re-produce the error with
a smaller problem either.<br>
>>>>>>
In addition, I tried to
use the block jacobi as
the preconditioner with
the same grid and same
decomposition. The linear
solver runs extremely slow
but there is no memory
error.<br>
>>>>>><br>
>>>>>>
How can I diagnose what
exactly cause the error?<br>
>>>>>>
Thank you so much.<br>
>>>>>><br>
>>>>>>
Frank<br>
>>>>>>
<petsc_options.txt><br>
>>>>>>
<ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt><br>
>>>>>><br>
>>>>><br>
>>>><br>
>>>
<ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt><br>
><br>
<br>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
<div> </div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature">What most experimenters take for
granted before they begin their experiments is infinitely
more interesting than any results to which their experiments
lead.<br>
-- Norbert Wiener</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>