<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hi,</p>
This question is a follow-up to the thread "Question about memory
usage in Multigrid preconditioner".<br>
I used to hit the "Out of Memory (OOM)" problem when using the
CG+Telescope MG solver with 32768 cores. Adding the options "-matrap 0
-matptap_scalable" solved that problem. <br>
<br>
Then I tested the scalability by solving a 3D Poisson equation for 1 step.
I used one sub-communicator in all the tests. The PETSc options
differ between the tests in two ways: (1) the
pc_telescope_reduction_factor; (2) the number of multigrid levels in
the up/down solver. The function "ksp_solve" is timed. It is rather
slow and doesn't scale at all. <br>
<br>
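Test1: 512^3 grid points<br>
<table border="1" cellpadding="4">
<tr><th>Cores</th><th>telescope_reduction_factor</th><th>MG levels for up/down solver</th><th>Time for KSPSolve (s)</th></tr>
<tr><td>512</td><td>8</td><td>4 / 3</td><td>6.2466</td></tr>
<tr><td>4096</td><td>64</td><td>5 / 3</td><td>0.9361</td></tr>
<tr><td>32768</td><td>64</td><td>4 / 3</td><td>4.8914</td></tr>
</table>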
<br>
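Test2: 1024^3 grid points<br>
<table border="1" cellpadding="4">
<tr><th>Cores</th><th>telescope_reduction_factor</th><th>MG levels for up/down solver</th><th>Time for KSPSolve (s)</th></tr>
<tr><td>4096</td><td>64</td><td>5 / 4</td><td>3.4139</td></tr>
<tr><td>8192</td><td>128</td><td>5 / 4</td><td>2.4196</td></tr>
<tr><td>16384</td><td>32</td><td>5 / 3</td><td>5.4150</td></tr>
<tr><td>32768</td><td>64</td><td>5 / 3</td><td>5.6067</td></tr>
<tr><td>65536</td><td>128</td><td>5 / 3</td><td>6.5219</td></tr>
</table>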
<br>
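To put a number on "doesn't scale", here is a small sketch that turns the timings from the two tables above into strong-scaling speedup and parallel efficiency, each relative to the smallest core count in its test.<br>
The helper name strong_scaling is only illustrative; the timings are copied from the tables, nothing else is measured.<br>

```python
# Timings copied from the two tables above: (cores, KSPSolve time in seconds).
test1 = [(512, 6.2466), (4096, 0.9361), (32768, 4.8914)]
test2 = [(4096, 3.4139), (8192, 2.4196), (16384, 5.4150),
         (32768, 5.6067), (65536, 6.5219)]


def strong_scaling(runs):
    """Return (cores, speedup, efficiency) relative to the first (smallest) run.

    speedup    = t_base / t
    efficiency = speedup / (cores / cores_base); 1.0 would be ideal scaling.
    """
    base_cores, base_time = runs[0]
    rows = []
    for cores, time in runs:
        speedup = base_time / time
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


for name, runs in (("Test1", test1), ("Test2", test2)):
    for cores, speedup, eff in strong_scaling(runs):
        print(f"{name}: {cores:6d} cores  speedup {speedup:5.2f}  efficiency {eff:6.1%}")
```

An efficiency well below 1 at the larger core counts means the extra cores are not reducing the solve time.<br>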
I guess I didn't set the MG levels properly. What would be an
efficient way to arrange the MG levels?<br>
Also, which preconditioner should I use on the coarse mesh of the
second communicator to improve the performance? <br>
<br>
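As a rough cross-check of a level setup, the sketch below estimates the average number of grid points per rank on every level, for the 1024^3 case on 32768 cores.<br>
It rests on assumptions that are not confirmed here: the first number in the "MG levels" column is taken as the level count on the full communicator and the second as the level count inside PCTelescope, each coarsening halves the grid in every direction, and the telescoped MG starts from the coarsest grid of the outer MG. The function names are hypothetical.<br>

```python
def points_per_rank(n, ranks, levels, factor=2):
    """Average grid points per rank on each MG level (finest first) of an n^3 grid.

    Assumes every level is coarsened by `factor` in each direction.
    """
    return [(n // factor ** k) ** 3 / ranks for k in range(levels)]


def summarize(n, cores, reduction, outer_levels, inner_levels, factor=2):
    """Describe an outer MG on `cores` ranks followed by a telescoped inner MG."""
    sub_ranks = cores // reduction                      # ranks in the sub-communicator
    coarse_n = n // factor ** (outer_levels - 1)        # coarsest grid of the outer MG
    return {
        "outer": points_per_rank(n, cores, outer_levels, factor),
        "sub_ranks": sub_ranks,
        "inner": points_per_rank(coarse_n, sub_ranks, inner_levels, factor),
        "coarsest_n": coarse_n // factor ** (inner_levels - 1),
    }


# 1024^3 grid, 32768 cores, reduction factor 64, MG levels 5 / 3 (from the Test2 table).
s = summarize(n=1024, cores=32768, reduction=64, outer_levels=5, inner_levels=3)
print("points per rank, outer MG :", s["outer"])
print("ranks after telescope     :", s["sub_ranks"])
print("points per rank, inner MG :", s["inner"])
print("coarsest grid per dim     :", s["coarsest_n"])
```

Under these assumptions the per-rank problem size at the bottom of each hierarchy and the size of the final coarse grid can be read off directly, which is the information needed to judge whether more or fewer levels would be sensible.<br>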
I attached the test code and the petsc options file for the 1024^3
cube with 32768 cores. <br>
<br>
Thank you.<br>
<br>
Regards,<br>
Frank<br>
<br>
<br>
<br>
<br>
<br>
<br>
<div class="moz-cite-prefix">On 09/15/2016 03:35 AM, Dave May wrote:<br>
</div>
<blockquote
cite="mid:CAJ98EDpYAVvyJQW3bk_QaiJLQhmEgGn6rz8LYPDDodAh1oErcA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Hi all,<br>
<br>
</div>
<div>The only unexpected memory usage I can see is
associated with the call to MatPtAP().<br>
</div>
<div>Here is something you can try immediately.<br>
</div>
</div>
Run your code with the additional options<br>
-matrap 0 -matptap_scalable<br>
<br>
</div>
<div>I didn't realize this before, but the default behaviour
of MatPtAP in parallel is actually to explicitly form
the transpose of P (e.g. assemble R = P^T) and then
compute R.A.P. <br>
You don't want to do this. The option -matrap 0 resolves
this issue.<br>
</div>
<div><br>
</div>
<div>The implementation of P^T.A.P has two variants. <br>
The scalable implementation (with respect to memory usage)
is selected via the second option -matptap_scalable.</div>
<div><br>
</div>
<div>Try it out - I see a significant memory reduction using
these options for particular mesh sizes / partitions.<br>
</div>
<div><br>
</div>
I've attached a cleaned up version of the code you sent me.<br>
</div>
There were a number of memory leaks and other issues.<br>
</div>
<div>The main points being<br>
</div>
* You should call DMDAVecGetArrayF90() before
VecAssembly{Begin,End}<br>
* You should call PetscFinalize(), otherwise the option
-log_summary (-log_view) will not display anything once the
program has completed.<br>
<div>
<div>
<div><br>
<br>
</div>
<div>Thanks,<br>
</div>
<div> Dave<br>
</div>
<div>
<div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 15 September 2016 at 08:03, Hengjie
Wang <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hengjiew@uci.edu" target="_blank">hengjiew@uci.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi Dave,<br>
<br>
Sorry, I should have added more comments to explain the
code. <br>
The number of processes in each dimension is the same: Px =
Py = Pz = P. The same holds for the domain size.<br>
So if you want to run the code for a 512^3 grid
on 16^3 cores, you need to set "-N 512 -P 16" on
the command line.<br>
I added more comments and also fixed an error in the attached
code. (The error only affects the accuracy of the solution,
not the memory usage.) <br>
<div><br>
Thank you.<span class="HOEnZb"><font color="#888888"><br>
Frank</font></span>
<div>
<div class="h5"><br>
<br>
On 9/14/2016 9:05 PM, Dave May wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite"><br>
<br>
On Thursday, 15 September 2016, Dave May <<a
moz-do-not-send="true"
href="mailto:dave.mayhem23@gmail.com"
target="_blank">dave.mayhem23@gmail.com</a>>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
On Thursday, 15 September 2016, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi, <br>
<br>
I wrote a simple code to reproduce the error.
I hope this can help to diagnose the problem.<br>
The code just solves a 3D Poisson equation. </div>
</blockquote>
<div><br>
</div>
<div>Why is the stencil width a runtime
parameter?? And why is the default value 2? For
7-pnt FD Laplace, you only need a stencil width
of 1. </div>
<div><br>
</div>
<div>Was this choice made to mimic something in
the real application code?</div>
</blockquote>
<div><br>
</div>
Please ignore - I misunderstood your usage of the
param set by -P
<div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div> </div>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><br>
I ran the code on a 1024^3 mesh. The process
partition is 32 * 32 * 32. That is when I
reproduce the OOM error. Each core has
about 2 GB of memory.<br>
I also ran the code on a 512^3 mesh with 16
* 16 * 16 processes. The KSP solver works
fine. <br>
I attached the code, the output of -ksp_view_pre,
and my PETSc options file.<br>
<br>
Thank you.<br>
Frank<br>
<div><br>
On 09/09/2016 06:38 PM, Hengjie Wang
wrote:<br>
</div>
<blockquote type="cite">Hi Barry,
<div><br>
</div>
<div>I checked. On the supercomputer, I
had the option "-ksp_view_pre", but it is
not in the file I sent you. I am sorry for
the confusion.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Frank<span></span><br>
<br>
On Friday, September 9, 2016, Barry
Smith <<a moz-do-not-send="true">bsmith@mcs.anl.gov</a>>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"><br>
> On Sep 9, 2016, at 3:11 PM, frank
<<a moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
><br>
> Hi Barry,<br>
><br>
> I think the first KSP view output
is from -ksp_view_pre. Before I
submitted the test, I was not sure
whether there would be OOM error or
not. So I added both -ksp_view_pre and
-ksp_view.<br>
<br>
But the options file you sent
specifically does NOT list the
-ksp_view_pre so how could it be from
that?<br>
<br>
Sorry to be pedantic but I've spent
too much time in the past trying to
debug from incorrect information and
want to make sure that the information
I have is correct before thinking.
Please recheck exactly what happened.
Rerun with the exact input file you
emailed if that is needed.<br>
<br>
Barry<br>
<br>
><br>
> Frank<br>
><br>
><br>
> On 09/09/2016 12:38 PM, Barry
Smith wrote:<br>
>> Why does ksp_view2.txt have
two KSP views in it while
ksp_view1.txt has only one KSPView in
it? Did you run two different solves
in the 2 case but not the one?<br>
>><br>
>> Barry<br>
>><br>
>><br>
>><br>
>>> On Sep 9, 2016, at 10:56
AM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>><br>
>>> Hi,<br>
>>><br>
>>> I want to continue
digging into the memory problem here.<br>
>>> I did find a workaround
in the past, which is to use fewer
cores per node so that each core has
8G memory. However, this is inefficient
and expensive. I hope to locate the
place that uses the most memory.<br>
>>><br>
>>> Here is a brief summary
of the tests I did in past:<br>
>>>> Test1: Mesh
1536*128*384 | Process Mesh 48*4*12<br>
>>> Maximum (over
computational time) process memory:
total 7.0727e+08<br>
>>> Current process memory:
total 7.0727e+08<br>
>>> Maximum (over
computational time) space
PetscMalloc()ed: total 6.3908e+11<br>
>>> Current space
PetscMalloc()ed:
total
1.8275e+09<br>
>>><br>
>>>> Test2: Mesh
1536*128*384 | Process Mesh 96*8*24<br>
>>> Maximum (over
computational time) process memory:
total 5.9431e+09<br>
>>> Current process memory:
total 5.9431e+09<br>
>>> Maximum (over
computational time) space
PetscMalloc()ed: total 5.3202e+12<br>
>>> Current space
PetscMalloc()ed:
total
5.4844e+09<br>
>>><br>
>>>> Test3: Mesh
3072*256*768 | Process Mesh 96*8*24<br>
>>> OOM( Out Of Memory )
killer of the supercomputer terminated
the job during "KSPSolve".<br>
>>><br>
>>> I attached the output of
ksp_view( the third test's output is
from ksp_view_pre ), memory_view and
also the petsc options.<br>
>>><br>
>>> In all the tests, each
core can access about 2G memory. In
test3, there are 4223139840 non-zeros
in the matrix. In double precision this
will consume about 1.74M per process.
Even with the extra memory used to
store integer indices, 2G memory should
still be more than enough.<br>
>>><br>
>>> Is there a way to find
out which part of KSPSolve uses the
most memory?<br>
>>> Thank you so much.<br>
>>><br>
>>> BTW, there are 4 options
that remain unused and I don't understand
why they are ignored:<br>
>>>
-mg_coarse_telescope_mg_coarse<wbr>_ksp_type
value: preonly<br>
>>>
-mg_coarse_telescope_mg_coarse<wbr>_pc_type
value: bjacobi<br>
>>>
-mg_coarse_telescope_mg_levels<wbr>_ksp_max_it
value: 1<br>
>>>
-mg_coarse_telescope_mg_levels<wbr>_ksp_type
value: richardson<br>
>>><br>
>>><br>
>>> Regards,<br>
>>> Frank<br>
>>><br>
>>> On 07/13/2016 05:47 PM,
Dave May wrote:<br>
>>>><br>
>>>> On 14 July 2016 at
01:07, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>> Hi Dave,<br>
>>>><br>
>>>> Sorry for the late
reply.<br>
>>>> Thank you so much for
your detailed reply.<br>
>>>><br>
>>>> I have a question
about the estimation of the memory
usage. There are 4223139840 allocated
non-zeros and 18432 MPI processes.
Double precision is used. So the
memory per process is:<br>
>>>> 4223139840 * 8bytes
/ 18432 / 1024 / 1024 = 1.74M ?<br>
>>>> Did I do sth wrong
here? Because this seems too small.<br>
>>>><br>
>>>> No - I totally f***ed
it up. You are correct. That'll teach
me for fumbling around with my iphone
calculator and not using my brain.
(Note that to convert to MB just
divide by 1e6, not 1024^2 - although I
apparently cannot convert between
units correctly....)<br>
>>>><br>
>>>> From the PETSc
objects associated with the solver, It
looks like it _should_ run with 2GB
per MPI rank. Sorry for my mistake.
Possibilities are: somewhere in your
usage of PETSc you've introduced a
memory leak; PETSc is doing a huge
over allocation (e.g. as per our
discussion of MatPtAP); or in your
application code there are other
objects you have forgotten to log the
memory for.<br>
>>>><br>
>>>><br>
>>>><br>
>>>> I am running this job
on Bluewater<br>
>>>> I am using the 7
points FD stencil in 3D.<br>
>>>><br>
>>>> I thought so on both
counts.<br>
>>>><br>
>>>> I apologize that I
made a stupid mistake in computing the
memory per core. With my settings, each
core can access only 2G memory on
average instead of the 8G which I
mentioned in the previous email. I re-ran
the job with 8G memory per core on
average and there is no "Out Of
Memory" error. I will do more tests to
see if there is still some memory
issue.<br>
>>>><br>
>>>> Ok. I'd still like to
know where the memory was being used
since my estimates were off.<br>
>>>><br>
>>>><br>
>>>> Thanks,<br>
>>>> Dave<br>
>>>><br>
>>>> Regards,<br>
>>>> Frank<br>
>>>><br>
>>>><br>
>>>><br>
>>>> On 07/11/2016 01:18
PM, Dave May wrote:<br>
>>>>> Hi Frank,<br>
>>>>><br>
>>>>><br>
>>>>> On 11 July 2016
at 19:14, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>> Hi Dave,<br>
>>>>><br>
>>>>> I re-ran the test
using bjacobi as the preconditioner on
the coarse mesh of telescope. The grid
is 3072*256*768 and the process mesh is
96*8*24. The PETSc options file is
attached.<br>
>>>>> I still got the
"Out Of Memory" error. The error
occurred before the linear solver
finished one step. So I don't have the
full info from ksp_view. The info from
ksp_view_pre is attached.<br>
>>>>><br>
>>>>> Okay - that is
essentially useless (sorry)<br>
>>>>><br>
>>>>> It seems to me
that the error occurred when the
decomposition was going to be changed.<br>
>>>>><br>
>>>>> Based on what
information?<br>
>>>>> Running with
-info would give us more clues, but
will create a ton of output.<br>
>>>>> Please try
running the case which failed with
-info<br>
>>>>> I had another
test with a grid of 1536*128*384 and
the same process mesh as above. There
was no error. The ksp_view info is
attached for comparison.<br>
>>>>> Thank you.<br>
>>>>><br>
>>>>><br>
>>>>> [3] Here is my
crude estimate of your memory usage.<br>
>>>>> I'll target the
biggest memory hogs only to get an
order of magnitude estimate<br>
>>>>><br>
>>>>> * The Fine grid
operator contains 4223139840 non-zeros
--> 1.8 GB per MPI rank assuming
double precision.<br>
>>>>> The indices for
the AIJ could amount to another 0.3 GB
(assuming 32 bit integers)<br>
>>>>><br>
>>>>> * You use 5
levels of coarsening, so the other
operators should represent
(collectively)<br>
>>>>> 2.1 / 8 + 2.1/8^2
+ 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI
rank on the communicator with 18432
ranks.<br>
>>>>> The coarse grid
should consume ~ 0.5 MB per MPI rank
on the communicator with 18432 ranks.<br>
>>>>><br>
>>>>> * You use a
reduction factor of 64, making the new
communicator with 288 MPI ranks.<br>
>>>>> PCTelescope will
first gather a temporary matrix
associated with your coarse level
operator assuming a comm size of 288
living on the comm with size 18432.<br>
>>>>> This matrix will
require approximately 0.5 * 64 = 32 MB
per core on the 288 ranks.<br>
>>>>> This matrix is
then used to form a new MPIAIJ matrix
on the subcomm, thus require another
32 MB per rank.<br>
>>>>> The temporary
matrix is now destroyed.<br>
>>>>><br>
>>>>> * Because a DMDA
is detected, a permutation matrix is
assembled.<br>
>>>>> This requires 2
doubles per point in the DMDA.<br>
>>>>> Your coarse DMDA
contains 92 x 16 x 48 points.<br>
>>>>> Thus the
permutation matrix will require < 1
MB per MPI rank on the sub-comm.<br>
>>>>><br>
>>>>> * Lastly, the
matrix is permuted. This uses
MatPtAP(), but the resulting operator
will have the same memory footprint as
the unpermuted matrix (32 MB). At any
stage in PCTelescope, only 2 operators
of size 32 MB are held in memory when
the DMDA is provided.<br>
>>>>><br>
>>>>> From my rough
estimates, the worst case memory foot
print for any given core, given your
options is approximately<br>
>>>>> 2100 MB + 300 MB
+ 32 MB + 32 MB + 1 MB = 2465 MB<br>
>>>>> This is way below
8 GB.<br>
>>>>><br>
>>>>> Note this
estimate completely ignores:<br>
>>>>> (1) the memory
required for the restriction operator,<br>
>>>>> (2) the potential
growth in the number of non-zeros per
row due to Galerkin coarsening (I
wished -ksp_view_pre reported the
output from MatView so we could see
the number of non-zeros required by
the coarse level operators)<br>
>>>>> (3) all temporary
vectors required by the CG solver, and
those required by the smoothers.<br>
>>>>> (4) internal
memory allocated by MatPtAP<br>
>>>>> (5) memory
associated with IS's used within
PCTelescope<br>
>>>>><br>
>>>>> So either I am
completely off in my estimates, or you
have not carefully estimated the
memory usage of your application code.
Hopefully others might examine/correct
my rough estimates<br>
>>>>><br>
>>>>> Since I don't
have your code I cannot assess the
latter.<br>
>>>>> Since I don't
have access to the same machine you
are running on, I think we need to
take a step back.<br>
>>>>><br>
>>>>> [1] What machine
are you running on? Send me a URL if
it's available<br>
>>>>><br>
>>>>> [2] What
discretization are you using? (I am
guessing a scalar 7 point FD stencil)<br>
>>>>> If it's a 7 point
FD stencil, we should be able to
examine the memory usage of your
solver configuration using a standard,
light weight existing PETSc example,
run on your machine at the same scale.<br>
>>>>> This would
hopefully enable us to correctly
evaluate the actual memory usage
required by the solver configuration
you are using.<br>
>>>>><br>
>>>>> Thanks,<br>
>>>>> Dave<br>
>>>>><br>
>>>>><br>
>>>>> Frank<br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>>> On 07/08/2016
10:38 PM, Dave May wrote:<br>
>>>>>><br>
>>>>>> On Saturday,
9 July 2016, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>> Hi Barry and
Dave,<br>
>>>>>><br>
>>>>>> Thank both of
you for the advice.<br>
>>>>>><br>
>>>>>> @Barry<br>
>>>>>> I made a
mistake in the file names in last
email. I attached the correct files
this time.<br>
>>>>>> For all the
three tests, 'Telescope' is used as
the coarse preconditioner.<br>
>>>>>><br>
>>>>>> == Test1:
Grid: 1536*128*384, Process Mesh:
48*4*12<br>
>>>>>> Part of the
memory usage: Vector 125
124 3971904 0.<br>
>>>>>>
Matrix 101 101 9462372 0<br>
>>>>>><br>
>>>>>> == Test2:
Grid: 1536*128*384, Process Mesh:
96*8*24<br>
>>>>>> Part of the
memory usage: Vector 125
124 681672 0.<br>
>>>>>>
Matrix 101 101 1462180 0.<br>
>>>>>><br>
>>>>>> In theory,
the memory usage in Test1 should be 8
times that of Test2. In my case, it is
about 6 times.<br>
>>>>>><br>
>>>>>> == Test3:
Grid: 3072*256*768, Process Mesh:
96*8*24. Sub-domain per process:
32*32*32<br>
>>>>>> Here I get
the out of memory error.<br>
>>>>>><br>
>>>>>> I tried to
use -mg_coarse jacobi. In this way, I
don't need to set -mg_coarse_ksp_type
and -mg_coarse_pc_type explicitly,
right?<br>
>>>>>> The linear
solver didn't work in this case. Petsc
output some errors.<br>
>>>>>><br>
>>>>>> @Dave<br>
>>>>>> In test3, I
use only one instance of 'Telescope'.
On the coarse mesh of 'Telescope', I
used LU as the preconditioner instead
of SVD.<br>
>>>>>> If I set the
levels correctly, then on the last
coarse mesh of MG where it calls
'Telescope', the sub-domain per
process is 2*2*2.<br>
>>>>>> On the last
coarse mesh of 'Telescope', there is
only one grid point per process.<br>
>>>>>> I still got
the OOM error. The detailed petsc
option file is attached.<br>
>>>>>><br>
>>>>>> Do you
understand the expected memory usage
for the particular parallel LU
implementation you are using? I don't
(seriously). Replace LU with bjacobi
and re-run this test. My point about
solver debugging is still valid.<br>
>>>>>><br>
>>>>>> And please
send the result of KSPView so we can
see what is actually used in the
computations<br>
>>>>>><br>
>>>>>> Thanks<br>
>>>>>> Dave<br>
>>>>>><br>
>>>>>><br>
>>>>>> Thank you so
much.<br>
>>>>>><br>
>>>>>> Frank<br>
>>>>>><br>
>>>>>><br>
>>>>>><br>
>>>>>> On 07/06/2016
02:51 PM, Barry Smith wrote:<br>
>>>>>> On Jul 6,
2016, at 4:19 PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>><br>
>>>>>> Hi Barry,<br>
>>>>>><br>
>>>>>> Thank you for
your advice.<br>
>>>>>> I tried three
tests. In the 1st test, the grid is
3072*256*768 and the process mesh is
96*8*24.<br>
>>>>>> The linear
solver is 'cg' the preconditioner is
'mg' and 'telescope' is used as the
preconditioner at the coarse mesh.<br>
>>>>>> The system
gives me the "Out of Memory" error
before the linear system is completely
solved.<br>
>>>>>> The info from
'-ksp_view_pre' is attached. It seems
to me that the error occurs when it
reaches the coarse mesh.<br>
>>>>>><br>
>>>>>> The 2nd test
uses a grid of 1536*128*384 and
process mesh is 96*8*24. The 3rd
test uses the same grid but a
different process mesh 48*4*12.<br>
>>>>>> Are you
sure this is right? The total matrix
and vector memory usage goes from 2nd
test<br>
>>>>>>
Vector 384 383
8,193,712 0.<br>
>>>>>>
Matrix 103 103
11,508,688 0.<br>
>>>>>> to 3rd test<br>
>>>>>>
Vector 384 383
1,590,520 0.<br>
>>>>>>
Matrix 103 103
3,508,664 0.<br>
>>>>>> that is the
memory usage got smaller but if you
have only 1/8th the processes and the
same grid it should have gotten about
8 times bigger. Did you maybe cut the
grid by a factor of 8 also? If so that
still doesn't explain it because the
memory usage changed by a factor of 5
something for the vectors and 3
something for the matrices.<br>
>>>>>><br>
>>>>>><br>
>>>>>> The linear
solver and PETSc options in the 2nd and
3rd tests are the same as in the 1st test.
The linear solver works fine in both
tests.<br>
>>>>>> I attached
the memory usage of the 2nd and 3rd
tests. The memory info is from the
option '-log_summary'. I tried to use
'-momery_info' as you suggested, but
in my case petsc treated it as an
unused option. It output nothing about
the memory. Do I need to add sth to my
code so I can use '-memory_info'?<br>
>>>>>> Sorry, my
mistake the option is -memory_view<br>
>>>>>><br>
>>>>>> Can you
run the one case with -memory_view and
-mg_coarse jacobi -ksp_max_it 1 (just
so it doesn't iterate forever) to see
how much memory is used without the
telescope? Also run case 2 the same
way.<br>
>>>>>><br>
>>>>>> Barry<br>
>>>>>><br>
>>>>>><br>
>>>>>><br>
>>>>>> In both tests
the memory usage is not large.<br>
>>>>>><br>
>>>>>> It seems to
me that it might be the 'telescope'
preconditioner that allocated a lot of
memory and caused the error in the 1st
test.<br>
>>>>>> Is there a
way to show how much memory it
allocated?<br>
>>>>>><br>
>>>>>> Frank<br>
>>>>>><br>
>>>>>> On 07/05/2016
03:37 PM, Barry Smith wrote:<br>
>>>>>> Frank,<br>
>>>>>><br>
>>>>>> You can
run with -ksp_view_pre to have it
"view" the KSP before the solve so
hopefully it gets that far.<br>
>>>>>><br>
>>>>>> Please
run the problem that does fit with
-memory_info when the problem
completes it will show the "high water
mark" for PETSc allocated memory and
total memory used. We first want to
look at these numbers to see if it is
using more memory than you expect. You
could also run with say half the grid
spacing to see how the memory usage
scaled with the increase in grid
points. Make the runs also with
-log_view and send all the output from
these options.<br>
>>>>>><br>
>>>>>> Barry<br>
>>>>>><br>
>>>>>> On Jul 5,
2016, at 5:23 PM, frank <<a
moz-do-not-send="true">hengjiew@uci.edu</a>>
wrote:<br>
>>>>>><br>
>>>>>> Hi,<br>
>>>>>><br>
>>>>>> I am using
the CG ksp solver and Multigrid
preconditioner to solve a linear
system in parallel.<br>
>>>>>> I chose to
use the 'Telescope' as the
preconditioner on the coarse mesh for
its good performance.<br>
>>>>>> The petsc
options file is attached.<br>
>>>>>><br>
>>>>>> The domain is
a 3d box.<br>
>>>>>> It works well
when the grid is 1536*128*384 and the
process mesh is 96*8*24. When I double
the size of grid and
keep
the same process mesh and petsc
options, I get an "out of memory"
error from the super-cluster I am
using.<br>
>>>>>> Each process
has access to at least 8G memory,
which should be more than enough for
my application. I am sure that all the
other parts of my code (except the
linear solver) do not use much
memory. So I suspect there is
something wrong with the linear
solver.<br>
>>>>>> The error
occurs before the linear system is
completely solved so I don't have the
info from ksp view. I am not able to
re-produce the error with a smaller
problem either.<br>
>>>>>> In addition,
I tried to use block Jacobi as the
preconditioner with the same grid and
the same decomposition. The linear solver
runs extremely slowly but there is no
memory error.<br>
>>>>>><br>
>>>>>> How can I
diagnose what exactly causes the error?<br>
>>>>>> Thank you so
much.<br>
>>>>>><br>
>>>>>> Frank<br>
>>>>>> &lt;petsc_options.txt&gt;<br>
>>>>>> &lt;ksp_view_pre.txt&gt;&lt;memory_test2.txt&gt;&lt;memory_test3.txt&gt;&lt;petsc_options.txt&gt;<br>
>>>>>><br>
>>>>><br>
>>>><br>
>>> &lt;ksp_view1.txt&gt;&lt;ksp_view2.txt&gt;&lt;ksp_view3.txt&gt;&lt;memory1.txt&gt;&lt;memory2.txt&gt;&lt;petsc_options1.txt&gt;&lt;petsc_options2.txt&gt;&lt;petsc_options3.txt&gt;<br>
><br>
<br>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
<div> </div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>