[petsc-users] Performance of the Telescope Multigrid Preconditioner
frank
hengjiew at uci.edu
Tue Oct 4 15:26:09 CDT 2016
On 10/04/2016 01:20 PM, Matthew Knepley wrote:
> On Tue, Oct 4, 2016 at 3:09 PM, frank <hengjiew at uci.edu
> <mailto:hengjiew at uci.edu>> wrote:
>
> Hi Dave,
>
> Thank you for the reply.
> What do you mean by the "nested calls to KSPSolve"?
>
>
> KSPSolve is called again after redistributing the computation.
I am still confused. There is only one KSPSolve in my code.
Do you mean KSPSolve is called again in the sub-communicator? If that is
the case, even if I put two identical KSPSolve calls in the code, the
sub-communicator is still going to call KSPSolve, right?
> I tried to call KSPSolve twice, but the second solve converged
> in 0 iterations. KSPSolve seems to remember the solution. How can I
> force both solves to start from the same initial guess?
>
>
> Did you zero the solution vector between solves? VecSet(x, 0.0);
>
> Matt
>
> Thank you.
>
> Frank
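
For reference, a minimal sketch of what Matt suggests: zero the solution
vector so the second KSPSolve repeats the same work as the first (variable
names are illustrative):

    ierr = VecSet(x, 0.0);CHKERRQ(ierr);       /* first solve starts from zero          */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);  /* pays all of the setup cost            */
    ierr = VecSet(x, 0.0);CHKERRQ(ierr);       /* reset x, which now holds the solution */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);  /* setup is reused; solve work only      */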
>
>
>
> On 10/04/2016 12:56 PM, Dave May wrote:
>>
>>
>> On Tuesday, 4 October 2016, frank <hengjiew at uci.edu
>> <mailto:hengjiew at uci.edu>> wrote:
>>
>> Hi,
>>
>> This question is a follow-up to the thread "Question about
>> memory usage in Multigrid preconditioner".
>> I used to have an "Out of Memory" (OOM) problem when using
>> the CG+Telescope MG solver with 32768 cores. Adding the
>> options "-matrap 0 -matptap_scalable" solved that problem.
>>
>> Then I tested the scalability by solving a 3D Poisson equation for one
>> step. I used one sub-communicator in all the tests. The only
>> differences between the PETSc options in those tests are: (1)
>> the pc_telescope_reduction_factor; (2) the number of multigrid
>> levels in the up/down solver. The function KSPSolve is
>> timed. It is quite slow and does not scale at all.
>>
>> Test1: 512^3 grid points
>> Cores    telescope_reduction_factor    MG levels (up/down solver)    KSPSolve time (s)
>> 512      8                             4 / 3                         6.2466
>> 4096     64                            5 / 3                         0.9361
>> 32768    64                            4 / 3                         4.8914
>>
>> Test2: 1024^3 grid points
>> Cores    telescope_reduction_factor    MG levels (up/down solver)    KSPSolve time (s)
>> 4096     64                            5 / 4                         3.4139
>> 8192     128                           5 / 4                         2.4196
>> 16384    32                            5 / 3                         5.4150
>> 32768    64                            5 / 3                         5.6067
>> 65536    128                           5 / 3                         6.5219
>>
>>
>> You have to be very careful how you interpret these numbers. Your
>> solver contains nested calls to KSPSolve, and unfortunately as a
>> result the numbers you report include setup time. This will
>> remain true even if you call KSPSetUp on the outermost KSP.
>>
>> Your email concerns scalability of the solver application, so
>> let's focus on that issue.
>>
>> The only way to clearly separate setup from solve time is
>> to perform two identical solves. The second solve will not
>> require any setup. You should monitor the second solve via a new
>> PetscStage.
>>
>> This was what I did in the telescope paper. It was the only way
>> to understand the setup cost (and scaling) compared with the solve time (and
>> scaling).
>>
>> Thanks
>> Dave
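
For reference, a minimal sketch of the two-solve timing Dave describes,
with the second (setup-free) solve placed in its own logging stage so that
-log_view reports it separately (stage and variable names are illustrative):

    PetscLogStage stage2;
    ierr = PetscLogStageRegister("KSPSolve no setup", &stage2);CHKERRQ(ierr);

    ierr = VecSet(x, 0.0);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* first solve: includes all setup */

    ierr = VecSet(x, 0.0);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage2);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* second solve: solve time only */
    ierr = PetscLogStagePop();CHKERRQ(ierr);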
>>
>> I guess I didn't set the MG levels properly. What would be
>> an efficient way to arrange the MG levels?
>> Also, which preconditioner on the coarse mesh of the 2nd
>> communicator should I use to improve the performance?
>>
>> I attached the test code and the PETSc options file for the
>> 1024^3 cube with 32768 cores.
>>
>> Thank you.
>>
>> Regards,
>> Frank
>>
>>
>>
>>
>>
>>
>> On 09/15/2016 03:35 AM, Dave May wrote:
>>> Hi all,
>>>
>>> The only unexpected memory usage I can see is associated
>>> with the call to MatPtAP().
>>> Here is something you can try immediately.
>>> Run your code with the additional options
>>> -matrap 0 -matptap_scalable
>>>
>>> I didn't realize this before, but the default behaviour of
>>> MatPtAP in parallel is actually to explicitly form the
>>> transpose of P (i.e. assemble R = P^T) and then compute R.A.P.
>>> You don't want to do this. The option -matrap 0 resolves
>>> this issue.
>>>
>>> The implementation of P^T.A.P has two variants.
>>> The scalable implementation (with respect to memory usage)
>>> is selected via the second option -matptap_scalable.
>>>
>>> Try it out - I see a significant memory reduction using
>>> these options for particular mesh sizes / partitions.
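
For reference, this is how the two flags Dave recommends would look added to
the options file used for these runs (the remaining options are assumed
unchanged; both flags are plain, unprefixed options, exactly as given above):

    -matrap 0
    -matptap_scalable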
>>>
>>> I've attached a cleaned-up version of the code you sent me.
>>> There were a number of memory leaks and other issues.
>>> The main points are:
>>> * You should call DMDAVecGetArrayF90() before
>>> VecAssembly{Begin,End}
>>> * You should call PetscFinalize(), otherwise the option
>>> -log_summary (-log_view) will not display anything once the
>>> program has completed.
>>>
>>>
>>> Thanks,
>>> Dave
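
For reference, a minimal sketch in C of the pattern Dave's two points
describe; the attached code is Fortran and uses the F90 variants, so the
names and types here are only illustrative:

    PetscScalar    ***xarr;
    PetscErrorCode ierr;

    /* fill the DMDA vector through the array interface, then assemble it */
    ierr = DMDAVecGetArray(da, x, &xarr);CHKERRQ(ierr);
    /* ... set entries of xarr over the local portion of the grid ... */
    ierr = DMDAVecRestoreArray(da, x, &xarr);CHKERRQ(ierr);
    ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(x);CHKERRQ(ierr);

    /* ... solve ... */

    /* PetscFinalize() must be called at the end of the program, otherwise
       -log_summary (-log_view) prints nothing */
    ierr = PetscFinalize();
    return ierr;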
>>>
>>>
>>> On 15 September 2016 at 08:03, Hengjie Wang
>>> <hengjiew at uci.edu> wrote:
>>>
>>> Hi Dave,
>>>
>>> Sorry, I should have put more comments in to explain the code.
>>> The number of processes in each dimension is the same: Px
>>> = Py = Pz = P. So is the domain size.
>>> So if you want to run the code on a 512^3 grid
>>> with 16^3 cores, you need to set "-N 512 -P 16" on
>>> the command line.
>>> I added more comments and also fixed an error in the
>>> attached code. (The error only affects the accuracy of the
>>> solution, not the memory usage.)
>>>
>>> Thank you.
>>> Frank
>>>
>>>
>>> On 9/14/2016 9:05 PM, Dave May wrote:
>>>>
>>>>
>>>> On Thursday, 15 September 2016, Dave May
>>>> <dave.mayhem23 at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Thursday, 15 September 2016, frank
>>>> <hengjiew at uci.edu> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I wrote a simple code to reproduce the error.
>>>> I hope this can help to diagnose the problem.
>>>> The code just solves a 3D Poisson equation.
>>>>
>>>>
>>>> Why is the stencil width a runtime parameter?? And
>>>> why is the default value 2? For a 7-point FD Laplacian,
>>>> you only need a stencil width of 1.
>>>>
>>>> Was this choice made to mimic something in the
>>>> real application code?
>>>>
>>>>
>>>> Please ignore - I misunderstood your usage of the param
>>>> set by -P
>>>>
>>>>
>>>> I ran the code on a 1024^3 mesh. The process
>>>> partition is 32 * 32 * 32. That is where I
>>>> reproduce the OOM error. Each core has about
>>>> 2 GB of memory.
>>>> I also ran the code on a 512^3 mesh with 16 *
>>>> 16 * 16 processes. The KSP solver works fine.
>>>> I attached the code, the ksp_view_pre output and
>>>> my PETSc options file.
>>>>
>>>> Thank you.
>>>> Frank
>>>>
>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>>> Hi Barry,
>>>>>
>>>>> I checked. On the supercomputer, I had the
>>>>> option "-ksp_view_pre" but it is not in the file I
>>>>> sent you. I am sorry for the confusion.
>>>>>
>>>>> Regards,
>>>>> Frank
>>>>>
>>>>> On Friday, September 9, 2016, Barry Smith
>>>>> <bsmith at mcs.anl.gov> wrote:
>>>>>
>>>>>
>>>>> > On Sep 9, 2016, at 3:11 PM, frank
>>>>> <hengjiew at uci.edu> wrote:
>>>>> >
>>>>> > Hi Barry,
>>>>> >
>>>>> > I think the first KSP view output is
>>>>> > from -ksp_view_pre. Before I submitted the
>>>>> > test, I was not sure whether there would
>>>>> > be an OOM error or not. So I added both
>>>>> > -ksp_view_pre and -ksp_view.
>>>>>
>>>>> But the options file you sent
>>>>> specifically does NOT list the
>>>>> -ksp_view_pre so how could it be from that?
>>>>>
>>>>> Sorry to be pedantic but I've spent too
>>>>> much time in the past trying to debug from
>>>>> incorrect information and want to make
>>>>> sure that the information I have is
>>>>> correct before thinking. Please recheck
>>>>> exactly what happened. Rerun with the
>>>>> exact input file you emailed if that is
>>>>> needed.
>>>>>
>>>>> Barry
>>>>>
>>>>> >
>>>>> > Frank
>>>>> >
>>>>> >
>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>>>> >> Why does ksp_view2.txt have two KSP
>>>>> >> views in it while ksp_view1.txt has only
>>>>> >> one KSPView in it? Did you run two
>>>>> >> different solves in the second case but not the
>>>>> >> first?
>>>>> >>
>>>>> >> Barry
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank
>>>>> <hengjiew at uci.edu> wrote:
>>>>> >>>
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> I want to continue digging into the
>>>>> >>> memory problem here.
>>>>> >>> I did find a workaround in the past,
>>>>> >>> which is to use fewer cores per node so
>>>>> >>> that each core has 8 GB of memory. However, this
>>>>> >>> is wasteful and expensive. I hope to
>>>>> >>> locate the place that uses the most memory.
>>>>> >>>
>>>>> >>> Here is a brief summary of the tests I
>>>>> >>> did in the past:
>>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>>>>> >>> Maximum (over computational time) process memory:         total 7.0727e+08
>>>>> >>> Current process memory:                                   total 7.0727e+08
>>>>> >>> Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
>>>>> >>> Current space PetscMalloc()ed:                            total 1.8275e+09
>>>>> >>>
>>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>>>>> >>> Maximum (over computational time) process memory:         total 5.9431e+09
>>>>> >>> Current process memory:                                   total 5.9431e+09
>>>>> >>> Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
>>>>> >>> Current space PetscMalloc()ed:                            total 5.4844e+09
>>>>> >>>
>>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>>>>> >>> OOM (Out Of Memory) killer of the supercomputer terminated the job
>>>>> >>> during "KSPSolve".
>>>>> >>>
>>>>> >>> I attached the output of ksp_view (the
>>>>> >>> third test's output is from ksp_view_pre),
>>>>> >>> memory_view, and also the PETSc options.
>>>>> >>>
>>>>> >>> In all the tests, each core can access
>>>>> >>> about 2 GB of memory. In Test3, there are
>>>>> >>> 4223139840 non-zeros in the matrix. This
>>>>> >>> will consume about 1.74 MB per process, using double
>>>>> >>> precision. Considering some extra memory
>>>>> >>> used to store the integer indices, 2 GB of memory
>>>>> >>> should still be more than enough.
>>>>> >>>
>>>>> >>> Is there a way to find out which part
>>>>> >>> of KSPSolve uses the most memory?
>>>>> >>> Thank you so much.
>>>>> >>>
>>>>> >>> BTW, there are 4 options that remain
>>>>> >>> unused and I don't understand why they are
>>>>> >>> omitted:
>>>>> >>>
>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>>>> >>>
>>>>> >>>
>>>>> >>> Regards,
>>>>> >>> Frank
>>>>> >>>
>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>>>>> >>>>
>>>>> >>>> On 14 July 2016 at 01:07, frank
>>>>> <hengjiew at uci.edu> wrote:
>>>>> >>>> Hi Dave,
>>>>> >>>>
>>>>> >>>> Sorry for the late reply.
>>>>> >>>> Thank you so much for your detailed
>>>>> reply.
>>>>> >>>>
>>>>> >>>> I have a question about the
>>>>> >>>> estimation of the memory usage. There are
>>>>> >>>> 4223139840 allocated non-zeros and 18432
>>>>> >>>> MPI processes. Double precision is used.
>>>>> >>>> So the memory per process is:
>>>>> >>>> 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74 MB?
>>>>> >>>> Did I do something wrong here? Because this
>>>>> >>>> seems too small.
>>>>> >>>>
>>>>> >>>> No - I totally f***ed it up. You are
>>>>> correct. That'll teach me for fumbling
>>>>> around with my iphone calculator and not
>>>>> using my brain. (Note that to convert to
>>>>> MB just divide by 1e6, not 1024^2 -
>>>>> although I apparently cannot convert
>>>>> between units correctly....)
>>>>> >>>>
>>>>> >>>> From the PETSc objects associated
>>>>> >>>> with the solver, it looks like it _should_
>>>>> >>>> run with 2 GB per MPI rank. Sorry for my
>>>>> >>>> mistake. Possibilities are: somewhere in
>>>>> >>>> your usage of PETSc you've introduced a
>>>>> >>>> memory leak; PETSc is doing a huge over-
>>>>> >>>> allocation (e.g. as per our discussion of
>>>>> >>>> MatPtAP); or in your application code
>>>>> >>>> there are other objects you have forgotten
>>>>> >>>> to log the memory for.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I am running this job on Blue Waters.
>>>>> >>>> I am using the 7-point FD stencil in 3D.
>>>>> >>>>
>>>>> >>>> I thought so on both counts.
>>>>> >>>>
>>>>> >>>> I apologize that I made a stupid
>>>>> >>>> mistake in computing the memory per core.
>>>>> >>>> With my settings each core can access
>>>>> >>>> only 2 GB of memory on average, instead of the 8 GB
>>>>> >>>> which I mentioned in the previous email. I
>>>>> >>>> re-ran the job with 8 GB of memory per core on
>>>>> >>>> average and there is no "Out Of Memory"
>>>>> >>>> error. I will do more tests to see if
>>>>> >>>> there is still some memory issue.
>>>>> >>>>
>>>>> >>>> Ok. I'd still like to know where the
>>>>> memory was being used since my estimates
>>>>> were off.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> Thanks,
>>>>> >>>> Dave
>>>>> >>>>
>>>>> >>>> Regards,
>>>>> >>>> Frank
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote:
>>>>> >>>>> Hi Frank,
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On 11 July 2016 at 19:14, frank
>>>>> <hengjiew at uci.edu> wrote:
>>>>> >>>>> Hi Dave,
>>>>> >>>>>
>>>>> >>>>> I re-ran the test using bjacobi as
>>>>> >>>>> the preconditioner on the coarse mesh of
>>>>> >>>>> telescope. The grid is 3072*256*768 and
>>>>> >>>>> the process mesh is 96*8*24. The PETSc options
>>>>> >>>>> file is attached.
>>>>> >>>>> I still got the "Out Of Memory"
>>>>> >>>>> error. The error occurred before the
>>>>> >>>>> linear solver finished one step. So I
>>>>> >>>>> don't have the full info from ksp_view.
>>>>> >>>>> The info from ksp_view_pre is attached.
>>>>> >>>>>
>>>>> >>>>> Okay - that is essentially useless
>>>>> (sorry)
>>>>> >>>>>
>>>>> >>>>> It seems to me that the error
>>>>> occurred when the decomposition was going
>>>>> to be changed.
>>>>> >>>>>
>>>>> >>>>> Based on what information?
>>>>> >>>>> Running with -info would give us
>>>>> more clues, but will create a ton of output.
>>>>> >>>>> Please try running the case which
>>>>> failed with -info
>>>>> >>>>> I ran another test with a grid of
>>>>> >>>>> 1536*128*384 and the same process mesh as
>>>>> >>>>> above. There was no error. The ksp_view
>>>>> >>>>> info is attached for comparison.
>>>>> >>>>> Thank you.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [3] Here is my crude estimate of
>>>>> your memory usage.
>>>>> >>>>> I'll target the biggest memory hogs
>>>>> only to get an order of magnitude estimate
>>>>> >>>>>
>>>>> >>>>> * The Fine grid operator contains
>>>>> 4223139840 non-zeros --> 1.8 GB per MPI
>>>>> rank assuming double precision.
>>>>> >>>>> The indices for the AIJ could amount
>>>>> to another 0.3 GB (assuming 32 bit integers)
>>>>> >>>>>
>>>>> >>>>> * You use 5 levels of coarsening, so
>>>>> the other operators should represent
>>>>> (collectively)
>>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 +
>>>>> 2.1/8^4 ~ 300 MB per MPI rank on the
>>>>> communicator with 18432 ranks.
>>>>> >>>>> The coarse grid should consume ~ 0.5
>>>>> MB per MPI rank on the communicator with
>>>>> 18432 ranks.
>>>>> >>>>>
>>>>> >>>>> * You use a reduction factor of 64,
>>>>> making the new communicator with 288 MPI
>>>>> ranks.
>>>>> >>>>> PCTelescope will first gather a
>>>>> temporary matrix associated with your
>>>>> coarse level operator assuming a comm size
>>>>> of 288 living on the comm with size 18432.
>>>>> >>>>> This matrix will require
>>>>> approximately 0.5 * 64 = 32 MB per core on
>>>>> the 288 ranks.
>>>>> >>>>> This matrix is then used to form a
>>>>> new MPIAIJ matrix on the subcomm, thus
>>>>> require another 32 MB per rank.
>>>>> >>>>> The temporary matrix is now destroyed.
>>>>> >>>>>
>>>>> >>>>> * Because a DMDA is detected, a
>>>>> permutation matrix is assembled.
>>>>> >>>>> This requires 2 doubles per point in
>>>>> the DMDA.
>>>>> >>>>> Your coarse DMDA contains 92 x 16 x
>>>>> 48 points.
>>>>> >>>>> Thus the permutation matrix will
>>>>> require < 1 MB per MPI rank on the sub-comm.
>>>>> >>>>>
>>>>> >>>>> * Lastly, the matrix is permuted.
>>>>> This uses MatPtAP(), but the resulting
>>>>> operator will have the same memory
>>>>> footprint as the unpermuted matrix (32
>>>>> MB). At any stage in PCTelescope, only 2
>>>>> operators of size 32 MB are held in memory
>>>>> when the DMDA is provided.
>>>>> >>>>>
>>>>> >>>>> From my rough estimates, the worst-
>>>>> >>>>> case memory footprint for any given core,
>>>>> >>>>> given your options, is approximately
>>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB.
>>>>> >>>>> This is way below 8 GB.
>>>>> >>>>>
>>>>> >>>>> Note this estimate completely ignores:
>>>>> >>>>> (1) the memory required for the
>>>>> restriction operator,
>>>>> >>>>> (2) the potential growth in the
>>>>> number of non-zeros per row due to
>>>>> Galerkin coarsening (I wish
>>>>> -ksp_view_pre reported the output from
>>>>> MatView so we could see the number of
>>>>> non-zeros required by the coarse level
>>>>> operators)
>>>>> >>>>> (3) all temporary vectors required
>>>>> by the CG solver, and those required by
>>>>> the smoothers.
>>>>> >>>>> (4) internal memory allocated by MatPtAP
>>>>> >>>>> (5) memory associated with IS's used
>>>>> within PCTelescope
>>>>> >>>>>
>>>>> >>>>> So either I am completely off in my
>>>>> estimates, or you have not carefully
>>>>> estimated the memory usage of your
>>>>> application code. Hopefully others might
>>>>> examine/correct my rough estimates
>>>>> >>>>>
>>>>> >>>>> Since I don't have your code I
>>>>> >>>>> cannot assess the latter.
>>>>> >>>>> Since I don't have access to the
>>>>> same machine you are running on, I think
>>>>> we need to take a step back.
>>>>> >>>>>
>>>>> >>>>> [1] What machine are you running on?
>>>>> >>>>> Send me a URL if it's available.
>>>>> >>>>>
>>>>> >>>>> [2] What discretization are you
>>>>> >>>>> using? (I am guessing a scalar 7-point FD
>>>>> >>>>> stencil.)
>>>>> >>>>> If it's a 7-point FD stencil, we
>>>>> >>>>> should be able to examine the memory usage
>>>>> >>>>> of your solver configuration using a
>>>>> >>>>> standard, lightweight existing PETSc
>>>>> >>>>> example, run on your machine at the same
>>>>> >>>>> scale.
>>>>> >>>>> This would hopefully enable us to
>>>>> correctly evaluate the actual memory usage
>>>>> required by the solver configuration you
>>>>> are using.
>>>>> >>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>> Dave
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> Frank
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote:
>>>>> >>>>>>
>>>>> >>>>>> On Saturday, 9 July 2016, frank
>>>>> <hengjiew at uci.edu> wrote:
>>>>> >>>>>> Hi Barry and Dave,
>>>>> >>>>>>
>>>>> >>>>>> Thank both of you for the advice.
>>>>> >>>>>>
>>>>> >>>>>> @Barry
>>>>> >>>>>> I made a mistake in the file names
>>>>> >>>>>> in the last email. I attached the correct
>>>>> >>>>>> files this time.
>>>>> >>>>>> For all three tests,
>>>>> >>>>>> 'Telescope' is used as the coarse
>>>>> >>>>>> preconditioner.
>>>>> >>>>>>
>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>>>>> >>>>>> Part of the memory usage:  Vector  125  124  3971904  0.
>>>>> >>>>>>                            Matrix  101  101  9462372  0.
>>>>> >>>>>>
>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>>>>> >>>>>> Part of the memory usage:  Vector  125  124   681672  0.
>>>>> >>>>>>                            Matrix  101  101  1462180  0.
>>>>> >>>>>>
>>>>> >>>>>> In theory, the memory usage in
>>>>> >>>>>> Test1 should be 8 times that of Test2. In my
>>>>> >>>>>> case, it is about 6 times.
>>>>> >>>>>>
>>>>> >>>>>> == Test3: Grid: 3072*256*768,
>>>>> Process Mesh: 96*8*24. Sub-domain per
>>>>> process: 32*32*32
>>>>> >>>>>> Here I get the out of memory error.
>>>>> >>>>>>
>>>>> >>>>>> I tried to use -mg_coarse jacobi.
>>>>> In this way, I don't need to set
>>>>> -mg_coarse_ksp_type and -mg_coarse_pc_type
>>>>> explicitly, right?
>>>>> >>>>>> The linear solver didn't work in
>>>>> >>>>>> this case. PETSc output some errors.
>>>>> >>>>>>
>>>>> >>>>>> @Dave
>>>>> >>>>>> In test3, I use only one instance
>>>>> of 'Telescope'. On the coarse mesh of
>>>>> 'Telescope', I used LU as the
>>>>> preconditioner instead of SVD.
>>>>> >>>>>> If I set the levels correctly,
>>>>> then on the last coarse mesh of MG where
>>>>> it calls 'Telescope', the sub-domain per
>>>>> process is 2*2*2.
>>>>> >>>>>> On the last coarse mesh of
>>>>> 'Telescope', there is only one grid point
>>>>> per process.
>>>>> >>>>>> I still got the OOM error. The
>>>>> detailed petsc option file is attached.
>>>>> >>>>>>
>>>>> >>>>>> Do you understand the expected
>>>>> memory usage for the particular parallel
>>>>> LU implementation you are using? I don't
>>>>> (seriously). Replace LU with bjacobi and
>>>>> re-run this test. My point about solver
>>>>> debugging is still valid.
>>>>> >>>>>>
>>>>> >>>>>> And please send the result of
>>>>> KSPView so we can see what is actually
>>>>> used in the computations
>>>>> >>>>>>
>>>>> >>>>>> Thanks
>>>>> >>>>>> Dave
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> Thank you so much.
>>>>> >>>>>>
>>>>> >>>>>> Frank
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith
>>>>> wrote:
>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank
>>>>> <hengjiew at uci.edu> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hi Barry,
>>>>> >>>>>>
>>>>> >>>>>> Thank you for your advice.
>>>>> >>>>>> I tried three tests. In the 1st
>>>>> >>>>>> test, the grid is 3072*256*768 and the
>>>>> >>>>>> process mesh is 96*8*24.
>>>>> >>>>>> The linear solver is 'cg', the
>>>>> >>>>>> preconditioner is 'mg', and 'telescope' is
>>>>> >>>>>> used as the preconditioner at the coarse mesh.
>>>>> >>>>>> The system gives me the "Out of
>>>>> >>>>>> Memory" error before the linear system is
>>>>> >>>>>> completely solved.
>>>>> >>>>>> The info from '-ksp_view_pre' is
>>>>> >>>>>> attached. It seems to me that the error
>>>>> >>>>>> occurs when it reaches the coarse mesh.
>>>>> >>>>>>
>>>>> >>>>>> The 2nd test uses a grid of
>>>>> >>>>>> 1536*128*384 and a process mesh of 96*8*24.
>>>>> >>>>>> The 3rd test uses the same grid
>>>>> >>>>>> but a different process mesh, 48*4*12.
>>>>> >>>>>> Are you sure this is right? The
>>>>> >>>>>> total matrix and vector memory usage goes
>>>>> >>>>>> from 2nd test
>>>>> >>>>>> Vector  384  383   8,193,712  0.
>>>>> >>>>>> Matrix  103  103  11,508,688  0.
>>>>> >>>>>> to 3rd test
>>>>> >>>>>> Vector  384  383   1,590,520  0.
>>>>> >>>>>> Matrix  103  103   3,508,664  0.
>>>>> >>>>>> that is, the memory usage got
>>>>> >>>>>> smaller, but if you have only 1/8th the
>>>>> >>>>>> processes and the same grid it should have
>>>>> >>>>>> gotten about 8 times bigger. Did you maybe
>>>>> >>>>>> cut the grid by a factor of 8 also? If so,
>>>>> >>>>>> that still doesn't explain it, because the
>>>>> >>>>>> memory usage changed by a factor of about 5
>>>>> >>>>>> for the vectors and about 3
>>>>> >>>>>> for the matrices.
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> The linear solver and PETSc options
>>>>> >>>>>> in the 2nd and 3rd tests are the same as in the 1st
>>>>> >>>>>> test. The linear solver works fine in both
>>>>> >>>>>> tests.
>>>>> >>>>>> I attached the memory usage of the
>>>>> >>>>>> 2nd and 3rd tests. The memory info is from
>>>>> >>>>>> the option '-log_summary'. I tried to use
>>>>> >>>>>> '-memory_info' as you suggested, but in my
>>>>> >>>>>> case PETSc treated it as an unused option.
>>>>> >>>>>> It output nothing about the memory. Do I
>>>>> >>>>>> need to add something to my code so I can use
>>>>> >>>>>> '-memory_info'?
>>>>> >>>>>> Sorry, my mistake: the option is
>>>>> >>>>>> -memory_view
>>>>> >>>>>>
>>>>> >>>>>> Can you run the one case with
>>>>> -memory_view and -mg_coarse jacobi
>>>>> -ksp_max_it 1 (just so it doesn't iterate
>>>>> forever) to see how much memory is used
>>>>> without the telescope? Also run case 2 the
>>>>> same way.
>>>>> >>>>>>
>>>>> >>>>>> Barry
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> In both tests the memory usage is
>>>>> not large.
>>>>> >>>>>>
>>>>> >>>>>> It seems to me that it might be the
>>>>> 'telescope' preconditioner that allocated
>>>>> a lot of memory and caused the error in
>>>>> the 1st test.
>>>>> >>>>>> Is there a way to show how much
>>>>> >>>>>> memory it allocated?
>>>>> >>>>>>
>>>>> >>>>>> Frank
>>>>> >>>>>>
>>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith
>>>>> wrote:
>>>>> >>>>>> Frank,
>>>>> >>>>>>
>>>>> >>>>>> You can run with -ksp_view_pre
>>>>> to have it "view" the KSP before the solve
>>>>> so hopefully it gets that far.
>>>>> >>>>>>
>>>>> >>>>>> Please run the problem that
>>>>> >>>>>> does fit with -memory_info; when the
>>>>> >>>>>> problem completes it will show the "high
>>>>> >>>>>> water mark" for PETSc allocated memory and
>>>>> >>>>>> total memory used. We first want to look
>>>>> >>>>>> at these numbers to see if it is using
>>>>> >>>>>> more memory than you expect. You could
>>>>> >>>>>> also run with, say, half the grid spacing to
>>>>> >>>>>> see how the memory usage scales with the
>>>>> >>>>>> increase in grid points. Make the runs
>>>>> >>>>>> also with -log_view and send all the
>>>>> >>>>>> output from these options.
>>>>> >>>>>>
>>>>> >>>>>> Barry
>>>>> >>>>>>
>>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank
>>>>> <hengjiew at uci.edu> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hi,
>>>>> >>>>>>
>>>>> >>>>>> I am using the CG KSP solver and a
>>>>> >>>>>> multigrid preconditioner to solve a linear
>>>>> >>>>>> system in parallel.
>>>>> >>>>>> I chose to use 'Telescope' as
>>>>> >>>>>> the preconditioner on the coarse mesh for
>>>>> >>>>>> its good performance.
>>>>> >>>>>> The PETSc options file is attached.
>>>>> >>>>>>
>>>>> >>>>>> The domain is a 3D box.
>>>>> >>>>>> It works well when the grid is
>>>>> >>>>>> 1536*128*384 and the process mesh is
>>>>> >>>>>> 96*8*24. When I double the size of the grid
>>>>> >>>>>> and keep the same process mesh and PETSc
>>>>> >>>>>> options, I get an "out of memory" error
>>>>> >>>>>> from the super-cluster I am using.
>>>>> >>>>>> Each process has access to at least
>>>>> >>>>>> 8 GB of memory, which should be more than
>>>>> >>>>>> enough for my application. I am sure that
>>>>> >>>>>> all the other parts of my code (except the
>>>>> >>>>>> linear solver) do not use much memory. So
>>>>> >>>>>> I suspect there is something wrong with
>>>>> >>>>>> the linear solver.
>>>>> >>>>>> The error occurs before the linear
>>>>> >>>>>> system is completely solved, so I don't
>>>>> >>>>>> have the info from ksp_view. I am not able
>>>>> >>>>>> to reproduce the error with a smaller
>>>>> >>>>>> problem either.
>>>>> >>>>>> In addition, I tried to use
>>>>> >>>>>> block Jacobi as the preconditioner with
>>>>> >>>>>> the same grid and same decomposition. The
>>>>> >>>>>> linear solver runs extremely slowly but
>>>>> >>>>>> there is no memory error.
>>>>> >>>>>>
>>>>> >>>>>> How can I diagnose what exactly
>>>>> >>>>>> causes the error?
>>>>> >>>>>> Thank you so much.
>>>>> >>>>>>
>>>>> >>>>>> Frank
>>>>> >>>>>> <petsc_options.txt>
>>>>> >>>>>>
>>>>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>>>>> >>>>>>
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>
>>>>> <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>
>>>>> >
>>>>>
>>>>
>>>
>>>
>>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener