[petsc-users] Question about memory usage in Multigrid preconditioner
frank
hengjiew at uci.edu
Fri Sep 9 10:56:35 CDT 2016
Hi,
I want to continue digging into the memory problem here.
I did find a workaround in the past, which is to use fewer cores per
node so that each core has 8 GB of memory. However, this is inefficient and
expensive. I hope to locate the place that uses the most memory.
Here is a brief summary of the tests I did in the past:
> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
  Maximum (over computational time) process memory: total 7.0727e+08
  Current process memory: total 7.0727e+08
  Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
  Current space PetscMalloc()ed: total 1.8275e+09
> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
  Maximum (over computational time) process memory: total 5.9431e+09
  Current process memory: total 5.9431e+09
  Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
  Current space PetscMalloc()ed: total 5.4844e+09
> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
  The OOM (Out Of Memory) killer of the supercomputer terminated the job
  during "KSPSolve".
I attached the output of ksp_view (the third test's output is from
ksp_view_pre), memory_view and also the petsc options.
In all the tests, each core can access about 2 GB of memory. In test3, there
are 4223139840 non-zeros in the matrix. Using double precision, that is only
about 1.74 MB per process. Even allowing some extra memory for the integer
column indices, 2 GB per core should still be far more than enough.
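To spell out the arithmetic behind that estimate (my own numbers, based on
the counts above):

  4223139840 nonzeros * 8 bytes = ~33.8 GB of matrix values in total
  33.8 GB / 18432 processes     = ~1.8 MB per process (1.74 if dividing by 1024^2)
  4223139840 nonzeros * 4 bytes of column indices = ~16.9 GB in total,
                                  i.e. roughly another 0.9 MB per process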
Is there a way to find out which part of KSPSolve uses the most memory?
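For instance, would it be reasonable to bracket the setup and the solve with
PETSc's memory query routines, along the lines of the minimal sketch below?
I am assuming PetscMemoryGetCurrentUsage() and PetscMallocGetCurrentUsage()
(both declared in petscsys.h) are the right calls; the helper name LogMemory
and its output format are just my own.

  #include <petscksp.h>

  /* Report resident memory and PetscMalloc()ed space (total/max/min over comm). */
  PetscErrorCode LogMemory(MPI_Comm comm, const char *stage)
  {
    PetscLogDouble rss, mal;                     /* PetscLogDouble is a double */
    double         loc[2], tot[2], mx[2], mn[2];
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr); /* OS resident set size, bytes     */
    ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr); /* bytes currently PetscMalloc()ed */
    loc[0] = rss; loc[1] = mal;
    ierr = MPI_Allreduce(loc, tot, 2, MPI_DOUBLE, MPI_SUM, comm);CHKERRQ(ierr);
    ierr = MPI_Allreduce(loc, mx,  2, MPI_DOUBLE, MPI_MAX, comm);CHKERRQ(ierr);
    ierr = MPI_Allreduce(loc, mn,  2, MPI_DOUBLE, MPI_MIN, comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm, "[%s] process memory: total %g max %g min %g | "
                       "PetscMalloc()ed: total %g max %g min %g (bytes)\n",
                       stage, tot[0], mx[0], mn[0], tot[1], mx[1], mn[1]);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* Intended usage, with ksp, b, x already created and configured:            */
  /*   LogMemory(PETSC_COMM_WORLD, "before KSPSetUp");                         */
  /*   KSPSetUp(ksp);  -- Galerkin PtAP / PCTelescope setup should happen here */
  /*   LogMemory(PETSC_COMM_WORLD, "after KSPSetUp");                          */
  /*   KSPSolve(ksp, b, x);                                                    */
  /*   LogMemory(PETSC_COMM_WORLD, "after KSPSolve");                          */

If the jump shows up already at "after KSPSetUp", that would point at the
coarse grid construction rather than the Krylov work vectors.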
Thank you so much.
BTW, there are 4 options that remain unused and I don't understand why they
are not used:
-mg_coarse_telescope_mg_coarse_ksp_type value: preonly
-mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
-mg_coarse_telescope_mg_levels_ksp_max_it value: 1
-mg_coarse_telescope_mg_levels_ksp_type value: richardson
Regards,
Frank
On 07/13/2016 05:47 PM, Dave May wrote:
>
>
> On 14 July 2016 at 01:07, frank <hengjiew at uci.edu
> <mailto:hengjiew at uci.edu>> wrote:
>
> Hi Dave,
>
> Sorry for the late reply.
> Thank you so much for your detailed reply.
>
> I have a question about the estimation of the memory usage. There
> are 4223139840 allocated non-zeros and 18432 MPI processes. Double
> precision is used. So the memory per process is:
> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
> Did I do sth wrong here? Because this seems too small.
>
>
> No - I totally f***ed it up. You are correct. That'll teach me for
> fumbling around with my iphone calculator and not using my brain.
> (Note that to convert to MB just divide by 1e6, not 1024^2 - although
> I apparently cannot convert between units correctly....)
>
> From the PETSc objects associated with the solver, it looks like it
> _should_ run with 2 GB per MPI rank. Sorry for my mistake.
> Possibilities are: somewhere in your usage of PETSc you've introduced
> a memory leak; PETSc is doing a huge over-allocation (e.g. as per our
> discussion of MatPtAP); or in your application code there are other
> objects you have forgotten to log the memory for.
>
>
>
> I am running this job on Blue Waters
> <https://bluewaters.ncsa.illinois.edu/user-guide>
>
> I am using a 7-point FD stencil in 3D.
>
>
> I thought so on both counts.
>
>
> I apologize that I made a stupid mistake in computing the memory
> per core. My settings mean each core can access only 2 GB of memory
> on average, instead of the 8 GB I mentioned in the previous email. I
> re-ran the job with 8 GB of memory per core on average and there was
> no "Out Of Memory" error. I will do more tests to see if there is
> still some memory issue.
>
>
> Ok. I'd still like to know where the memory was being used since my
> estimates were off.
>
>
> Thanks,
> Dave
>
>
> Regards,
> Frank
>
>
>
> On 07/11/2016 01:18 PM, Dave May wrote:
>> Hi Frank,
>>
>>
>> On 11 July 2016 at 19:14, frank <hengjiew at uci.edu
>> <mailto:hengjiew at uci.edu>> wrote:
>>
>> Hi Dave,
>>
>> I re-ran the test using bjacobi as the preconditioner on the
>> coarse mesh of telescope. The grid is 3072*256*768 and the
>> process mesh is 96*8*24. The petsc option file is attached.
>> I still got the "Out Of Memory" error. The error occurred
>> before the linear solver finished one step. So I don't have
>> the full info from ksp_view. The info from ksp_view_pre is
>> attached.
>>
>>
>> Okay - that is essentially useless (sorry)
>>
>>
>> It seems to me that the error occurred when the decomposition
>> was about to be changed.
>>
>>
>> Based on what information?
>> Running with -info would give us more clues, but will create a
>> ton of output.
>> Please try running the case which failed with -info
>>
>> I ran another test with a grid of 1536*128*384 and the same
>> process mesh as above. There was no error. The ksp_view info
>> is attached for comparison.
>> Thank you.
>>
>>
>>
>> [3] Here is my crude estimate of your memory usage.
>> I'll target the biggest memory hogs only to get an order of
>> magnitude estimate
>>
>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB
>> per MPI rank assuming double precision.
>> The indices for the AIJ could amount to another 0.3 GB (assuming
>> 32 bit integers)
>>
>> * You use 5 levels of coarsening, so the other operators should
>> represent (collectively)
>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on
>> the communicator with 18432 ranks.
>> The coarse grid should consume ~ 0.5 MB per MPI rank on the
>> communicator with 18432 ranks.
>>
>> * You use a reduction factor of 64, making the new communicator
>> with 288 MPI ranks.
>> PCTelescope will first gather a temporary matrix associated with
>> your coarse level operator assuming a comm size of 288 living on
>> the comm with size 18432.
>> This matrix will require approximately 0.5 * 64 = 32 MB per core
>> on the 288 ranks.
>> This matrix is then used to form a new MPIAIJ matrix on the
>> subcomm, thus requiring another 32 MB per rank.
>> The temporary matrix is now destroyed.
>>
>> * Because a DMDA is detected, a permutation matrix is assembled.
>> This requires 2 doubles per point in the DMDA.
>> Your coarse DMDA contains 92 x 16 x 48 points.
>> Thus the permutation matrix will require < 1 MB per MPI rank on
>> the sub-comm.
>>
>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the
>> resulting operator will have the same memory footprint as the
>> unpermuted matrix (32 MB). At any stage in PCTelescope, only 2
>> operators of size 32 MB are held in memory when the DMDA is provided.
>>
>> From my rough estimates, the worst-case memory footprint for any
>> given core, given your options, is approximately
>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB
>> This is way below 8 GB.
>>
>> Note this estimate completely ignores:
>> (1) the memory required for the restriction operator,
>> (2) the potential growth in the number of non-zeros per row due
>> to Galerkin coarsening (I wish -ksp_view_pre reported the
>> output from MatView so we could see the number of non-zeros
>> required by the coarse level operators)
>> (3) all temporary vectors required by the CG solver, and those
>> required by the smoothers.
>> (4) internal memory allocated by MatPtAP
>> (5) memory associated with IS's used within PCTelescope
>>
>> So either I am completely off in my estimates, or you have not
>> carefully estimated the memory usage of your application code.
>> Hopefully others might examine/correct my rough estimates.
>>
>> Since I don't have your code I cannot assess the latter.
>> Since I don't have access to the same machine you are running on,
>> I think we need to take a step back.
>>
>> [1] What machine are you running on? Send me a URL if it's available
>>
>> [2] What discretization are you using? (I am guessing a scalar 7
>> point FD stencil)
>> If it's a 7 point FD stencil, we should be able to examine the
>> memory usage of your solver configuration using a standard,
>> lightweight existing PETSc example, run on your machine at the same
>> scale.
>> This would hopefully enable us to correctly evaluate the actual
>> memory usage required by the solver configuration you are using.
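>>
>> For instance (assuming it really is a scalar 7-point stencil), I would
>> expect a run of KSP tutorial ex45 (a 3D Laplacian/Poisson problem on a
>> DMDA) at the same scale with essentially your options, something like
>>
>>   mpiexec -n 18432 ./ex45 -da_grid_x 3072 -da_grid_y 256 -da_grid_z 768 \
>>     -ksp_type cg -pc_type mg -pc_mg_galerkin -pc_mg_levels 4 \
>>     -mg_coarse_pc_type telescope -mg_coarse_pc_telescope_reduction_factor 64 \
>>     -memory_view -log_view
>>
>> (substitute your machine's job launcher for mpiexec), to build essentially
>> the same operator hierarchy. Treat this as a sketch only - I haven't
>> checked the exact example or option names against your PETSc version.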
>>
>> Thanks,
>> Dave
>>
>>
>>
>> Frank
>>
>>
>>
>>
>> On 07/08/2016 10:38 PM, Dave May wrote:
>>>
>>>
>>> On Saturday, 9 July 2016, frank <hengjiew at uci.edu
>>> <mailto:hengjiew at uci.edu>> wrote:
>>>
>>> Hi Barry and Dave,
>>>
>>> Thank both of you for the advice.
>>>
>>> @Barry
>>> I made a mistake in the file names in the last email. I
>>> attached the correct files this time.
>>> For all the three tests, 'Telescope' is used as the
>>> coarse preconditioner.
>>>
>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>>> Part of the memory usage: Vector 125 124 3971904 0.
>>>                           Matrix 101 101 9462372 0.
>>>
>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>>> Part of the memory usage: Vector 125 124  681672 0.
>>>                           Matrix 101 101 1462180 0.
>>>
>>> In theory, the memory usage in Test1 should be 8 times
>>> that of Test2. In my case, it is about 6 times.
>>>
>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24.
>>> Sub-domain per process: 32*32*32
>>> Here I get the out of memory error.
>>>
>>> I tried to use -mg_coarse jacobi. In this way, I don't
>>> need to set -mg_coarse_ksp_type and -mg_coarse_pc_type
>>> explicitly, right?
>>> The linear solver didn't work in this case. PETSc output
>>> some errors.
>>>
>>> @Dave
>>> In test3, I use only one instance of 'Telescope'. On the
>>> coarse mesh of 'Telescope', I used LU as the
>>> preconditioner instead of SVD.
>>> If I set the levels correctly, then on the last coarse
>>> mesh of MG, where it calls 'Telescope', the sub-domain
>>> per process is 2*2*2.
>>> On the last coarse mesh of 'Telescope', there is only
>>> one grid point per process.
>>> I still got the OOM error. The detailed petsc option
>>> file is attached.
>>>
>>>
>>> Do you understand the expected memory usage for the
>>> particular parallel LU implementation you are using? I don't
>>> (seriously). Replace LU with bjacobi and re-run this
>>> test. My point about solver debugging is still valid.
>>>
>>> And please send the result of KSPView so we can see what is
>>> actually used in the computations
>>>
>>> Thanks
>>> Dave
>>>
>>>
>>>
>>> Thank you so much.
>>>
>>> Frank
>>>
>>>
>>>
>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>>
>>> On Jul 6, 2016, at 4:19 PM, frank
>>> <hengjiew at uci.edu <mailto:hengjiew at uci.edu>> wrote:
>>>
>>> Hi Barry,
>>>
>>> Thank you for your advice.
>>> I tried three tests. In the 1st test, the grid is
>>> 3072*256*768 and the process mesh is 96*8*24.
>>> The linear solver is 'cg', the preconditioner is
>>> 'mg' and 'telescope' is used as the
>>> preconditioner at the coarse mesh.
>>> The system gives me the "Out of Memory" error
>>> before the linear system is completely solved.
>>> The info from '-ksp_view_pre' is attached. It
>>> seems to me that the error occurs when it
>>> reaches the coarse mesh.
>>>
>>> The 2nd test uses a grid of 1536*128*384 and
>>> process mesh is 96*8*24. The 3rd test uses the
>>> same grid but a different process mesh 48*4*12.
>>>
>>> Are you sure this is right? The total matrix and
>>> vector memory usage goes from the 2nd test
>>> Vector 384 383  8,193,712 0.
>>> Matrix 103 103 11,508,688 0.
>>> to the 3rd test
>>> Vector 384 383  1,590,520 0.
>>> Matrix 103 103  3,508,664 0.
>>> That is, the memory usage got smaller, but if you have
>>> only 1/8th the processes and the same grid it should
>>> have gotten about 8 times bigger. Did you maybe cut
>>> the grid by a factor of 8 also? If so, that still
>>> doesn't explain it, because the memory usage changed
>>> by a factor of 5 something for the vectors and 3
>>> something for the matrices.
>>>
>>>
>>> The linear solver and petsc options in the 2nd and
>>> 3rd tests are the same as in the 1st test. The linear
>>> solver works fine in both tests.
>>> I attached the memory usage of the 2nd and 3rd
>>> tests. The memory info is from the option
>>> '-log_summary'. I tried to use '-memory_info' as
>>> you suggested, but in my case petsc treated it
>>> as an unused option. It output nothing about the
>>> memory. Do I need to add something to my code so I can
>>> use '-memory_info'?
>>>
>>> Sorry, my mistake, the option is -memory_view
>>>
>>> Can you run the one case with -memory_view and
>>> -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't
>>> iterate forever) to see how much memory is used
>>> without the telescope? Also run case 2 the same way.
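>>>
>>> (To be concrete, that would mean adding something like
>>>
>>> -memory_view
>>> -ksp_max_it 1
>>> -mg_coarse_ksp_type preonly
>>> -mg_coarse_pc_type jacobi
>>>
>>> to the options file for that run, in place of the telescope coarse
>>> solve. "-mg_coarse jacobi" above is shorthand, so adjust the option
>>> names as needed.)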
>>>
>>> Barry
>>>
>>>
>>>
>>> In both tests the memory usage is not large.
>>>
>>> It seems to me that it might be the 'telescope'
>>> preconditioner that allocated a lot of memory
>>> and caused the error in the 1st test.
>>> Is there a way to show how much memory it
>>> allocated?
>>>
>>> Frank
>>>
>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>>
>>> Frank,
>>>
>>> You can run with -ksp_view_pre to have
>>> it "view" the KSP before the solve so
>>> hopefully it gets that far.
>>>
>>> Please run the problem that does fit
>>> with -memory_info; when the problem completes
>>> it will show the "high water mark" for PETSc
>>> allocated memory and total memory used. We
>>> first want to look at these numbers to see
>>> if it is using more memory than you expect.
>>> You could also run with say half the grid
>>> spacing to see how the memory usage scaled
>>> with the increase in grid points. Make the
>>> runs also with -log_view and send all the
>>> output from these options.
>>>
>>> Barry
>>>
>>> On Jul 5, 2016, at 5:23 PM, frank
>>> <hengjiew at uci.edu
>>> <mailto:hengjiew at uci.edu>> wrote:
>>>
>>> Hi,
>>>
>>> I am using the CG ksp solver and
>>> Multigrid preconditioner to solve a
>>> linear system in parallel.
>>> I chose to use the 'Telescope' as the
>>> preconditioner on the coarse mesh for
>>> its good performance.
>>> The petsc options file is attached.
>>>
>>> The domain is a 3D box.
>>> It works well when the grid is
>>> 1536*128*384 and the process mesh is
>>> 96*8*24. When I double the size of the grid
>>> and keep the same process mesh and petsc
>>> options, I get an "out of memory" error
>>> from the super-cluster I am using.
>>> Each process has access to at least 8 GB of
>>> memory, which should be more than enough
>>> for my application. I am sure that all
>>> the other parts of my code (except the
>>> linear solver) do not use much memory.
>>> So I suspect there is something wrong
>>> with the linear solver.
>>> The error occurs before the linear
>>> system is completely solved, so I don't
>>> have the info from ksp_view. I am not
>>> able to reproduce the error with a
>>> smaller problem either.
>>> In addition, I tried to use block
>>> jacobi as the preconditioner with the
>>> same grid and same decomposition. The
>>> linear solver runs extremely slowly but
>>> there is no memory error.
>>>
>>> How can I diagnose what exactly causes
>>> the error?
>>> Thank you so much.
>>>
>>> Frank
>>> <petsc_options.txt>
>>>
>>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
Linear solve converged due to CONVERGED_ATOL iterations 0
KSP Object: 2304 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 2304 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 2304 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 2304 MPI processes
type: telescope
Telescope: parent comm size reduction factor = 64
Telescope: comm_size = 2304 , subcomm_size = 36
Telescope: subcomm type: interlaced
Telescope: DMDA detected
DMDA Object: (mg_coarse_telescope_repart_) 36 MPI processes
M 192 N 16 P 48 m 12 n 1 p 3 dof 1 overlap 1
KSP Object: (mg_coarse_telescope_) 36 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using DEFAULT norm type for convergence test
PC Object: (mg_coarse_telescope_) 36 MPI processes
type: mg
PC has not been set up so information may be incomplete
MG: type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_telescope_mg_coarse_) 36 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using DEFAULT norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_) 36 MPI processes
type: redundant
PC has not been set up so information may be incomplete
Redundant preconditioner: Not yet setup
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_1_) 36 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_1_) 36 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_2_) 36 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_2_) 36 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 36 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
linear system matrix = precond matrix:
Mat Object: 2304 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 2304 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2304 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2304 MPI processes
type: mpiaij
rows=1179648, cols=1179648
total: nonzeros=8183808, allocated nonzeros=8183808
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 2304 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 2304 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2304 MPI processes
type: mpiaij
rows=9437184, cols=9437184
total: nonzeros=65765376, allocated nonzeros=65765376
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 2304 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 2304 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2304 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 2304 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space
-------------- next part --------------
KSP Object: 18432 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
type: mg
PC has not been set up so information may be incomplete
MG: type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 18432 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using DEFAULT norm type for convergence test
PC Object: (mg_coarse_) 18432 MPI processes
type: redundant
PC has not been set up so information may be incomplete
Redundant preconditioner: Not yet setup
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space
Linear solve converged due to CONVERGED_RTOL iterations 131
KSP Object: 18432 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 18432 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 18432 MPI processes
type: telescope
Telescope: parent comm size reduction factor = 64
Telescope: comm_size = 18432 , subcomm_size = 288
Telescope: subcomm type: interlaced
Telescope: DMDA detected
DMDA Object: (mg_coarse_telescope_repart_) 288 MPI processes
M 384 N 32 P 96 m 24 n 2 p 6 dof 1 overlap 1
KSP Object: (mg_coarse_telescope_) 288 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_) 288 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_telescope_mg_coarse_) 288 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_) 288 MPI processes
type: bjacobi
block Jacobi: number of blocks = 288
Local solve is same for all blocks, in the following KSP and PC objects:
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=18432, cols=18432
total: nonzeros=124416, allocated nonzeros=124416
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_1_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_1_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_2_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_2_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=1179648, cols=1179648
total: nonzeros=8183808, allocated nonzeros=8183808
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=1179648, cols=1179648
total: nonzeros=8183808, allocated nonzeros=8183808
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
KSP Object: (mg_coarse_telescope_mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_sub_) 1 MPI processes
type: ilu
ILU: out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=64, cols=64
package used to perform factorization: petsc
total: nonzeros=352, allocated nonzeros=352
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=64, cols=64
total: nonzeros=352, allocated nonzeros=352
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=1179648, cols=1179648
total: nonzeros=8183808, allocated nonzeros=8183808
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=9437184, cols=9437184
total: nonzeros=65765376, allocated nonzeros=65765376
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space
-------------- next part --------------
KSP Object: 18432 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
type: mg
PC has not been set up so information may be incomplete
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 18432 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using DEFAULT norm type for convergence test
PC Object: (mg_coarse_) 18432 MPI processes
type: redundant
PC has not been set up so information may be incomplete
Redundant preconditioner: Not yet setup
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=603979776, cols=603979776
total: nonzeros=4223139840, allocated nonzeros=4223139840
total number of mallocs used during MatSetValues calls =0
has attached null space
-------------- next part --------------
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 7.0727e+08 max 3.0916e+05 min 3.0554e+05
Current process memory: total 7.0727e+08 max 3.0916e+05 min 3.0554e+05
Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 max 2.7844e+08 min 2.7726e+08
Current space PetscMalloc()ed: total 1.8275e+09 max 8.1280e+05 min 7.7357e+05
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 6 5 4160 0.
Vector 69 68 3491192 0.
Vector Scatter 16 12 46144 0.
Matrix 59 59 7940068 0.
Matrix Null Space 1 1 592 0.
Distributed Mesh 5 1 5104 0.
Star Forest Bipartite Graph 10 2 1696 0.
Discrete System 5 1 876 0.
Index Set 33 33 255408 0.
IS L to G Mapping 5 1 21360 0.
Krylov Solver 9 9 11112 0.
DMKSP interface 3 0 0 0.
Preconditioner 9 9 8880 0.
-------------- next part --------------
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 5.9431e+09 max 3.2733e+05 min 3.1110e+05
Current process memory: total 5.9431e+09 max 3.2733e+05 min 3.1110e+05
Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 max 2.8973e+08 min 2.8858e+08
Current space PetscMalloc()ed: total 5.4844e+09 max 3.0005e+05 min 2.9005e+05
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 9 8 6656 0.
Vector 480 479 6915440 0.
Vector Scatter 19 16 55040 0.
Matrix 73 73 2588332 0.
Matrix Null Space 1 1 592 0.
Distributed Mesh 6 3 15312 0.
Star Forest Bipartite Graph 12 6 5088 0.
Discrete System 6 3 2628 0.
Index Set 42 42 98496 0.
IS L to G Mapping 6 3 28224 0.
Krylov Solver 9 9 11032 0.
DMKSP interface 4 2 1296 0.
Preconditioner 9 9 8992 0.
-------------- next part --------------
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 4
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-log_view
-memory_view
-matptap_scalable
-ksp_view
-options_left 1
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 12
-mg_coarse_telescope_repart_da_processors_y 1
-mg_coarse_telescope_repart_da_processors_z 3
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 3
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type bjacobi
-------------- next part --------------
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 3
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-log_view
-memory_view
-matptap_scalable
-ksp_view
-options_left 1
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 24
-mg_coarse_telescope_repart_da_processors_y 2
-mg_coarse_telescope_repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 3
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type bjacobi
-------------- next part --------------
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 4
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-log_view
-memory_view
-matptap_scalable
-ksp_view
-ksp_view_pre
-options_left 1
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 24
-mg_coarse_telescope_repart_da_processors_y 2
-mg_coarse_telescope_repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 3
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type bjacobi