[petsc-users] Question about memory usage in Multigrid preconditioner
frank
hengjiew at uci.edu
Wed Jul 13 18:07:51 CDT 2016
Hi Dave,
Sorry for the late reply, and thank you so much for your detailed response.
I have a question about the estimate of the memory usage. There are
4223139840 allocated non-zeros and 18432 MPI processes, and double
precision is used. So the memory per process is:
4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74 MB ?
Did I do something wrong here? This seems too small.
I am running this job on Blue Waters
<https://bluewaters.ncsa.illinois.edu/user-guide>.
I am using a 7-point FD stencil in 3D.
I apologize for a careless mistake in computing the memory per
core. With my settings, each core could access only 2 GB of memory on
average, instead of the 8 GB I mentioned in the previous email. I re-ran
the job with 8 GB of memory per core on average and there was no "Out Of
Memory" error. I will run more tests to see whether there is still a
memory issue.
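To keep an eye on this in those tests, below is a minimal sketch (not taken
from my code) of how I plan to record each rank's memory high-water marks.
ReportMemory is a hypothetical helper name, and my understanding is that
PetscMemoryGetMaximumUsage() only reports a meaningful number if
PetscMemorySetGetMaximumUsage() was called earlier in the run:

  #include <petscsys.h>

  /* Hypothetical helper: print each rank's memory high-water marks.
     Call it after the solve, before PetscFinalize(). */
  PetscErrorCode ReportMemory(MPI_Comm comm)
  {
    PetscLogDouble rss,mal;
    PetscMPIInt    rank;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
    ierr = PetscMemoryGetMaximumUsage(&rss);CHKERRQ(ierr);  /* max resident set size seen so far */
    ierr = PetscMallocGetMaximumUsage(&mal);CHKERRQ(ierr);  /* max memory obtained through PetscMalloc */
    ierr = PetscSynchronizedPrintf(comm,"[%d] max rss %g bytes, max PetscMalloc %g bytes\n",rank,rss,mal);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }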
Regards,
Frank
On 07/11/2016 01:18 PM, Dave May wrote:
> Hi Frank,
>
>
> On 11 July 2016 at 19:14, frank <hengjiew at uci.edu> wrote:
>
> Hi Dave,
>
> I re-ran the test using bjacobi as the preconditioner on the
> coarse mesh of telescope. The grid is 3072*256*768 and the process
> mesh is 96*8*24. The PETSc options file is attached.
> I still got the "Out Of Memory" error. The error occurred before
> the linear solver finished one step, so I don't have the full info
> from ksp_view. The info from ksp_view_pre is attached.
>
>
> Okay - that is essentially useless (sorry)
>
>
> It seems to me that the error occurred when the decomposition was
> going to be changed.
>
>
> Based on what information?
> Running with -info would give us more clues, but will create a ton of
> output.
> Please try running the case which failed with -info.
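> For example, something like this (a sketch; the executable name and the
> options-file name are placeholders for whatever you actually run):
>
>     mpiexec -n 18432 ./your_app -options_file petsc_options.txt -info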
>
> I ran another test with a grid of 1536*128*384 and the same
> process mesh as above. There was no error. The ksp_view info is
> attached for comparison.
> Thank you.
>
>
>
> [3] Here is my crude estimate of your memory usage.
> I'll target the biggest memory hogs only to get an order of magnitude
> estimate
>
> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per
> MPI rank assuming double precision.
> The indices for the AIJ could amount to another 0.3 GB (assuming 32
> bit integers)
>
> * You use 5 levels of coarsening, so the other operators should
> represent (collectively)
> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the
> communicator with 18432 ranks.
> The coarse grid should consume ~ 0.5 MB per MPI rank on the
> communicator with 18432 ranks.
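> (Expanding the geometric sum above, with the ~2.1 GB per-rank fine-level
> operator it assumes:
> 2100/8 + 2100/64 + 2100/512 + 2100/4096 ~ 262 + 33 + 4 + 0.5 ~ 300 MB.)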
>
> * You use a reduction factor of 64, making the new communicator with
> 288 MPI ranks.
> PCTelescope will first gather a temporary matrix associated with your
> coarse level operator assuming a comm size of 288 living on the comm
> with size 18432.
> This matrix will require approximately 0.5 * 64 = 32 MB per core on
> the 288 ranks.
> This matrix is then used to form a new MPIAIJ matrix on the subcomm,
> thus requiring another 32 MB per rank.
> The temporary matrix is now destroyed.
>
> * Because a DMDA is detected, a permutation matrix is assembled.
> This requires 2 doubles per point in the DMDA.
> Your coarse DMDA contains 92 x 16 x 48 points.
> Thus the permutation matrix will require < 1 MB per MPI rank on the
> sub-comm.
>
> * Lastly, the matrix is permuted. This uses MatPtAP(), but the
> resulting operator will have the same memory footprint as the
> unpermuted matrix (32 MB). At any stage in PCTelescope, only 2
> operators of size 32 MB are held in memory when the DMDA is provided.
>
> From my rough estimates, the worst-case memory footprint for any
> given core, given your options, is approximately
> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB.
> This is way below 8 GB.
>
> Note this estimate completely ignores:
> (1) the memory required for the restriction operator,
> (2) the potential growth in the number of non-zeros per row due to
> Galerkin coarsening (I wish -ksp_view_pre reported the output from
> MatView so we could see the number of non-zeros required by the coarse
> level operators)
> (3) all temporary vectors required by the CG solver, and those
> required by the smoothers.
> (4) internal memory allocated by MatPtAP
> (5) memory associated with IS's used within PCTelescope
>
> So either I am completely off in my estimates, or you have not
> carefully estimated the memory usage of your application code.
> Hopefully others might examine/correct my rough estimates.
>
> Since I don't have your code I cannot assess the latter.
> Since I don't have access to the same machine you are running on, I
> think we need to take a step back.
>
> [1] What machine are you running on? Send me a URL if it's available.
>
> [2] What discretization are you using? (I am guessing a scalar 7-point
> FD stencil.)
> If it's a 7-point FD stencil, we should be able to examine the memory
> usage of your solver configuration using a standard, lightweight
> existing PETSc example, run on your machine at the same scale.
> This would hopefully enable us to correctly evaluate the actual memory
> usage required by the solver configuration you are using.
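> As a sketch of what I have in mind (assumptions: the 7-point guess above is
> right, and KSP tutorial ex45, which solves a 3D Laplacian on a DMDA, accepts
> the usual -da_grid_* options; I have not run this at your scale):
>
>     mpiexec -n 18432 ./ex45 -da_grid_x 3072 -da_grid_y 256 -da_grid_z 768 \
>         -options_file petsc_options.txt -memory_view -log_view
>
> i.e. your exact solver options applied to a known, lightweight operator.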
>
> Thanks,
> Dave
>
>
>
> Frank
>
>
>
>
> On 07/08/2016 10:38 PM, Dave May wrote:
>>
>>
>> On Saturday, 9 July 2016, frank <hengjiew at uci.edu> wrote:
>>
>> Hi Barry and Dave,
>>
>> Thank both of you for the advice.
>>
>> @Barry
>> I made a mistake in the file names in the last email. I attached
>> the correct files this time.
>> For all three tests, 'Telescope' is used as the coarse
>> preconditioner.
>>
>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>> Part of the memory usage: Vector 125 124 3971904 0.
>> Matrix 101 101 9462372 0
>>
>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>> Part of the memory usage: Vector 125 124 681672 0.
>> Matrix 101 101 1462180 0.
>>
>> In theory, the memory usage in Test1 should be 8 times that of
>> Test2. In my case, it is about 6 times.
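>> As a quick check of that factor from the numbers above:
>>   Vector: 3971904 / 681672 ~ 5.8;  Matrix: 9462372 / 1462180 ~ 6.5.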
>>
>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24.
>> Sub-domain per process: 32*32*32
>> Here I get the out of memory error.
>>
>> I tried to use -mg_coarse jacobi. In this way, I don't need
>> to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly,
>> right?
>> The linear solver didn't work in this case; PETSc printed some
>> errors.
>>
>> @Dave
>> In test3, I use only one instance of 'Telescope'. On the
>> coarse mesh of 'Telescope', I used LU as the preconditioner
>> instead of SVD.
>> If I set the levels correctly, then on the last coarse mesh
>> of MG, where it calls 'Telescope', the sub-domain per process
>> is 2*2*2.
>> On the last coarse mesh of 'Telescope', there is only one
>> grid point per process.
>> I still got the OOM error. The detailed petsc option file is
>> attached.
>>
>>
>> Do you understand the expected memory usage for the
>> particular parallel LU implementation you are using? I don't
>> (seriously). Replace LU with bjacobi and re-run this test. My
>> point about solver debugging is still valid.
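>> As a concrete sketch of that swap (the option names below assume the
>> usual "mg_coarse_" and "telescope_" prefixes in your setup; please
>> double-check them against -help output):
>>
>>     -mg_coarse_pc_type telescope
>>     -mg_coarse_pc_telescope_reduction_factor 64
>>     -mg_coarse_telescope_pc_type bjacobi
>>
>> i.e. keep everything else the same and only replace the lu you currently
>> request inside Telescope.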
>>
>> And please send the result of KSPView so we can see what is
>> actually used in the computations
>>
>> Thanks
>> Dave
>>
>>
>>
>> Thank you so much.
>>
>> Frank
>>
>>
>>
>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>
>> On Jul 6, 2016, at 4:19 PM, frank <hengjiew at uci.edu> wrote:
>>
>> Hi Barry,
>>
>> Thank you for your advice.
>> I tried three tests. In the 1st test, the grid is
>> 3072*256*768 and the process mesh is 96*8*24.
>> The linear solver is 'cg', the preconditioner is 'mg',
>> and 'telescope' is used as the preconditioner on the
>> coarse mesh.
>> The system gives me the "Out of Memory" error before
>> the linear system is completely solved.
>> The info from '-ksp_view_pre' is attached. It seems to
>> me that the error occurs when it reaches the coarse mesh.
>>
>> The 2nd test uses a grid of 1536*128*384 and a process
>> mesh of 96*8*24. The 3rd test uses the same grid but
>> a different process mesh, 48*4*12.
>>
>> Are you sure this is right? The total matrix and
>> vector memory usage goes from 2nd test
>> Vector 384 383 8,193,712 0.
>> Matrix 103 103 11,508,688 0.
>> to 3rd test
>> Vector 384 383 1,590,520 0.
>> Matrix 103 103 3,508,664 0.
>> that is, the memory usage got smaller, but if you have only
>> 1/8th the processes and the same grid it should have
>> gotten about 8 times bigger. Did you maybe cut the grid
>> by a factor of 8 also? If so, that still doesn't explain
>> it, because the memory usage changed by a factor of about 5
>> for the vectors and about 3 for the matrices.
>>
>>
>> The linear solver and PETSc options in the 2nd and 3rd
>> tests are the same as in the 1st test. The linear solver
>> works fine in both tests.
>> I attached the memory usage of the 2nd and 3rd tests.
>> The memory info is from the option '-log_summary'. I
>> tried to use '-memory_info' as you suggested, but in
>> my case PETSc treated it as an unused option. It
>> output nothing about the memory. Do I need to add something
>> to my code so I can use '-memory_info'?
>>
>> Sorry, my mistake; the option is -memory_view.
>>
>> Can you run the one case with -memory_view and
>> -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't
>> iterate forever) to see how much memory is used without
>> the telescope? Also run case 2 the same way.
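>> A sketch of such a run (the executable and options-file names are
>> placeholders, and by "-mg_coarse jacobi" I mean -mg_coarse_pc_type jacobi):
>>
>>     mpiexec -n <nranks> ./your_app -options_file petsc_options.txt \
>>         -memory_view -mg_coarse_pc_type jacobi -ksp_max_it 1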
>>
>> Barry
>>
>>
>>
>> In both tests the memory usage is not large.
>>
>> It seems to me that it might be the 'telescope'
>> preconditioner that allocated a lot of memory and
>> caused the error in the 1st test.
>> Is there a way to show how much memory it allocated?
>>
>> Frank
>>
>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>
>> Frank,
>>
>> You can run with -ksp_view_pre to have it
>> "view" the KSP before the solve so hopefully it
>> gets that far.
>>
>> Please run the problem that does fit with
>> -memory_info; when the problem completes it will
>> show the "high water mark" for PETSc-allocated
>> memory and total memory used. We first want to
>> look at these numbers to see if it is using more
>> memory than you expect. You could also run with,
>> say, half the grid spacing to see how the memory
>> usage scales with the increase in grid points.
>> Make the runs also with -log_view and send all
>> the output from these options.
>>
>> Barry
>>
>> On Jul 5, 2016, at 5:23 PM, frank <hengjiew at uci.edu> wrote:
>>
>> Hi,
>>
>> I am using the CG ksp solver and Multigrid
>> preconditioner to solve a linear system in
>> parallel.
>> I chose 'Telescope' as the preconditioner
>> on the coarse mesh for its good performance.
>> The petsc options file is attached.
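>> For reference, a sketch of the kind of option set I mean (the attached
>> file is the authoritative version; the values below, e.g. the number of
>> levels and the reduction factor, are illustrative rather than copied
>> from it):
>>
>>     -ksp_type cg
>>     -pc_type mg
>>     -pc_mg_levels 4
>>     -mg_coarse_pc_type telescope
>>     -mg_coarse_pc_telescope_reduction_factor 64
>>     -mg_coarse_telescope_ksp_type preonly
>>     -mg_coarse_telescope_pc_type <coarse solver of my choice>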
>>
>> The domain is a 3D box.
>> It works well when the grid is 1536*128*384
>> and the process mesh is 96*8*24. When I
>> double the size of the grid and keep the same
>> process mesh and PETSc options, I get an "out
>> of memory" error from the super-cluster I am
>> using.
>> Each process has access to at least 8 GB of
>> memory, which should be more than enough for
>> my application. I am sure that all the other
>> parts of my code (except the linear solver)
>> do not use much memory, so I suspect something
>> is wrong with the linear solver.
>> The error occurs before the linear system is
>> completely solved, so I don't have the info
>> from ksp_view. I am not able to reproduce
>> the error with a smaller problem either.
>> In addition, I tried to use block jacobi
>> as the preconditioner with the same grid and
>> the same decomposition. The linear solver runs
>> extremely slowly, but there is no memory error.
>>
>> How can I diagnose what exactly causes the error?
>> Thank you so much.
>>
>> Frank
>> <petsc_options.txt>
>>
>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>>
>>
>
>