[petsc-users] Question about memory usage in Multigrid preconditioner

Dave May dave.mayhem23 at gmail.com
Wed Jul 13 04:17:06 CDT 2016


Hi Barry,



>   Dave,
>
>    MatPtAP has to generate some work space. Is it possible the "guess" it
> uses for needed work space is so absurdly (and unnecessarily) large that it
> triggers a memory issue?  Is it possible that other places that require
> "guesses" for work space produce a problem?


This is entirely possible. I've never ever used PtAP at the scale of Frank's
simulation.
I poked around in

src/mat/impls/aij/mpi/mpiptap.c

In the function MatPtAPSymbolic_MPIAIJ_MPIAIJ_ptap(), I see the following
code:

  /* set default scalable */

  ptap->scalable = PETSC_FALSE; /* PETSC_TRUE; */

  ierr =
PetscOptionsGetBool(((PetscObject)Cmpi)->options,((PetscObject)Cmpi)->prefix,
"-matptap_scalable",&ptap->

This indicates that the default choice being used (despite the comment) is
to use the faster, but also more memory-hungry, variant of MatPtAP for
MPIAIJ matrices.
It looks like someone has changed the default.

The following comment is off topic from the email thread but...

This particular file is littered with #ifdefs related to profiling
(PTAP_PROFILE).
This variable is not defined by default. I would much prefer it if this
kind of thing were available all the time via a run time flag rather than a
configure flag.
Also, it would be great to augment the profiling for PtAP with memory usage,
as currently only CPU time is logged.

A while back I proposed a PR for an "operation logger" object (which you
absolutely hated). The functionality of this logger would be useful to get
rid of the #if defined stuff for PtAP and be able to report meaningful
details about both the memory and CPU time. I used this logger for the
pctelescope paper and found it immensely useful.

But to the topic. Frank, you might want to try running your job with the
command line option
-matptap_scalable
(or -XXX_matptap_scalable if you have assigned an options prefix to your
operator.)
As always, run a small job first with -options_left 1 to ensure the option
name is spelled correctly and being used.
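
As a minimal sketch of how the two spellings of the option relate, the
snippet below builds the command line for a small test run. The executable
name ./my_app and the helper prefixed_option are hypothetical, and "XXX_" is
only a placeholder for whatever options prefix the operator actually carries.

```python
def prefixed_option(prefix: str, name: str = "matptap_scalable") -> str:
    """Return the command-line spelling of an option for a given options prefix.

    An empty prefix gives the plain option; a prefix such as "XXX_" is
    inserted between the leading dash and the option name.
    """
    return f"-{prefix}{name}"


# Hypothetical executable name, used only for illustration.
app = "./my_app"

for prefix in ("", "XXX_"):
    # -options_left 1 asks PETSc to report options that were set but never used,
    # which catches a misspelled or wrongly prefixed option name.
    args = [app, prefixed_option(prefix), "-options_left", "1"]
    print(" ".join(args))
```

If the option shows up in the -options_left report of the small run, the
prefix or spelling is wrong and the large job would silently use the default.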

Let us know if this helps.


Cheers,
  Dave


> Also, are all the "guesses" properly -info logged so that we can detect
> them before the program is killed?
>
>
>   Barry
>
>
> > At any stage in PCTelescope, only 2 operators of size 32 MB are held in
> memory when the DMDA is provided.
> >
> > From my rough estimates, the worst case memory footprint for any given
> core, given your options, is approximately
> > 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
> > This is way below 8 GB.
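
The sum above can be checked with a few lines of Python. This is only a
sketch of the arithmetic: the five contributions are copied from the
estimate, and reading "8 GB" as 8 * 1024 MB is an assumption (8000 MB leads
to the same conclusion).

```python
# Per-core contributions in MB, copied from the rough estimate above.
contributions_mb = [2100, 300, 32, 32, 1]
total_mb = sum(contributions_mb)

# Assumption: "8 GB" per core is taken as 8 * 1024 MB.
available_mb = 8 * 1024

fraction_used = total_mb / available_mb
print(f"estimated worst case: {total_mb} MB")
print(f"fraction of available memory: {fraction_used:.2f}")
```

The estimate uses well under half of the available memory, so the items it
ignores would have to account for the remainder.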
> >
> > Note this estimate completely ignores:
> > (1) the memory required for the restriction operator,
> > (2) the potential growth in the number of non-zeros per row due to
> Galerkin coarsening (I wish -ksp_view_pre reported the output from
> MatView so we could see the number of non-zeros required by the coarse
> level operators)
> > (3) all temporary vectors required by the CG solver, and those required
> by the smoothers.
> > (4) internal memory allocated by MatPtAP
> > (5) memory associated with IS's used within PCTelescope
> >
> > So either I am completely off in my estimates, or you have not carefully
> estimated the memory usage of your application code. Hopefully others might
> examine/correct my rough estimates.
> >
> > Since I don't have your code I cannot assess the latter.
> > Since I don't have access to the same machine you are running on, I
> think we need to take a step back.
> >
> > [1] What machine are you running on? Send me a URL if it's available.
> >
> > [2] What discretization are you using? (I am guessing a scalar 7 point
> FD stencil)
> > If it's a 7 point FD stencil, we should be able to examine the memory
> usage of your solver configuration using a standard, lightweight existing
> PETSc example, run on your machine at the same scale.
> > This would hopefully enable us to correctly evaluate the actual memory
> usage required by the solver configuration you are using.
> >
> > Thanks,
> >   Dave
> >
> >
> >
> > Frank
> >
> >
> >
> >
> > On 07/08/2016 10:38 PM, Dave May wrote:
> >>
> >>
> >> On Saturday, 9 July 2016, frank <hengjiew at uci.edu> wrote:
> >> Hi Barry and Dave,
> >>
> >> Thank both of you for the advice.
> >>
> >> @Barry
> >> I made a mistake in the file names in the last email. I attached the
> correct files this time.
> >> For all three tests, 'Telescope' is used as the coarse
> preconditioner.
> >>
> >> == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
> >> Part of the memory usage:
> >>     Vector   125   124   3971904   0.
> >>     Matrix   101   101   9462372   0.
> >>
> >> == Test2:   Grid: 1536*128*384,   Process Mesh: 96*8*24
> >> Part of the memory usage:
> >>     Vector   125   124    681672   0.
> >>     Matrix   101   101   1462180   0.
> >>
> >> In theory, the memory usage in Test1 should be 8 times that of Test2. In
> my case, it is about 6 times.
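
A short sketch of where "8 times" and "about 6 times" come from, using only
the numbers reported for Test1 and Test2 above. The variable names are
illustrative, and the figures are taken as printed without assuming anything
about how they were collected.

```python
# Memory figures reported above for the same grid on two process meshes.
test1 = {"Vector": 3971904, "Matrix": 9462372}   # process mesh 48*4*12
test2 = {"Vector": 681672, "Matrix": 1462180}    # process mesh 96*8*24

procs_test1 = 48 * 4 * 12
procs_test2 = 96 * 8 * 24

# With the same grid, 8x fewer processes means 8x more grid points per process.
expected_ratio = procs_test2 / procs_test1

# Observed ratios of the reported memory figures, Test1 over Test2.
ratios = {name: test1[name] / test2[name] for name in test1}

print(f"expected ratio: {expected_ratio:.1f}")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Both observed ratios fall short of the factor of 8 expected from the process
counts alone.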
> >>
> >> == Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per
> process: 32*32*32
> >> Here I get the out of memory error.
> >>
> >> I tried to use -mg_coarse jacobi. In this way, I don't need to set
> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
> >> The linear solver didn't work in this case. Petsc output some errors.
> >>
> >> @Dave
> >> In test3, I use only one instance of 'Telescope'. On the coarse mesh of
> 'Telescope', I used LU as the preconditioner instead of SVD.
> >> If I set the levels correctly, then on the last coarse mesh of MG
> where it calls 'Telescope', the sub-domain per process is 2*2*2.
> >> On the last coarse mesh of 'Telescope', there is only one grid point
> per process.
> >> I still got the OOM error. The detailed petsc option file is attached.
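
The sub-domain sizes quoted for Test3 can be reproduced with the sketch
below. It assumes each multigrid level coarsens by a factor of 2 in every
direction; the options file is not shown here, so the number of levels is
inferred from the 2*2*2 figure rather than read from the configuration.

```python
grid = (3072, 256, 768)
process_mesh = (96, 8, 24)

# Grid points per process on the finest level.
subdomain = tuple(n // p for n, p in zip(grid, process_mesh))

# Assumption: every coarsening step halves the sub-domain in each direction.
levels = [subdomain]
while min(levels[-1]) > 2:
    levels.append(tuple(n // 2 for n in levels[-1]))

for i, size in enumerate(levels):
    print(f"level {i}: sub-domain per process = {size[0]}*{size[1]}*{size[2]}")
```

Under that assumption, reaching 2*2*2 per process from 32*32*32 takes four
coarsening steps, that is, five multigrid levels.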
> >>
> >> Do you understand the expected memory usage for the particular parallel
> LU implementation you are using? I don't (seriously). Replace LU with
> bjacobi and re-run this test. My point about solver debugging is still
> valid.
> >>
> >> And please send the result of KSPView so we can see what is actually
> used in the computations
> >>
> >> Thanks
> >>   Dave
> >>
> >>
> >>
> >> Thank you so much.
> >>
> >> Frank
> >>
> >>
> >>
> >> On 07/06/2016 02:51 PM, Barry Smith wrote:
> >> On Jul 6, 2016, at 4:19 PM, frank <hengjiew at uci.edu> wrote:
> >>
> >> Hi Barry,
> >>
> >> Thank you for your advice.
> >> I tried three tests. In the 1st test, the grid is 3072*256*768 and the
> process mesh is 96*8*24.
> >> The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is
> used as the preconditioner at the coarse mesh.
> >> The system gives me the "Out of Memory" error before the linear system
> is completely solved.
> >> The info from '-ksp_view_pre' is attached. It seems to me that the error
> occurs when it reaches the coarse mesh.
> >>
> >> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24.
> The 3rd test uses the same grid but a different process mesh 48*4*12.
> >>     Are you sure this is right? The total matrix and vector memory
> usage goes from 2nd test
> >>                Vector   384            383      8,193,712     0.
> >>                Matrix   103            103     11,508,688     0.
> >> to 3rd test
> >>               Vector   384            383      1,590,520     0.
> >>                Matrix   103            103      3,508,664     0.
> >> that is the memory usage got smaller but if you have only 1/8th the
> processes and the same grid it should have gotten about 8 times bigger. Did
> you maybe cut the grid by a factor of 8 also? If so that still doesn't
> explain it because the memory usage changed by a factor of 5 something for
> the vectors and 3 something for the matrices.
> >>
> >>
> >> The linear solver and petsc options in the 2nd and 3rd tests are the same
> as in the 1st test. The linear solver works fine in both tests.
> >> I attached the memory usage of the 2nd and 3rd tests. The memory info
> is from the option '-log_summary'. I tried to use '-memory_info' as you
> suggested, but in my case petsc treated it as an unused option. It output
> nothing about the memory. Do I need to add something to my code so I can use
> '-memory_info'?
> >>     Sorry, my mistake; the option is -memory_view
> >>
> >>    Can you run the one case with -memory_view and -mg_coarse jacobi
> -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory
> is used without the telescope? Also run case 2 the same way.
> >>
> >>    Barry
> >>
> >>
> >>
> >> In both tests the memory usage is not large.
> >>
> >> It seems to me that it might be the 'telescope'  preconditioner that
> allocated a lot of memory and caused the error in the 1st test.
> >> Is there a way to show how much memory it allocated?
> >>
> >> Frank
> >>
> >> On 07/05/2016 03:37 PM, Barry Smith wrote:
> >>    Frank,
> >>
> >>      You can run with -ksp_view_pre to have it "view" the KSP before
> the solve so hopefully it gets that far.
> >>
> >>       Please run the problem that does fit with -memory_info; when the
> problem completes it will show the "high water mark" for PETSc allocated
> memory and total memory used. We first want to look at these numbers to see
> if it is using more memory than you expect. You could also run with say
> half the grid spacing to see how the memory usage scaled with the increase
> in grid points. Make the runs also with -log_view and send all the output
> from these options.
> >>
> >>     Barry
> >>
> >> On Jul 5, 2016, at 5:23 PM, frank <hengjiew at uci.edu> wrote:
> >>
> >> Hi,
> >>
> >> I am using the CG ksp solver and Multigrid preconditioner  to solve a
> linear system in parallel.
> >> I chose to use the 'Telescope' as the preconditioner on the coarse mesh
> for its good performance.
> >> The petsc options file is attached.
> >>
> >> The domain is a 3d box.
> >> It works well when the grid is  1536*128*384 and the process mesh is
> 96*8*24. When I double the size of grid and keep the same process mesh and
> petsc options, I get an "out of memory" error from the super-cluster I am
> using.
> >> Each process has access to at least 8G memory, which should be more
> than enough for my application. I am sure that all the other parts of my
> code (except the linear solver) do not use much memory. So I suspect
> there is something wrong with the linear solver.
> >> The error occurs before the linear system is completely solved so I
> don't have the info from ksp view. I am not able to reproduce the error
> with a smaller problem either.
> >> In addition, I tried to use block Jacobi as the preconditioner
> with the same grid and same decomposition. The linear solver runs extremely
> slowly but there is no memory error.
> >>
> >> How can I diagnose what exactly causes the error?
> >> Thank you so much.
> >>
> >> Frank
> >> <petsc_options.txt>
> >>
> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
> >>
> >
> >
>
>