[petsc-users] MatMultTranspose memory usage

Karl Lin karl.linkui at gmail.com
Wed Jul 31 07:34:32 CDT 2019


Thanks Barry. I will run with log view to see the info.

On Wed, Jul 31, 2019 at 4:34 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   You can run with -log_view -log_view_memory to get a summary whose
> right-hand columns report the amount of memory allocated within each
> method. So, for example, the line for MatAssemblyEnd may tell you
> something.
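>
>   For example (a sketch; "./app" stands in for your executable and the
> process count is arbitrary):
>
>     mpiexec -n 4 ./app -log_view -log_view_memory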
>
>   One additional "chunk" of memory that is needed in the parallel case but
> not the sequential case is (1) a vector whose length is the number of
> columns of the "off-diagonal" part of the parallel matrix (see the
> explanation of diagonal and off-diagonal in the manual page at
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateAIJ.html)
> and (2) a VecScatter that is needed to communicate these "off-diagonal"
> vector parts. Normally this is pretty small compared to the memory of the
> matrix, but it can be measurable depending on the parallel partitioning of
> the matrix and the nonzero structure of the matrix (for example, with a
> stencil width of 2 this extra memory will be larger than with a stencil
> width of 1).
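>
>   As a concrete sketch of where that diagonal/off-diagonal split shows up
> (the sizes and per-row counts are illustrative, not from your code), the
> preallocation form of MatCreateAIJ takes separate nonzero counts for the
> two blocks:
>
>     Mat      A;
>     PetscInt m = 1000, n = 1000;   /* local rows/columns (illustrative) */
>     /* 5 nonzeros per row in the "diagonal" block (columns this rank owns),
>        2 per row in the "off-diagonal" block (columns other ranks own);
>        the off-diagonal block is what drives the extra vector and
>        VecScatter memory */
>     MatCreateAIJ(PETSC_COMM_WORLD, m, n, PETSC_DETERMINE, PETSC_DETERMINE,
>                  5, NULL, 2, NULL, &A);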
>
>   Barry
>
>
> > On Jul 30, 2019, at 8:29 PM, Karl Lin via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >
> > I checked the resident set size via /proc/self/stat
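> >
> > (As a side note, a minimal sketch of reading it programmatically; this
> > uses the closely related /proc/self/statm, whose second field is the
> > resident set size in pages:)
> >
> >     #include <stdio.h>
> >     long rss_pages(void) {
> >       long size, resident;
> >       FILE *f = fopen("/proc/self/statm", "r");
> >       if (!f) return -1;
> >       if (fscanf(f, "%ld %ld", &size, &resident) != 2) resident = -1;
> >       fclose(f);
> >       return resident;  /* multiply by the page size to get bytes */
> >     }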
> >
> > On Tue, Jul 30, 2019 at 8:13 PM Mills, Richard Tran via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> > Hi Karl,
> >
> > I'll let one of my colleagues who has a better understanding of exactly
> what happens with memory during PETSc matrix assembly chime in, but let me
> ask how you know that the memory footprint is actually larger than you
> think it should be. Are you looking at the resident set size reported by a
> tool like 'top'? Keep in mind that even if extra buffers were allocated and
> then free()d, the resident set size of your process may stay the same, and
> only decrease when the OS's memory manager decides it really needs those
> pages for something else.
> >
> > --Richard
> >
> > On 7/30/19 5:35 PM, Karl Lin via petsc-users wrote:
> >> Thanks for the feedback, very helpful.
> >>
> >> I have another question. When I run 4 processes, even though the matrix
> is only 49.7GB, I found the memory footprint of the matrix is about 52.8GB.
> Where does this extra memory come from? Does MatCreateAIJ still reserve
> some extra memory? I thought after MatAssembly all unused space would be
> released, but in at least one of the processes the memory footprint of the
> local matrix actually increased after MatAssembly by a couple of GB. I will
> greatly appreciate any info.
> >>
> >>
> >> On Tue, Jul 30, 2019 at 6:34 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >>
> >>    Thanks, this is enough information to diagnose the problem.
> >>
> >>    The problem is that 32 bit integers are not large enough to contain
> the "counts", in this case the number of nonzeros in the matrix. A signed
> 32 bit integer can only be as large as PETSC_MAX_INT, 2147483647.
> >>
> >>     You need to configure PETSc with the additional option
> --with-64-bit-indices; then PETSc will use 64 bit integers for PetscInt so
> you don't run out of room in an int for such large counts.
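> >>
> >>     For example (other configure options omitted):
> >>
> >>        ./configure --with-64-bit-indices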
> >>
> >>
> >>     We don't do a perfect job of detecting when there is overflow of
> int, which is why you ended up with crazy allocation requests like
> 18446744058325389312.
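> >>
> >>     (To illustrate the mechanism, not the exact expression PETSc uses:
> a 64 bit count truncated to a signed 32 bit int goes negative, and casting
> that back to an unsigned 64 bit size yields a gigantic request.)
> >>
> >>     #include <stdio.h>
> >>     int main(void) {
> >>       long long nnz   = 6653750976LL;       /* true nonzero count */
> >>       int       nnz32 = (int)nnz;           /* truncated; negative on
> >>                                                typical systems */
> >>       size_t    bytes = (size_t)nnz32 * sizeof(double);
> >>       printf("%d -> %zu\n", nnz32, bytes);  /* huge request, ~1.8e19 */
> >>       return 0;
> >>     }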
> >>
> >>     I will add some more error checking to provide more useful error
> messages in this case.
> >>
> >>     Barry
> >>
> >>    The reason this worked for 4 processes is that the largest count in
> that case was roughly 6,653,750,976/4 ≈ 1,663,437,744, which does fit into
> an int (it is below PETSC_MAX_INT). PETSc only needs to know the number of
> nonzeros on each process; it doesn't need to know the total across all the
> processes. In other words, you may want to use a different PETSC_ARCH
> (different configuration) for small and large numbers of processes,
> depending on how large your problem is. Or you can always use 64 bit
> integers at a small performance and memory cost.
> >>
> >>
> >>
> >>
> >> > On Jul 30, 2019, at 5:27 PM, Karl Lin via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >> >
> >> > The number of rows is 26,326,575 and the maximum column index is
> 36,416,250. The number of nonzero coefficients is 6,653,750,976, which
> amounts to 49.7GB for the coefficients in PetscScalar and the column
> indices in PetscInt. I can run the program on 4 processes with this input
> but not on a single process. Here are snapshots of the error:
> >> >
> >> > [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> >> > [0]PETSC ERROR: Out of memory. This could be due to allocating
> >> > [0]PETSC ERROR: too large an object or bleeding by not properly
> >> > [0]PETSC ERROR: destroying unneeded objects.
> >> > [0]PETSC ERROR: Memory allocated 0 Memory used by process 727035904
> >> > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> >> > [0]PETSC ERROR: Memory requested 18446744058325389312
> >> > [0]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> >> > [0]PETSC ERROR: Petsc Release Version 3.10.4, Feb, 26, 2019
> >> >
> >> > [0]PETSC ERROR: #1 MatSeqAIJSetPreallocation_SeqAIJ() line 3711 in
> /petsc-3.10.4/src/mat/impls/aij/seq/aij.c
> >> > [0]PETSC ERROR: #2 PetscMallocA() line 390 in
> /petsc-3.10.4/src/sys/memory/mal.c
> >> > [0]PETSC ERROR: #3 MatSeqAIJSetPreallocation_SeqAIJ() line 3711 in
> /petsc-3.10.4/src/mat/impls/aij/seq/aij.c
> >> > [0]PETSC ERROR: #4 MatSeqAIJSetPreallocation() line 3649 in
> /petsc-3.10.4/src/mat/impls/aij/seq/aij.c
> >> > [0]PETSC ERROR: #5 MatCreateAIJ() line 4413 in
> /petsc-3.10.4/src/mat/impls/aij/mpi/mpiaij.c
> >> > [0]PETSC ERROR: #6 *** (my code)
> >> > [0]PETSC ERROR:
> ------------------------------------------------------------------------
> >> > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> >> > [0]PETSC ERROR: Try option -start_in_debugger or
> -on_error_attach_debugger
> >> > [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple
> Mac OS X to find memory corruption errors
> >> > [0]PETSC ERROR: configure using --with-debugging=yes, recompile,
> link, and run
> >> > [0]PETSC ERROR: to get more information on the crash.
> >> > [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> >> > [0]PETSC ERROR: Signal received
> >> > [0]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> >> > [0]PETSC ERROR: Petsc Release Version 3.10.4, Feb, 26, 2019
> >> >
> >> >
> >> >
> >> > On Tue, Jul 30, 2019 at 10:34 AM Matthew Knepley <knepley at gmail.com>
> wrote:
> >> > On Wed, Jul 31, 2019 at 3:25 AM Karl Lin via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >> > Hi, Richard,
> >> >
> >> > We have a new question. Is there a limit for MatCreateMPIAIJ and
> MatSetValues? What I mean is: we tried to create a sparse matrix and
> populate it with 50GB of data in one process, and I got a crash with an
> error saying the object was too big. Thank you for any insight.
> >> >
> >> > 1) Always send the complete error.
> >> >
> >> > 2) It sounds like you got an out of memory error for that process.
> >> >
> >> >    Matt
> >> >
> >> > Regards,
> >> >
> >> > Karl
> >> >
> >> > On Thu, Jul 18, 2019 at 2:36 PM Mills, Richard Tran via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >> > Hi Kun and Karl,
> >> >
> >> > If you are using the AIJMKL matrix types and have a recent version of
> MKL, the AIJMKL code uses MKL's inspector-executor sparse BLAS routines,
> which are described at
> >> >
> >> >
> https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines
> >> >
> >> > The inspector-executor analysis routines take the AIJ (compressed
> sparse row) format data from PETSc and then create a copy in an optimized,
> internal layout used by MKL. We have to keep PETSc's own, AIJ
> representation around, as it is needed for several operations that MKL does
> not provide. This does, unfortunately, mean that roughly double (or more,
> depending on what MKL decides to do) the amount of memory is required. The
> reason you see the memory usage increase right when a MatMult() or
> MatMultTranspose() operation occurs is that we default to a "lazy"
> approach that defers calling the analysis routine (mkl_sparse_optimize())
> until an operation that uses an MKL-provided kernel is requested. (You can use an
> "eager" approach that calls mkl_sparse_optimize() during MatAssemblyEnd()
> by specifying "-mat_aijmkl_eager_inspection" in the PETSc options.)
> >> >
> >> > If memory is at enough of a premium for you that you can't afford the
> extra copy used by the MKL inspector-executor routines, then I suggest
> using the usual PETSc AIJ format instead of AIJMKL. AIJ is fairly well
> optimized for many cases (and even has some hand-optimized kernels using
> Intel AVX/AVX2/AVX-512 intrinsics) and often outperforms AIJMKL. You should
> try both AIJ and AIJMKL, anyway, to see which is faster for your
> combination of problem and computing platform.
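> >> >
> >> > For comparison, a minimal sketch of switching the matrix type (creation
> boilerplate and the local sizes m and n are assumed from your code; with
> MatSetFromOptions() you can also switch at run time via -mat_type):
> >> >
> >> >     Mat A;
> >> >     MatCreate(PETSC_COMM_WORLD, &A);
> >> >     MatSetSizes(A, m, n, PETSC_DETERMINE, PETSC_DETERMINE);
> >> >     MatSetType(A, MATAIJMKL);  /* or MATAIJ for the plain PETSc kernels */
> >> >     MatSetFromOptions(A);      /* lets -mat_type and
> >> >                                   -mat_aijmkl_eager_inspection apply */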
> >> >
> >> > Best regards,
> >> > Richard
> >> >
> >> > On 7/17/19 8:46 PM, Karl Lin via petsc-users wrote:
> >> >> We also found that if we use MatCreateSeqAIJ, then there is no more
> memory increase with matrix-vector multiplication. With MatCreateMPIAIJMKL,
> however, the behavior is consistent.
> >> >>
> >> >> On Wed, Jul 17, 2019 at 5:26 PM Karl Lin <karl.linkui at gmail.com>
> wrote:
> >> >> MatCreateMPIAIJMKL
> >> >>
> >> >> Parallel and sequential exhibit the same behavior. In fact, we found
> that doing MatMult will increase the memory by the size of the matrix as well.
> >> >>
> >> >> On Wed, Jul 17, 2019 at 4:55 PM Zhang, Hong <hzhang at mcs.anl.gov>
> wrote:
> >> >> Karl:
> >> >> What matrix format do you use? Run it in parallel or sequential?
> >> >> Hong
> >> >>
> >> >> We used /proc/self/stat to track the resident set size during the
> program run, and we saw the resident set size jump by the size of the
> matrix right after we did MatMultTranspose.
> >> >>
> >> >> On Wed, Jul 17, 2019 at 12:04 PM hong--- via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >> >> Kun:
> >> >> How do you know 'MatMultTranspose creates an extra memory copy of the
> matrix'?
> >> >> Hong
> >> >>
> >> >> Hi,
> >> >>
> >> >>
> >> >> I was using MatMultTranspose and MatMult to solve a linear system.
> >> >>
> >> >>
> >> >> However, we found out that MatMultTranspose creates an extra memory
> copy of the matrix for its operation. This extra memory copy is not
> documented anywhere in the PETSc manual.
> >> >>
> >> >>
> >> >> This basically doubles my memory requirement to solve my system.
> >> >>
> >> >>
> >> >> I remember MKL's routines can do an in-place matrix-transpose vector
> product, without transposing the matrix itself.
> >> >>
> >> >>
> >> >> Is this always the case? Or is there a way to make PETSc do an
> in-place matrix-transpose vector product?
> >> >>
> >> >>
> >> >> Any help is greatly appreciated.
> >> >>
> >> >>
> >> >> Regards,
> >> >>
> >> >> Kun
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> >> > -- Norbert Wiener
> >> >
> >> > https://www.cse.buffalo.edu/~knepley/
> >>
> >
>
>