[petsc-users] MatMultTranspose memory usage
Smith, Barry F.
bsmith at mcs.anl.gov
Wed Jul 31 04:34:25 CDT 2019
You can run with -log_view -log_view_memory to get a summary that contains information in the right columns about the amounts of memory allocated within each method. So for example the line with MatAssemblyEnd may tell you something.
One additional "chunk" of memory that is needed in the parallel case but not the sequential case is 1) a vector whose length is the number of columns of the "off-diagonal" part of the parallel matrix (see the explanation of diagonal and off-diagonal in the manual page for https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateAIJ.html) and 2) a VecScatter that is needed to communicate these "off-diagonal" vector parts. Normally this is pretty small compared to the memory of the matrix, but it can be measurable depending on the parallel partitioning of the matrix and the nonzero structure of the matrix (for example, with a stencil width of 2 this extra memory will be larger than with a stencil width of 1).
Barry
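
As a rough illustration of the stencil-width remark above, the following sketch counts how many off-process vector entries each rank would need for a simple banded matrix. Everything in it is an assumption made for illustration: the 1D banded structure, the contiguous and evenly divisible row partitioning, and the helper name ghost_columns (it is not a PETSc function).

```python
def ghost_columns(n, nranks, rank, width):
    """Count off-process columns touched by one rank's rows.

    Hypothetical model: an n x n banded matrix where row i has nonzeros in
    columns i-width .. i+width, with rows split into contiguous blocks.
    """
    rows_per_rank = n // nranks  # assumes nranks divides n evenly
    lo = rank * rows_per_rank
    hi = lo + rows_per_rank      # this rank owns rows (and columns) lo .. hi-1
    first_col = max(0, lo - width)
    last_col = min(n, hi + width)  # exclusive upper bound
    # Columns outside [lo, hi) form the "off-diagonal" block for this rank.
    return (lo - first_col) + (last_col - hi)


if __name__ == "__main__":
    for width in (1, 2):
        counts = [ghost_columns(100, 4, r, width) for r in range(4)]
        print(f"stencil width {width}: ghost entries per rank = {counts}")
```

In this toy model the extra vector doubles in length when the stencil width goes from 1 to 2, while staying tiny relative to the matrix itself. Real matrices with irregular sparsity or poor partitioning can behave quite differently.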
> On Jul 30, 2019, at 8:29 PM, Karl Lin via petsc-users <petsc-users at mcs.anl.gov> wrote:
>
> I checked the resident set size via /proc/self/stat
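
For readers who want to reproduce this kind of measurement, here is a minimal sketch of reading the resident set size from /proc/self/stat on Linux. It assumes the field layout documented in proc(5), where rss is field 24 and is counted in pages. The function names are hypothetical and this is not necessarily how the measurement in this thread was done.

```python
import os


def parse_rss_pages(stat_line):
    """Return the rss field (field 24, counted in pages) of a /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces, so split only the text after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 24 sits at index 24 - 3.
    return int(after_comm[21])


def resident_set_bytes(path="/proc/self/stat"):
    """Resident set size of the current process in bytes (Linux only)."""
    with open(path) as f:
        pages = parse_rss_pages(f.read())
    return pages * os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    print(f"RSS: {resident_set_bytes() / 2**20:.1f} MiB")
```

Sampling this value before and after a call such as matrix assembly shows only what the OS currently keeps resident, so freed buffers may still be counted.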
>
> On Tue, Jul 30, 2019 at 8:13 PM Mills, Richard Tran via petsc-users <petsc-users at mcs.anl.gov> wrote:
> Hi Karl,
>
> I'll let one of my colleagues who has a better understanding of exactly what happens with memory during PETSc matrix assembly chime in, but let me ask how you know that the memory footprint is actually larger than you think it should be? Are you looking at the resident set size reported by a tool like 'top'? Keep in mind that even if extra buffers were allocated and then free()d, the resident set size of your process may stay the same, and only decrease when the OS's memory manager decides it really needs those pages for something else.
>
> --Richard
>
> On 7/30/19 5:35 PM, Karl Lin via petsc-users wrote:
>> Thanks for the feedback, very helpful.
>>
>> I have another question. When I run 4 processes, even though the matrix is only 49.7GB, I found the memory footprint of the matrix is about 52.8GB. Where does this extra memory come from? Does MatCreateAIJ still reserve some extra memory? I thought that after MatAssembly all unused space would be released, but in at least one of the processes the memory footprint of the local matrix actually increased after MatAssembly by a couple of GB. I would greatly appreciate any info.
>>
>>
>> On Tue, Jul 30, 2019 at 6:34 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>
>> Thanks, this is enough information to diagnose the problem.
>>
>> The problem is that 32 bit integers are not large enough to contain the "counts", in this case the number of nonzeros in the matrix. A signed integer can only be as large as PETSC_MAX_INT, which is 2147483647.
>>
>> You need to configure PETSc with the additional option --with-64-bit-indices; then PETSc will use 64 bit integers for PetscInt so you don't run out of space in int for such large counts.
>>
>>
>> We don't do a perfect job of detecting when there is overflow of int which is why you ended up with crazy allocation requests like 18446744058325389312.
>>
>> I will add some more error checking to provide more useful error messages in this case.
>>
>> Barry
>>
>> The reason this worked for 4 processes is that the largest count in that case was roughly 6,653,750,976/4 which does fit into an int. PETSc only needs to know the number of nonzeros on each process, it doesn't need to know the amount across all the processors. In other words you may want to use a different PETSC_ARCH (different configuration) for small number of processors and large number depending on how large your problem is. Or you can always use 64 bit integers at a little performance and memory cost.
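
The arithmetic behind this explanation can be checked with a short sketch. It uses only the numbers quoted in this thread. The even split across 4 processes and the 8 bytes per nonzero are assumptions for illustration, and the helper names are hypothetical. It is not meant to reproduce the exact "Memory requested" value from the error message, since the real allocation covers more than one array.

```python
INT32_MAX = 2**31 - 1   # 2147483647, the PETSC_MAX_INT value quoted above
BYTES_PER_NONZERO = 8   # assumption for illustration only


def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def as_unsigned_64(value):
    """Reinterpret a possibly negative integer as an unsigned 64-bit value."""
    return value & 0xFFFFFFFFFFFFFFFF


total_nonzeros = 6_653_750_976      # from the report in this thread
per_process = total_nonzeros // 4   # assumes an even split over 4 processes

if __name__ == "__main__":
    print("fits on 1 process: ", total_nonzeros <= INT32_MAX)
    print("fits on 4 processes:", per_process <= INT32_MAX)
    wrapped = wrap_int32(total_nonzeros)
    print("count after 32-bit wraparound:", wrapped)
    # A negative byte count read as an unsigned size lands just below 2**64.
    print("as an unsigned 64-bit size:", as_unsigned_64(wrapped * BYTES_PER_NONZERO))
```

The wrapped count is negative, and once it is interpreted as an unsigned size it becomes a value close to 2**64, which is the same kind of number as the request shown in the error output.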
>>
>>
>>
>>
>> > On Jul 30, 2019, at 5:27 PM, Karl Lin via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> >
>> > The number of rows is 26,326,575. The maximum column index is 36,416,250. The number of nonzero coefficients is 6,653,750,976, which amounts to 49.7GB for coefficients in PetscScalar and column indices in PetscInt. I can run the program with 4 processes with this input but not with a single process. Here are the snapshots of the error:
>> >
>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> > [0]PETSC ERROR: Out of memory. This could be due to allocating
>> > [0]PETSC ERROR: too large an object or bleeding by not properly
>> > [0]PETSC ERROR: destroying unneeded objects.
>> > [0]PETSC ERROR: Memory allocated 0 Memory used by process 727035904
>> > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
>> > [0]PETSC ERROR: Memory requested 18446744058325389312
>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> > [0]PETSC ERROR: Petsc Release Version 3.10.4, Feb, 26, 2019
>> >
>> > [0]PETSC ERROR: #1 MatSeqAIJSetPreallocation_SeqAIJ() line 3711 in /petsc-3.10.4/src/mat/impls/aij/seq/aij.c
>> > [0]PETSC ERROR: #2 PetscMallocA() line 390 in /petsc-3.10.4/src/sys/memory/mal.c
>> > [0]PETSC ERROR: #3 MatSeqAIJSetPreallocation_SeqAIJ() line 3711 in /petsc-3.10.4/src/mat/impls/aij/seq/aij.c
>> > [0]PETSC ERROR: #4 MatSeqAIJSetPreallocation() line 3649 in /petsc-3.10.4/src/mat/impls/aij/seq/aij.c
>> > [0]PETSC ERROR: #5 MatCreateAIJ() line 4413 in /petsc-3.10.4/src/mat/impls/aij/mpi/mpiaij.c
>> > [0]PETSC ERROR: #6 *** (my code)
>> > [0]PETSC ERROR: ------------------------------------------------------------------------
>> > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> > [0]PETSC ERROR: to get more information on the crash.
>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> > [0]PETSC ERROR: Signal received
>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> > [0]PETSC ERROR: Petsc Release Version 3.10.4, Feb, 26, 2019
>> >
>> >
>> >
>> > On Tue, Jul 30, 2019 at 10:34 AM Matthew Knepley <knepley at gmail.com> wrote:
>> > On Wed, Jul 31, 2019 at 3:25 AM Karl Lin via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> > Hi, Richard,
>> >
>> > We have a new question. Is there a limit for MatCreateMPIAIJ and MatSetValues? What I mean is that we tried to create a sparse matrix and populate it with 50GB of data in one process, and got a crash with an error saying the object was too big. Thank you for any insight.
>> >
>> > 1) Always send the complete error.
>> >
>> > 2) It sounds like you got an out of memory error for that process.
>> >
>> > Matt
>> >
>> > Regards,
>> >
>> > Karl
>> >
>> > On Thu, Jul 18, 2019 at 2:36 PM Mills, Richard Tran via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> > Hi Kun and Karl,
>> >
>> > If you are using the AIJMKL matrix types and have a recent version of MKL, the AIJMKL code uses MKL's inspector-executor sparse BLAS routines, which are described at
>> >
>> > https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines
>> >
>> > The inspector-executor analysis routines take the AIJ (compressed sparse row) format data from PETSc and then create a copy in an optimized, internal layout used by MKL. We have to keep PETSc's own AIJ representation around, as it is needed for several operations that MKL does not provide. This does, unfortunately, mean that roughly double (or more, depending on what MKL decides to do) the amount of memory is required. The reason you see the memory usage increase right when a MatMult() or MatMultTranspose() operation occurs is that we default to a "lazy" approach that defers calling the analysis routine (mkl_sparse_optimize()) until an operation that uses an MKL-provided kernel is requested. (You can use an "eager" approach that calls mkl_sparse_optimize() during MatAssemblyEnd() by specifying "-mat_aijmkl_eager_inspection" in the PETSc options.)
>> >
>> > If memory is at enough of a premium for you that you can't afford the extra copy used by the MKL inspector-executor routines, then I suggest using the usual PETSc AIJ format instead of AIJMKL. AIJ is fairly well optimized for many cases (and even has some hand-optimized kernels using Intel AVX/AVX2/AVX-512 intrinsics) and often outperforms AIJMKL. You should try both AIJ and AIJMKL, anyway, to see which is faster for your combination of problem and computing platform.
>> >
>> > Best regards,
>> > Richard
>> >
>> > On 7/17/19 8:46 PM, Karl Lin via petsc-users wrote:
>> >> We also found that if we use MatCreateSeqAIJ, then there is no memory increase with matrix-vector multiplication. However, with MatCreateMPIAIJMKL, the increase occurs consistently.
>> >>
>> >> On Wed, Jul 17, 2019 at 5:26 PM Karl Lin <karl.linkui at gmail.com> wrote:
>> >> MatCreateMPIAIJMKL
>> >>
>> >> Parallel and sequential exhibit the same behavior. In fact, we found that doing MatMult will increase the memory by the size of the matrix as well.
>> >>
>> >> On Wed, Jul 17, 2019 at 4:55 PM Zhang, Hong <hzhang at mcs.anl.gov> wrote:
>> >> Karl:
>> >> What matrix format do you use? Run it in parallel or sequential?
>> >> Hong
>> >>
>> >> We used /proc/self/stat to track the resident set size during the program run, and we saw the resident set size jump by the size of the matrix right after we called MatMultTranspose.
>> >>
>> >> On Wed, Jul 17, 2019 at 12:04 PM hong--- via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> >> Kun:
>> >> How do you know 'MatMultTranspose creates an extra memory copy of matrix'?
>> >> Hong
>> >>
>> >> Hi,
>> >>
>> >>
>> >> I was using MatMultTranspose and MatMult to solve a linear system.
>> >>
>> >>
>> >> However, we found out that MatMultTranspose creates an extra in-memory copy of the matrix for its operation. This extra copy is not stated anywhere in the PETSc manual.
>> >>
>> >>
>> >> This basically doubles my memory requirement to solve my system.
>> >>
>> >>
>> >> I remember that MKL's routines can do an in-place matrix-transpose-vector product without transposing the matrix itself.
>> >>
>> >>
>> >> Is this always the case? Or is there a way to make PETSc do an in-place matrix-transpose-vector product?
>> >>
>> >>
>> >> Any help is greatly appreciated.
>> >>
>> >>
>> >> Regards,
>> >>
>> >> Kun
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>>
>