[petsc-users] MatMultTranspose memory usage
Karl Lin
karl.linkui at gmail.com
Tue Jul 30 17:27:23 CDT 2019
The number of rows is 26326575 and the maximum column index is 36416250. The
number of nonzero coefficients is 6653750976, which amounts to 49.7 GB for the
coefficients in PetscScalar plus the column indices in PetscInt. I can run the
program on 4 processes with this input but not on a single process. Here are
snapshots of the error:
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
[0]PETSC ERROR: Memory allocated 0 Memory used by process 727035904
[0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
[0]PETSC ERROR: Memory requested 18446744058325389312
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.4, Feb, 26, 2019
[0]PETSC ERROR: #1 MatSeqAIJSetPreallocation_SeqAIJ() line 3711 in
/petsc-3.10.4/src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: #2 PetscMallocA() line 390 in
/petsc-3.10.4/src/sys/memory/mal.c
[0]PETSC ERROR: #3 MatSeqAIJSetPreallocation_SeqAIJ() line 3711 in
/petsc-3.10.4/src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: #4 MatSeqAIJSetPreallocation() line 3649 in
/petsc-3.10.4/src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: #5 MatCreateAIJ() line 4413 in
/petsc-3.10.4/src/mat/impls/aij/mpi/mpiaij.c
[0]PETSC ERROR: #6 *** (my code)
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.4, Feb, 26, 2019
On Tue, Jul 30, 2019 at 10:34 AM Matthew Knepley <knepley at gmail.com> wrote:
> On Wed, Jul 31, 2019 at 3:25 AM Karl Lin via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Hi, Richard,
>>
>> We have a new question. Is there a limit for MatCreateMPIAIJ and
>> MatSetValues? What I mean is that when we tried to create a sparse matrix
>> and populate it with 50 GB of data in one process, we got a crash with an
>> error saying the object was too big. Thank you for any insight.
>>
>
> 1) Always send the complete error.
>
> 2) It sounds like you got an out of memory error for that process.
>
> Matt
>
>
>> Regards,
>>
>> Karl
>>
>> On Thu, Jul 18, 2019 at 2:36 PM Mills, Richard Tran via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>>> Hi Kun and Karl,
>>>
>>> If you are using the AIJMKL matrix types and have a recent version of
>>> MKL, the AIJMKL code uses MKL's inspector-executor sparse BLAS routines,
>>> which are described at
>>>
>>>
>>> https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines
>>>
>>> The inspector-executor analysis routines take the AIJ (compressed sparse
>>> row) format data from PETSc and then create a copy in an optimized,
>>> internal layout used by MKL. We have to keep PETSc's own AIJ
>>> representation around, as it is needed for several operations that MKL
>>> does not provide. This does, unfortunately, mean that roughly double (or
>>> more, depending on what MKL decides to do) the amount of memory is
>>> required. The reason you see the memory usage increase right when a
>>> MatMult() or MatMultTranspose() operation occurs is that we default to a
>>> "lazy" approach, deferring the analysis routine (mkl_sparse_optimize())
>>> until an operation that uses an MKL-provided kernel is requested. (You
>>> can use an "eager" approach that calls mkl_sparse_optimize() during
>>> MatAssemblyEnd() by specifying "-mat_aijmkl_eager_inspection" in the
>>> PETSc options.)
>>>
>>> If memory is at enough of a premium for you that you can't afford the
>>> extra copy used by the MKL inspector-executor routines, then I suggest
>>> using the usual PETSc AIJ format instead of AIJMKL. AIJ is fairly well
>>> optimized for many cases (and even has some hand-optimized kernels using
>>> Intel AVX/AVX2/AVX-512 intrinsics) and often outperforms AIJMKL. You should
>>> try both AIJ and AIJMKL, anyway, to see which is faster for your
>>> combination of problem and computing platform.
>>>
>>> Best regards,
>>> Richard
>>>
>>> On 7/17/19 8:46 PM, Karl Lin via petsc-users wrote:
>>>
>>> We also found that if we use MatCreateSeqAIJ, there is no further memory
>>> increase with matrix-vector multiplication. With MatCreateMPIAIJMKL,
>>> however, the memory increase occurs consistently.
>>>
>>> On Wed, Jul 17, 2019 at 5:26 PM Karl Lin <karl.linkui at gmail.com> wrote:
>>>
>>>> MatCreateMPIAIJMKL
>>>>
>>>> Parallel and sequential exhibit the same behavior. In fact, we found
>>>> that doing MatMult will increase the memory by the size of the matrix
>>>> as well.
>>>>
>>>> On Wed, Jul 17, 2019 at 4:55 PM Zhang, Hong <hzhang at mcs.anl.gov> wrote:
>>>>
>>>>> Karl:
>>>>> What matrix format do you use? Run it in parallel or sequential?
>>>>> Hong
>>>>>
>>>>>> We used /proc/self/stat to track the resident set size during the
>>>>>> program run, and we saw the resident set size jump by the size of the
>>>>>> matrix right after we did MatMultTranspose.
>>>>>>
>>>>>> On Wed, Jul 17, 2019 at 12:04 PM hong--- via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Kun:
>>>>>>> How do you know that 'MatMultTranspose creates an extra memory copy
>>>>>>> of the matrix'?
>>>>>>> Hong
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was using MatMultTranspose and MatMult to solve a linear system.
>>>>>>>>
>>>>>>>> However, we found that MatMultTranspose creates an extra memory
>>>>>>>> copy of the matrix for its operation. This extra memory copy is not
>>>>>>>> stated anywhere in the PETSc manual.
>>>>>>>>
>>>>>>>> This basically doubles my memory requirement to solve my system.
>>>>>>>>
>>>>>>>> I recall that MKL's routines can do an in-place matrix-transpose
>>>>>>>> vector product, without transposing the matrix itself.
>>>>>>>>
>>>>>>>> Is this always the case? Or is there a way to make PETSc do an
>>>>>>>> in-place matrix-transpose vector product?
>>>>>>>>
>>>>>>>> Any help is greatly appreciated.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Kun
>>>>>>>>
>>>>>>>> Schlumberger-Private
>>>>>>>>
>>>>>>>
>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>