[petsc-users] [Ext] Re: matcreate and assembly issue

Karl Lin karl.linkui at gmail.com
Fri Jul 3 13:59:44 CDT 2020


I can certainly do that.

Now for Jed, to clarify further: our matrix is not stored in CSR format. It
is actually stored row by row in a multi-level data structure, with further
levels of complexity on top of that. You literally have to go through all
column indices row by row and transform the indices from global to local
before the accurate count of diagonal and off-diagonal entries can be
determined for every row. That is why it takes a long time to read.

But now that I understand the nature of the issue, I will explore some
options and report back to you. Thanks a lot.

On Fri, Jul 3, 2020 at 12:17 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Fri, Jul 3, 2020 at 12:52 PM Karl Lin <karl.linkui at gmail.com> wrote:
>
>> Hi, Matthew
>>
>> Thanks for the reply. However, if the matrix is huge, 13.5 TB in our case,
>> looping over the insertion twice will take a significant amount of time.
>> Are there any other time- and resource-saving options? Thank you very much.
>>
>
> Do you think you could do it once and time it? I would be surprised if it
> took even 1% of your total runtime, and I would also
> like to see the timing, since we might be able to optimize something for
> you.
>
>   Thanks,
>
>      Matt
>
>
>> Regards,
>>
>> Karl
>>
>> On Fri, Jul 3, 2020 at 10:57 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Fri, Jul 3, 2020 at 11:38 AM Karl Lin <karl.linkui at gmail.com> wrote:
>>>
>>>> Hi, Barry
>>>>
>>>> Thanks for the explanation. Following your tip, I have a guess. We use
>>>> MatCreateAIJ to create the matrix, and I believe this call preallocates as
>>>> well. Before this call we figure out the number of nonzeros per row for all
>>>> rows and put those numbers in an array, say numNonZero. We pass numNonZero
>>>> as both d_nnz and o_nnz to the MatCreateAIJ call, so essentially we
>>>> preallocate twice as much space as needed. On the process that doubled its
>>>> memory footprint and crashed, there are a lot of values in both the diagonal
>>>> and off-diagonal parts, so the preallocated space fills up for both parts of
>>>> the matrix, while the unused preallocated space is also kept until
>>>> MatAssembly; the result is a memory footprint twice what is needed. Once
>>>> MatAssembly is done, the unused space is squeezed out and we see the correct
>>>> memory footprint of the matrix. But before MatAssembly, a large amount of
>>>> unused space has to be kept because of the diagonal and off-diagonal pattern
>>>> of the input. Would you say this is a plausible explanation? Thank you.
>>>>
>>>
>>> Yes. We find that it takes a very small amount of time to just loop over
>>> the insertion twice, the first time counting the nonzeros. We built
>>> something to do this for you:
>>>
>>>
>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatPreallocatorPreallocate.html
>>>
>>>   Thanks,
>>>
>>>      Matt
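
A rough sketch of that two-pass pattern, assuming a hypothetical
insert_all_values() wrapper around the existing MatSetValues() loop and
local sizes mlocal/nlocal:

  Mat prealloc, A;
  MatCreate(PETSC_COMM_WORLD, &prealloc);
  MatSetSizes(prealloc, mlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetType(prealloc, MATPREALLOCATOR);
  MatSetUp(prealloc);
  insert_all_values(prealloc);                         /* pass 1: pattern only */
  MatAssemblyBegin(prealloc, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(prealloc, MAT_FINAL_ASSEMBLY);

  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, mlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetType(A, MATAIJ);
  MatPreallocatorPreallocate(prealloc, PETSC_TRUE, A); /* exact d_nnz/o_nnz for A */
  MatDestroy(&prealloc);
  insert_all_values(A);                                /* pass 2: actual values */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

The preallocator matrix records only the nonzero pattern, so the first pass
is cheap, and the second pass then hits correctly preallocated storage.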
>>>
>>>
>>>> Regards,
>>>>
>>>> Karl
>>>>
>>>> On Fri, Jul 3, 2020 at 9:50 AM Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>>
>>>>>   Karl,
>>>>>
>>>>>     If a particular process is receiving values with MatSetValues()
>>>>> that belong to a different process, it needs to allocate temporary space
>>>>> for those values. If there are many values destined for a different
>>>>> process, this space can be arbitrarily large. The values are not passed
>>>>> to the final owning process until the MatAssemblyBegin/End calls.
>>>>>
>>>>>     If you have not preallocated enough room, the matrix actually makes
>>>>> a complete copy of itself with extra space for the additional values,
>>>>> copies the values over, and then deletes the old matrix; thus the memory
>>>>> use can double when the preallocation is not correct.
>>>>>
>>>>>
>>>>>    Barry
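
One way to bound that temporary (stash) space, assuming the values can be
inserted in chunks by a hypothetical insert_chunk() routine, is to flush the
stash periodically and reserve MAT_FINAL_ASSEMBLY for the end; a sketch:

  for (PetscInt chunk = 0; chunk < nchunks; ++chunk) {
    insert_chunk(A, chunk);                     /* calls MatSetValues() */
    if (chunk + 1 < nchunks) {
      MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY);  /* ship stashed values to their owners now */
      MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY);
    }
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);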
>>>>>
>>>>>
>>>>> On Jul 3, 2020, at 9:44 AM, Karl Lin <karl.linkui at gmail.com> wrote:
>>>>>
>>>>> Yes, I did. The memory check for rss computes the memory footprint of
>>>>> the column indices using the size of unsigned long long instead of int.
>>>>>
>>>>> For Junchao, I wonder whether keeping track of which loaded columns are
>>>>> owned by the current process and which are not also needs some memory
>>>>> storage. Just a wild thought.
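
For that sizing, a small sketch of the estimate the rss check can use, with
nnz_local standing in for the per-rank nonzero count from the loader; the
index term follows sizeof(PetscInt), which is 8 bytes when PETSc is
configured --with-64-bit-indices:

  PetscInt64 nnz_local = 0;  /* hypothetical: per-rank nonzero count from the loader */
  PetscInt64 bytes     = nnz_local * (PetscInt64)(sizeof(PetscInt) + sizeof(PetscScalar));
  PetscPrintf(PETSC_COMM_WORLD, "sizeof(PetscInt)=%d, AIJ index+value estimate=%" PetscInt64_FMT " bytes\n",
              (int)sizeof(PetscInt), bytes);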
>>>>>
>>>>> On Thu, Jul 2, 2020 at 11:40 PM Ernesto Prudencio <EPrudencio at slb.com>
>>>>> wrote:
>>>>>
>>>>>> Karl,
>>>>>>
>>>>>> Are you taking into account that every “integer” index might be 64
>>>>>> bits instead of 32 bits, depending on the PETSc configuration /
>>>>>> compilation choices for PetscInt?
>>>>>>
>>>>>> Ernesto.
>>>>>>
>>>>>> From: petsc-users [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf
>>>>>> Of Junchao Zhang
>>>>>> Sent: Thursday, July 2, 2020 11:21 PM
>>>>>> To: Karl Lin <karl.linkui at gmail.com>
>>>>>> Cc: PETSc users list <petsc-users at mcs.anl.gov>
>>>>>> Subject: [Ext] Re: [petsc-users] matcreate and assembly issue
>>>>>>
>>>>>> Is it because indices for the nonzeros also need memory?
>>>>>>
>>>>>> --Junchao Zhang
>>>>>>
>>>>>> On Thu, Jul 2, 2020 at 10:04 PM Karl Lin <karl.linkui at gmail.com> wrote:
>>>>>>
>>>>>> Hi, Matthew
>>>>>>
>>>>>> Thanks for the reply. However, I don't really get why an additional
>>>>>> malloc would double the memory footprint. If I know there is only 1 GB
>>>>>> of matrix being loaded, there shouldn't be 2 GB of memory occupied even
>>>>>> if PETSc needs to allocate more space.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jul 2, 2020 at 8:10 PM Matthew Knepley <knepley at gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jul 2, 2020 at 7:30 PM Karl Lin <karl.linkui at gmail.com> wrote:
>>>>>>
>>>>>> Hi, Matt
>>>>>>
>>>>>> Thanks for the tip last time. We just encountered another issue with
>>>>>> large data sets. This time the behavior is the opposite of last time.
>>>>>> The data is 13.5 TB and the total number of matrix columns is 2.4
>>>>>> billion. Our program crashed during matrix loading due to memory
>>>>>> overflow on one node. As said before, we have a little memory check
>>>>>> during matrix loading to keep track of rss. The printout of rss in the
>>>>>> log shows a normal increase on many nodes, i.e., if we load in a
>>>>>> portion of the matrix that is 1 GB, after MatSetValues for that portion
>>>>>> rss increases by roughly 1 GB. On the node that overflowed, rss
>>>>>> increased by 2 GB after only 1 GB of the matrix was loaded through
>>>>>> MatSetValues. We are very puzzled by this. What could make the memory
>>>>>> footprint twice as much as needed? Thanks in advance for any insight.
>>>>>>
>>>>>> The only way I can imagine this happening is that you have not
>>>>>> preallocated correctly, so that some values are causing additional
>>>>>> mallocs.
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jun 11, 2020 at 12:00 PM Matthew Knepley <knepley at gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jun 11, 2020 at 12:52 PM Karl Lin <karl.linkui at gmail.com> wrote:
>>>>>>
>>>>>> Hi, Matthew
>>>>>>
>>>>>> Thanks for the suggestion. I just did another run and here are some
>>>>>> detailed stack traces; maybe they will provide some more insight:
>>>>>>
>>>>>> *** Process received signal ***
>>>>>> Signal: Aborted (6)
>>>>>> Signal code:  (-6)
>>>>>> /lib64/libpthread.so.0(+0xf5f0)[0x2b56c46dc5f0]
>>>>>> [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b56c5486337]
>>>>>> [ 2] /lib64/libc.so.6(abort+0x148)[0x2b56c5487a28]
>>>>>> [ 3] /libpetsc.so.3.10(PetscTraceBackErrorHandler+0xc4)[0x2b56c1e6a2d4]
>>>>>> [ 4] /libpetsc.so.3.10(PetscError+0x1b5)[0x2b56c1e69f65]
>>>>>> [ 5] /libpetsc.so.3.10(PetscCommBuildTwoSidedFReq+0x19f0)[0x2b56c1e03cf0]
>>>>>> [ 6] /libpetsc.so.3.10(+0x77db17)[0x2b56c2425b17]
>>>>>> [ 7] /libpetsc.so.3.10(+0x77a164)[0x2b56c2422164]
>>>>>> [ 8] /libpetsc.so.3.10(MatAssemblyBegin_MPIAIJ+0x36)[0x2b56c23912b6]
>>>>>> [ 9] /libpetsc.so.3.10(MatAssemblyBegin+0xca)[0x2b56c1feccda]
>>>>>>
>>>>>> By reconfiguring, you mean recompiling PETSc with that option, correct?
>>>>>>
>>>>>> Reconfiguring.
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jun 11, 2020 at 10:56 AM Matthew Knepley <knepley at gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jun 11, 2020 at 11:51 AM Karl Lin <karl.linkui at gmail.com> wrote:
>>>>>>
>>>>>> Hi, there
>>>>>>
>>>>>> We have written a program using PETSc to solve large sparse matrix
>>>>>> systems. It has been working fine for a while. Recently we encountered a
>>>>>> problem when the size of the sparse matrix is larger than 10 TB. We used
>>>>>> several hundred nodes and 2200 processes. The program always crashes
>>>>>> during MatAssemblyBegin. Upon a closer look, there seems to be something
>>>>>> unusual. We have a little memory check during matrix loading to keep
>>>>>> track of rss. The printout of rss in the log shows a normal increase up
>>>>>> to rank 2160, i.e., if we load in a portion of the matrix that is 1 GB,
>>>>>> after MatSetValues for that portion rss increases by roughly that
>>>>>> amount. From rank 2161 onward, rss in each rank does not increase after
>>>>>> the matrix is loaded. Then comes MatAssemblyBegin, and the program
>>>>>> crashed on rank 2160.
>>>>>>
>>>>>> Is there an upper limit on the number of processes PETSc can handle? Or
>>>>>> is there an upper limit on the size of the matrix PETSc can handle?
>>>>>> Thank you very much for any info.
>>>>>>
>>>>>> It sounds like you overflowed int somewhere. We try to check for this,
>>>>>> but catching every place is hard. Try reconfiguring with
>>>>>> --with-64-bit-indices
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which
>>>>>> their experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>

