[petsc-users] matcreate and assembly issue

Karl Lin karl.linkui at gmail.com
Thu Jul 2 22:03:26 CDT 2020


Hi, Matthew

Thanks for the reply. However, I don't really get why additional mallocs
would double the memory footprint. If I know only 1GB of matrix is being
loaded, there shouldn't be 2GB of memory occupied even if PETSc needs to
allocate more space.
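
For reference, here is a minimal sketch of the preallocation pattern I
understand you to mean, plus a MatGetInfo check of how many extra mallocs
the assembly triggered. The sizes (mLocal, N) and per-row nonzero counts
(dNz, oNz) below are placeholders, not our actual loader:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  MatInfo        info;
  PetscErrorCode ierr;
  PetscInt       mLocal = 1000, N = 1000000; /* placeholder local rows / global columns */
  PetscInt       dNz = 30, oNz = 10;         /* placeholder per-row nonzero estimates   */

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, mLocal, PETSC_DECIDE, PETSC_DETERMINE, N); CHKERRQ(ierr);
  ierr = MatSetType(A, MATAIJ); CHKERRQ(ierr);

  /* If these estimates are too small, every row that outgrows them forces a
     reallocation and copy of the row storage, which shows up as "mallocs"
     below and inflates the resident memory while the matrix is loaded. */
  ierr = MatSeqAIJSetPreallocation(A, dNz, NULL); CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, dNz, NULL, oNz, NULL); CHKERRQ(ierr);

  /* Optionally make PETSc error out instead of silently allocating more. */
  ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); CHKERRQ(ierr);

  /* ... MatSetValues(...) calls for the locally owned entries go here ... */

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  /* Number of mallocs performed during MatSetValues/assembly on this rank;
     anything other than 0 means the preallocation was not sufficient. */
  ierr = MatGetInfo(A, MAT_LOCAL, &info); CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF, "mallocs during assembly: %g\n",
                     (double)info.mallocs); CHKERRQ(ierr);

  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}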

regards,

Karl

On Thu, Jul 2, 2020 at 8:10 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Thu, Jul 2, 2020 at 7:30 PM Karl Lin <karl.linkui at gmail.com> wrote:
>
>> Hi, Matt
>>
>> Thanks for the tip last time. We just encountered another issue with
>> large data sets; this time the behavior is the opposite of last time. The
>> data is 13.5TB and the total number of matrix columns is 2.4 billion. Our
>> program crashed during matrix loading due to memory overflow on one node.
>> As mentioned before, we have a small memory check while loading the matrix
>> to keep track of RSS. The RSS printout in the log shows a normal increase
>> on most nodes, i.e., if we load a 1GB portion of the matrix, RSS increases
>> by roughly 1GB after MatSetValues for that portion. On the node that
>> overflowed, RSS increased by 2GB after only 1GB of matrix was loaded
>> through MatSetValues. We are very puzzled by this. What could make the
>> memory footprint twice as large as needed? Thanks in advance for any
>> insight.
>>
>
> The only way I can imagine this happening is that you have not
> preallocated correctly, so that some values are causing additional mallocs.
>
>   Thanks,
>
>      Matt
>
>
>> Regards,
>>
>> Karl
>>
>> On Thu, Jun 11, 2020 at 12:00 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Thu, Jun 11, 2020 at 12:52 PM Karl Lin <karl.linkui at gmail.com> wrote:
>>>
>>>> Hi, Matthew
>>>>
>>>> Thanks for the suggestion. I just did another run; here is a more
>>>> detailed stack trace, which may provide some more insight:
>>>>  *** Process received signal ***
>>>> Signal: Aborted (6)
>>>> Signal code:  (-6)
>>>> /lib64/libpthread.so.0(+0xf5f0)[0x2b56c46dc5f0]
>>>>  [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b56c5486337]
>>>>  [ 2] /lib64/libc.so.6(abort+0x148)[0x2b56c5487a28]
>>>>  [ 3] /libpetsc.so.3.10(PetscTraceBackErrorHandler+0xc4)[0x2b56c1e6a2d4]
>>>>  [ 4] /libpetsc.so.3.10(PetscError+0x1b5)[0x2b56c1e69f65]
>>>>  [ 5]
>>>> /libpetsc.so.3.10(PetscCommBuildTwoSidedFReq+0x19f0)[0x2b56c1e03cf0]
>>>>  [ 6] /libpetsc.so.3.10(+0x77db17)[0x2b56c2425b17]
>>>>  [ 7] /libpetsc.so.3.10(+0x77a164)[0x2b56c2422164]
>>>>  [ 8] /libpetsc.so.3.10(MatAssemblyBegin_MPIAIJ+0x36)[0x2b56c23912b6]
>>>>  [ 9] /libpetsc.so.3.10(MatAssemblyBegin+0xca)[0x2b56c1feccda]
>>>>
>>>> By reconfiguring, you mean recompiling PETSc with that option, correct?
>>>>
>>>
>>> Reconfiguring.
>>>
>>>   Thanks,
>>>
>>>     Matt
>>>
>>>
>>>> Thank you.
>>>>
>>>> Karl
>>>>
>>>> On Thu, Jun 11, 2020 at 10:56 AM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Jun 11, 2020 at 11:51 AM Karl Lin <karl.linkui at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, there
>>>>>>
>>>>>> We have written a program that uses PETSc to solve large sparse matrix
>>>>>> systems. It has been working fine for a while. Recently we encountered a
>>>>>> problem when the size of the sparse matrix is larger than 10TB. We used
>>>>>> several hundred nodes and 2200 processes. The program always crashes during
>>>>>> MatAssemblyBegin. Upon a closer look, there seems to be something unusual.
>>>>>> We have a small memory check while loading the matrix to keep track of RSS.
>>>>>> The RSS printout in the log shows a normal increase up to rank 2160, i.e.,
>>>>>> if we load a 1GB portion of the matrix, RSS increases by roughly that amount
>>>>>> after MatSetValues for that portion. From rank 2161 onwards, the RSS on each
>>>>>> rank does not increase after the matrix is loaded. Then, during
>>>>>> MatAssemblyBegin, the program crashes on rank 2160.
>>>>>>
>>>>>> Is there an upper limit on the number of processes PETSc can handle,
>>>>>> or an upper limit on the size of the matrix PETSc can handle? Thank you
>>>>>> very much for any info.
>>>>>>
>>>>>
>>>>> It sounds like you overflowed an int somewhere. We try to check for
>>>>> this, but catching every place is hard. Try reconfiguring with
>>>>>
>>>>>   --with-64-bit-indices
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>      Matt
>>>>>
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
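
As a quick sanity check, here is a minimal sketch of how one might confirm
at runtime whether a given PETSc build was configured with
--with-64-bit-indices (2.4 billion columns exceeds the roughly 2.1 billion
limit of a 32-bit PetscInt). It relies only on the PETSC_USE_64BIT_INDICES
configure macro:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

#if defined(PETSC_USE_64BIT_INDICES)
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "PetscInt is %d bytes: 64-bit indices are enabled\n",
                     (int)sizeof(PetscInt)); CHKERRQ(ierr);
#else
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "PetscInt is %d bytes: global sizes and nonzero counts are "
                     "limited to about 2.1 billion; reconfigure with "
                     "--with-64-bit-indices for anything larger\n",
                     (int)sizeof(PetscInt)); CHKERRQ(ierr);
#endif

  ierr = PetscFinalize();
  return ierr;
}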