[petsc-users] matcreate and assembly issue

Junchao Zhang junchao.zhang at gmail.com
Thu Jul 2 23:21:06 CDT 2020


Is it because indices for the nonzeros also need memory?
--Junchao Zhang


On Thu, Jul 2, 2020 at 10:04 PM Karl Lin <karl.linkui at gmail.com> wrote:

> Hi, Matthew
>
> Thanks for the reply. However, I don't really get why additional mallocs
> would double the memory footprint. If I know only 1GB of matrix data is
> being loaded, there shouldn't be 2GB of memory occupied even if PETSc needs
> to allocate more space.
>
> regards,
>
> Karl
>
> On Thu, Jul 2, 2020 at 8:10 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Thu, Jul 2, 2020 at 7:30 PM Karl Lin <karl.linkui at gmail.com> wrote:
>>
>>> Hi, Matt
>>>
>>> Thanks for the tip last time. We just encountered another issue with
>>> large data sets. This time the behavior is the opposite of last time. The
>>> data is 13.5TB, and the total number of matrix columns is 2.4 billion. Our
>>> program crashed during matrix loading due to memory overflow on one node.
>>> As mentioned before, we have a small memory check during matrix loading to
>>> keep track of RSS. The RSS printout in the log shows a normal increase on
>>> many nodes, i.e., if we load a 1GB portion of the matrix, then after
>>> MatSetValues for that portion, RSS increases by roughly 1GB. On the
>>> node with the memory overflow, RSS increased by 2GB after only 1GB of
>>> matrix was loaded through MatSetValues. We are very puzzled by this. What
>>> could make the memory footprint twice as large as needed? Thanks in advance
>>> for any insight.
>>>
>>
>> The only way I can imagine this happening is that you have not
>> preallocated correctly, so that some values are causing additional mallocs.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Regards,
>>>
>>> Karl
>>>
>>> On Thu, Jun 11, 2020 at 12:00 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Thu, Jun 11, 2020 at 12:52 PM Karl Lin <karl.linkui at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Matthew
>>>>>
>>>>> Thanks for the suggestion. I just did another run, and here is a more
>>>>> detailed stack trace, which may provide some more insight:
>>>>>  *** Process received signal ***
>>>>> Signal: Aborted (6)
>>>>> Signal code:  (-6)
>>>>> /lib64/libpthread.so.0(+0xf5f0)[0x2b56c46dc5f0]
>>>>>  [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b56c5486337]
>>>>>  [ 2] /lib64/libc.so.6(abort+0x148)[0x2b56c5487a28]
>>>>>  [ 3] /libpetsc.so.3.10(PetscTraceBackErrorHandler+0xc4)[0x2b56c1e6a2d4]
>>>>>  [ 4] /libpetsc.so.3.10(PetscError+0x1b5)[0x2b56c1e69f65]
>>>>>  [ 5] /libpetsc.so.3.10(PetscCommBuildTwoSidedFReq+0x19f0)[0x2b56c1e03cf0]
>>>>>  [ 6] /libpetsc.so.3.10(+0x77db17)[0x2b56c2425b17]
>>>>>  [ 7] /libpetsc.so.3.10(+0x77a164)[0x2b56c2422164]
>>>>>  [ 8] /libpetsc.so.3.10(MatAssemblyBegin_MPIAIJ+0x36)[0x2b56c23912b6]
>>>>>  [ 9] /libpetsc.so.3.10(MatAssemblyBegin+0xca)[0x2b56c1feccda]
>>>>>
>>>>> By reconfiguring, you mean recompiling PETSc with that option, correct?
>>>>>
>>>>
>>>> Reconfiguring.
>>>>
>>>>   Thanks,
>>>>
>>>>     Matt
>>>>
>>>>
>>>>> Thank you.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jun 11, 2020 at 10:56 AM Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Jun 11, 2020 at 11:51 AM Karl Lin <karl.linkui at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, there
>>>>>>>
>>>>>>> We have written a program using PETSc to solve large sparse matrix
>>>>>>> systems. It has been working fine for a while. Recently we encountered a
>>>>>>> problem when the size of the sparse matrix is larger than 10TB. We used
>>>>>>> several hundred nodes and 2200 processes. The program always crashes during
>>>>>>> MatAssemblyBegin. Upon a closer look, there seems to be something unusual.
>>>>>>> We have a small memory check during matrix loading to keep track of
>>>>>>> RSS. The RSS printout in the log shows a normal increase up to rank 2160,
>>>>>>> i.e., if we load a 1GB portion of the matrix, then after MatSetValues for
>>>>>>> that portion, RSS increases by roughly that amount. From rank 2161
>>>>>>> onwards, the RSS on every rank doesn't increase after the matrix is
>>>>>>> loaded. Then comes MatAssemblyBegin, and the program crashes on rank 2160.
>>>>>>>
>>>>>>> Is there an upper limit on the number of processes PETSc can handle?
>>>>>>> Or is there an upper limit on the size of the matrix PETSc can
>>>>>>> handle? Thank you very much for any info.
>>>>>>>
>>>>>>
>>>>>> It sounds like you overflowed int somewhere. We try and check for
>>>>>> this, but catching every place is hard. Try reconfiguring with
>>>>>>
>>>>>>   --with-64-bit-indices
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>
>>>>>
>>>>
>>>
>>
>