[petsc-users] [Ext] Re: matcreate and assembly issue

Karl Lin karl.linkui at gmail.com
Fri Jul 3 09:44:05 CDT 2020


Yes, I did. The memory check for rss computes the memory footprint of the
column indices using the size of unsigned long long instead of int.
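
For reference, a minimal sketch of that kind of check, assuming a
hypothetical nnz_local count of locally inserted nonzeros;
PetscMemoryGetCurrentUsage reports the current resident set size in bytes:

    #include <petscsys.h>

    /* Sketch: compare measured rss growth against the expected footprint
       of the column indices alone. nnz_local is a hypothetical count of
       nonzeros inserted on this rank; using sizeof(PetscInt) keeps the
       estimate correct for both 32-bit and 64-bit index builds. */
    PetscErrorCode CheckIndexFootprint(PetscInt nnz_local, PetscLogDouble rss_before)
    {
      PetscErrorCode ierr;
      PetscLogDouble rss_now, expected;

      ierr = PetscMemoryGetCurrentUsage(&rss_now);CHKERRQ(ierr);
      expected = (PetscLogDouble)nnz_local * (PetscLogDouble)sizeof(PetscInt);
      ierr = PetscPrintf(PETSC_COMM_SELF,
                         "rss grew %g bytes; column indices alone need %g bytes\n",
                         rss_now - rss_before, expected);CHKERRQ(ierr);
      return 0;
    }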

For Junchao: I wonder if keeping track of which loaded columns are owned by
the current process and which are not also requires some memory. Just a
wild thought.
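
If it helps to test that thought: values set into rows owned by another
rank are buffered in PETSc's matrix stash until assembly, and that buffer
is real memory on the inserting rank. A minimal sketch of pre-sizing it
(stash_guess is a hypothetical estimate):

    #include <petscmat.h>

    /* Sketch: pre-size the stash that buffers off-process entries so its
       growth is predictable. stash_guess is a hypothetical estimate of
       the number of (row, col, value) triples this rank will set into
       rows owned by other processes. */
    PetscErrorCode PresizeStash(Mat A, PetscInt stash_guess)
    {
      PetscErrorCode ierr;
      ierr = MatStashSetInitialSize(A, stash_guess, PETSC_DEFAULT);CHKERRQ(ierr);
      return 0;
    }

The same knob is available at runtime as -matstash_initial_size.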

On Thu, Jul 2, 2020 at 11:40 PM Ernesto Prudencio <EPrudencio at slb.com>
wrote:

> Karl,
>
> Are you taking into account that every “integer” index might be 64
> bits instead of 32 bits, depending on the PETSc configuration /
> compilation choices for PetscInt?
>
> Ernesto.
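>
> For concreteness, a minimal sketch of checking which index width a
> build uses (PETSC_USE_64BIT_INDICES is the macro PETSc defines when
> configured with --with-64-bit-indices):
>
>     #include <petscsys.h>
>
>     int main(int argc, char **argv)
>     {
>       PetscErrorCode ierr;
>       ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>     #if defined(PETSC_USE_64BIT_INDICES)
>       ierr = PetscPrintf(PETSC_COMM_WORLD, "PetscInt is 64-bit (%d bytes)\n",
>                          (int)sizeof(PetscInt));CHKERRQ(ierr);
>     #else
>       ierr = PetscPrintf(PETSC_COMM_WORLD, "PetscInt is 32-bit (%d bytes)\n",
>                          (int)sizeof(PetscInt));CHKERRQ(ierr);
>     #endif
>       ierr = PetscFinalize();
>       return ierr;
>     }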
>
> From: petsc-users [mailto:petsc-users-bounces at mcs.anl.gov]
> On Behalf Of Junchao Zhang
> Sent: Thursday, July 2, 2020 11:21 PM
> To: Karl Lin <karl.linkui at gmail.com>
> Cc: PETSc users list <petsc-users at mcs.anl.gov>
> Subject: [Ext] Re: [petsc-users] matcreate and assembly issue
>
> Is it because indices for the nonzeros also need memory? --Junchao Zhang
>
> On Thu, Jul 2, 2020 at 10:04 PM Karl Lin <karl.linkui at gmail.com> wrote:
>
> Hi, Matthew
>
> Thanks for the reply. However, I don't really get why an additional
> malloc would double the memory footprint. If I know only a 1GB matrix
> is being loaded, there shouldn't be 2GB of memory occupied even if
> PETSc needs to allocate more space.
>
> regards,
>
> Karl
>
> On Thu, Jul 2, 2020 at 8:10 PM Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Jul 2, 2020 at 7:30 PM Karl Lin <karl.linkui at gmail.com> wrote:
>
> Hi, Matt
>
> Thanks for the tip last time. We just encountered another issue with
> large data sets. This time the behavior is the opposite of last time.
> The data is 13.5TB, and the total number of matrix columns is 2.4
> billion. Our program crashed during matrix loading due to memory
> overflow on one node. As mentioned before, we have a little memory
> check during matrix loading to keep track of rss. The printout of rss
> in the log shows a normal increase on many nodes, i.e., if we load in a
> portion of the matrix that is 1GB, then after MatSetValues for that
> portion, rss increases by roughly 1GB. On the node that overflowed, rss
> increased by 2GB after only 1GB of the matrix was loaded through
> MatSetValues. We are very puzzled by this. What could make the memory
> footprint twice as large as needed? Thanks in advance for any insight.
>
> The only way I can imagine this happening is that you have not
> preallocated correctly, so that some values are causing additional
> mallocs.
>
>   Thanks,
>
>      Matt
>
> Regards,
>
> Karl
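>
> For what it's worth, a sketch of exact preallocation for an MPIAIJ
> matrix (the function name and the d_nnz/o_nnz inputs are hypothetical
> placeholders for whatever the loader computes):
>
>     #include <petscmat.h>
>
>     /* Sketch: create an MPIAIJ matrix with exact per-row preallocation.
>        m: local rows; N: global columns; d_nnz/o_nnz: per-local-row
>        nonzero counts for the diagonal and off-diagonal blocks. */
>     PetscErrorCode CreatePreallocated(MPI_Comm comm, PetscInt m, PetscInt N,
>                                       const PetscInt d_nnz[], const PetscInt o_nnz[],
>                                       Mat *A)
>     {
>       PetscErrorCode ierr;
>       ierr = MatCreate(comm, A);CHKERRQ(ierr);
>       ierr = MatSetSizes(*A, m, PETSC_DECIDE, PETSC_DETERMINE, N);CHKERRQ(ierr);
>       ierr = MatSetType(*A, MATMPIAIJ);CHKERRQ(ierr);
>       ierr = MatMPIAIJSetPreallocation(*A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);
>       return 0;
>     }
>
> After MatAssemblyEnd, MatGetInfo(A, MAT_LOCAL, &info) reports
> info.mallocs; a nonzero value means insertion had to grow storage.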
>
> On Thu, Jun 11, 2020 at 12:00 PM Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Jun 11, 2020 at 12:52 PM Karl Lin <karl.linkui at gmail.com> wrote:
>
> Hi, Matthew
>
> Thanks for the suggestion. I just did another run and here are some
> detailed stack traces, which may provide more insight:
>
> *** Process received signal ***
> Signal: Aborted (6)
> Signal code: (-6)
> [ 0] /lib64/libpthread.so.0(+0xf5f0)[0x2b56c46dc5f0]
> [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b56c5486337]
> [ 2] /lib64/libc.so.6(abort+0x148)[0x2b56c5487a28]
> [ 3] /libpetsc.so.3.10(PetscTraceBackErrorHandler+0xc4)[0x2b56c1e6a2d4]
> [ 4] /libpetsc.so.3.10(PetscError+0x1b5)[0x2b56c1e69f65]
> [ 5] /libpetsc.so.3.10(PetscCommBuildTwoSidedFReq+0x19f0)[0x2b56c1e03cf0]
> [ 6] /libpetsc.so.3.10(+0x77db17)[0x2b56c2425b17]
> [ 7] /libpetsc.so.3.10(+0x77a164)[0x2b56c2422164]
> [ 8] /libpetsc.so.3.10(MatAssemblyBegin_MPIAIJ+0x36)[0x2b56c23912b6]
> [ 9] /libpetsc.so.3.10(MatAssemblyBegin+0xca)[0x2b56c1feccda]
>
> By reconfiguring, you mean recompiling petsc with that option, correct?
>
> Reconfiguring.
>
>   Thanks,
>
>     Matt
>
> Thank you.
>
> Karl
>
> On Thu, Jun 11, 2020 at 10:56 AM Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Jun 11, 2020 at 11:51 AM Karl Lin <karl.linkui at gmail.com> wrote:
>
> Hi, there
>
> We have written a program that uses PETSc to solve large sparse matrix
> systems. It has been working fine for a while. Recently we encountered
> a problem when the size of the sparse matrix is larger than 10TB. We
> used several hundred nodes and 2200 processes. The program always
> crashes during MatAssemblyBegin. Upon a closer look, there seems to be
> something unusual. We have a little memory check during matrix loading
> to keep track of rss. The printout of rss in the log shows a normal
> increase up to rank 2160, i.e., if we load in a portion of the matrix
> that is 1GB, then after MatSetValues for that portion, rss increases by
> roughly that amount. From rank 2161 onwards, the rss in every rank
> doesn't increase after the matrix is loaded. Then comes
> MatAssemblyBegin, and the program crashes on rank 2160.
>
> Is there an upper limit on the number of processes PETSc can handle? Or
> is there an upper limit on the size of the matrix PETSc can handle?
> Thank you very much for any info.
>
> It sounds like you overflowed int somewhere. We try and check for this,
> but catching every place is hard. Try reconfiguring with
>
>   --with-64-bit-indices
>
>   Thanks,
>
>      Matt
>
> Regards,
>
> Karl
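>
> To make the overflow concrete: a 32-bit signed int tops out at
> 2,147,483,647, so global column indices near 2.4 billion cannot be
> represented, while 64-bit PetscInt goes to about 9.2e18. A minimal
> sketch of the reconfigure (the bracketed part stands in for an existing
> build's other options):
>
>     ./configure --with-64-bit-indices [your other configure options]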
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead. -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>

