<div dir="ltr">After recompiling with the 64-bit indices option, the program ran successfully. Thank you very much for the insight.<br></div><br><div class="gmail_quote"><div class="gmail_attr" dir="ltr">On Thu, Jun 11, 2020 at 12:00 PM Satish Balay <<a href="mailto:balay@mcs.anl.gov">balay@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">On Thu, 11 Jun 2020, Karl Lin wrote:<br>
<br>
> Hi, Matthew<br>
> <br>
> Thanks for the suggestion. I just did another run, and here is a detailed<br>
> stack trace that may provide some more insight:<br>
> *** Process received signal ***<br>
> Signal: Aborted (6)<br>
> Signal code: (-6)<br>
> /lib64/libpthread.so.0(+0xf5f0)[0x2b56c46dc5f0]<br>
> [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b56c5486337]<br>
> [ 2] /lib64/libc.so.6(abort+0x148)[0x2b56c5487a28]<br>
> [ 3] /libpetsc.so.3.10(PetscTraceBackErrorHandler+0xc4)[0x2b56c1e6a2d4]<br>
> [ 4] /libpetsc.so.3.10(PetscError+0x1b5)[0x2b56c1e69f65]<br>
> [ 5] /libpetsc.so.3.10(PetscCommBuildTwoSidedFReq+0x19f0)[0x2b56c1e03cf0]<br>
> [ 6] /libpetsc.so.3.10(+0x77db17)[0x2b56c2425b17]<br>
> [ 7] /libpetsc.so.3.10(+0x77a164)[0x2b56c2422164]<br>
> [ 8] /libpetsc.so.3.10(MatAssemblyBegin_MPIAIJ+0x36)[0x2b56c23912b6]<br>
> [ 9] /libpetsc.so.3.10(MatAssemblyBegin+0xca)[0x2b56c1feccda]<br>
> <br>
> By reconfiguring, you mean recompiling petsc with that option, correct?<br>
<br>
Yes. You can use a different PETSC_ARCH for this build, so that both builds remain usable [by just switching PETSC_ARCH in your application makefile]<br>
<br>
Satish<br>
<br>
> <br>
> Thank you.<br>
> <br>
> Karl<br>
> <br>
> On Thu, Jun 11, 2020 at 10:56 AM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
> <br>
> > On Thu, Jun 11, 2020 at 11:51 AM Karl Lin <<a href="mailto:karl.linkui@gmail.com" target="_blank">karl.linkui@gmail.com</a>> wrote:<br>
> ><br>
> >> Hi, there<br>
> >><br>
> >> We have written a program using PETSc to solve large sparse matrix<br>
> >> systems. It has been working fine for a while. Recently we encountered a<br>
> >> problem when the size of the sparse matrix is larger than 10TB. We used<br>
> >> several hundred nodes and 2200 processes. The program always crashes during<br>
> >> MatAssemblyBegin. Upon a closer look, there seems to be something unusual.<br>
> >> We have a small memory check while loading the matrix to keep track of<br>
> >> rss. The printout of rss in the log shows a normal increase up to rank 2160,<br>
> >> i.e., if we load a portion of the matrix that is 1GB, then after MatSetValues for<br>
> >> that portion, rss increases by roughly that amount. From rank 2161<br>
> >> onwards, the rss in every rank doesn't increase after the matrix is loaded. Then<br>
> >> comes MatAssemblyBegin, and the program crashes on rank 2160.<br>
> >><br>
> >> Is there an upper limit on the number of processes PETSc can handle? Or is<br>
> >> there an upper limit on the size of the matrix PETSc can handle?<br>
> >> Thank you very much for any info.<br>
> >><br>
> ><br>
> > It sounds like you overflowed int somewhere. We try to check for this,<br>
> > but catching every place is hard. Try reconfiguring with<br>
> ><br>
> > --with-64-bit-indices<br>
> ><br>
> > Thanks,<br>
> ><br>
> > Matt<br>
> ><br>
> ><br>
> >> Regards,<br>
> >><br>
> >> Karl<br>
> >><br>
> ><br>
> ><br>
> > --<br>
> > What most experimenters take for granted before they begin their<br>
> > experiments is infinitely more interesting than any results to which their<br>
> > experiments lead.<br>
> > -- Norbert Wiener<br>
> ><br>
> > <a href="https://www.cse.buffalo.edu/~knepley/" target="_blank" rel="noreferrer">https://www.cse.buffalo.edu/~knepley/</a><br>
> > <<a href="http://www.cse.buffalo.edu/~knepley/" target="_blank" rel="noreferrer">http://www.cse.buffalo.edu/~knepley/</a>><br>
> ><br>
> <br>
<br>
</blockquote></div>
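<div dir="ltr">A rough sketch of why a matrix of this size can overflow a 32-bit int, as suggested in the quoted reply. The thread does not say which quantity overflowed, so the numbers here are illustrative assumptions: 10TB is taken as 10 * 10^12 bytes, and each stored nonzero is assumed to cost 12 bytes (an 8-byte value plus a 4-byte column index). The helper names are hypothetical and are not part of PETSc.<br></div>

```python
# Illustrative only: these helpers are hypothetical and not part of PETSc.

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def fits_in_int32(n: int) -> bool:
    """Return True if n is representable as a signed 32-bit integer."""
    return -2**31 <= n <= INT32_MAX


def wrap_int32(n: int) -> int:
    """Simulate two's-complement wraparound of a signed 32-bit integer."""
    return (n + 2**31) % 2**32 - 2**31


# Assumption: "10TB" from the thread read as 10 * 10**12 bytes.
matrix_bytes = 10 * 10**12
# Assumption: 8-byte double value + 4-byte column index per stored nonzero.
bytes_per_nonzero = 12

# Estimated global number of nonzeros under the assumptions above.
nnz = matrix_bytes // bytes_per_nonzero

print("estimated nonzeros:", nnz)
print("fits in 32-bit int:", fits_in_int32(nnz))
print("value after 32-bit wraparound:", wrap_int32(nnz))
```

<div dir="ltr">Under these assumptions the estimated count is several hundred times larger than the 32-bit limit, so any global count or offset held in a 32-bit int would wrap around. This is consistent with the rebuild using --with-64-bit-indices resolving the crash.<br></div>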