[petsc-users] On the edge of 2^31 unknowns

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Mon Nov 16 12:59:32 CST 2015


I looked into the code of PetscLLCondensedCreate_Scalable:

...
   ierr = PetscMalloc1(2*(nlnk_max+2),lnk);CHKERRQ(ierr);
...


and just for fun, I tried this:

#include <iostream>

int main() {
   int a=1741445953; // my number of unknowns...
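   int b=2*(a+2); // exceeds INT_MAX: signed overflow (formally undefined, wraps in practice)
   unsigned long int c = b; // the negative int converts to a huge unsigned value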
   std::cout << " a: " << a <<  " b: " << b << " c: " << c <<std::endl;
   return 0;
}

and it gives:

  a: 1741445953 b: -812075386 c: 18446744072897476230


and in the PETSc error log I got this:

...
[0]PETSC ERROR: Memory requested 18446744070461249536
...

It really looks like there is an int somewhere that overflowed and was
then converted into an unsigned long...

Thanks,

Eric

On 16/11/15 01:26 PM, Eric Chamberland wrote:
> Barry,
>
> I can't launch the code again to retrieve more information, since I
> am not allowed to do so: the cluster has about 780 nodes and I got a
> very special permission to reserve 530 of them...
>
> So the best I can do is to give you the backtrace PETSc gave me... :/
> (see the first post with the backtrace:
> http://lists.mcs.anl.gov/pipermail/petsc-users/2015-November/027644.html)
>
> And until today, all smaller meshes with the same solver completed
> successfully... (I went up to 219 million unknowns on 64 nodes).
>
> I understand then that there could be some use of PetscInt64 in the
> current code that would help fix problems like the one I got.  I find it
> is a big challenge to track down all occurrences of this kind of overflow
> in the code, due to the size of the systems you need in order to
> reproduce this problem....
>
> Eric
>
>
> On 16/11/15 12:40 PM, Barry Smith wrote:
>>
>>    Eric,
>>
>>      The behavior you get with bizarre integers and a crash is not the
>> behavior we want. We would like to detect these overflows
>> appropriately.   If you can track through the error and determine the
>> location where the overflow occurs then we would gladly put in
>> additional checks and use of PetscInt64 to handle these things better.
>> So let us know the exact cause and we'll improve the code.
>>
>>    Barry
>>
>>
>>
>>> On Nov 16, 2015, at 11:11 AM, Eric Chamberland
>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>
>>> On 16/11/15 10:42 AM, Matthew Knepley wrote:
>>>> Sometimes when we do not have exact counts, we need to overestimate
>>>> sizes. This is especially true
>>>> in sparse MatMat.
>>>
>>> Ok... so, to be sure, am I correct in saying that recompiling petsc with
>>> "--with-64-bit-indices" is the only solution to my problem?
>>>
>>> I mean, no other fixes exist for this overestimation in a more recent
>>> release of petsc, like putting the result in a "long int" instead?
>>>
>>> Thanks,
>>>
>>> Eric
>>>


