[petsc-users] On the edge of 2^31 unknowns
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Mon Nov 16 12:59:32 CST 2015
I looked into the code of PetscLLCondensedCreate_Scalable:
...
ierr = PetscMalloc1(2*(nlnk_max+2),lnk);CHKERRQ(ierr);
...
and just for fun, I tried this:
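#include <iostream>
int main() {
  int a = 1741445953;       // my number of unknowns...
  int b = 2 * (a + 2);      // overflows a 32-bit int (formally undefined behavior)
  unsigned long int c = b;  // the negative value wraps to a huge unsigned number
  std::cout << " a: " << a << " b: " << b << " c: " << c << std::endl;
  return 0;
}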
and it gives:
a: 1741445953 b: -812075386 c: 18446744072897476230
and in the PETSc error log I got this:
...
[0]PETSC ERROR: Memory requested 18446744070461249536
...
It really looks like there is an int somewhere that held the overflowed
value, which was then converted into an unsigned long...
Thanks,
Eric
On 16/11/15 01:26 PM, Eric Chamberland wrote:
> Barry,
>
> I can't launch the code again to retrieve more information, since I
> am not allowed to: the cluster has around 780 nodes and I got a
> very special permission to reserve 530 of them...
>
> So the best I can do is to give you the backtrace PETSc gave me... :/
> (see the first post with the backtrace:
> http://lists.mcs.anl.gov/pipermail/petsc-users/2015-November/027644.html)
>
> And until today, all smaller meshes with the same solver completed
> successfully... (I went up to 219 million unknowns on 64 nodes).
>
> I understand then that using PetscInt64 in some places in the
> current code would help fix problems like the one I got. I find it
> a big challenge to track down every occurrence of this kind of overflow
> in the code, given the size of the systems needed to
> reproduce the problem....
>
> Eric
>
>
> On 16/11/15 12:40 PM, Barry Smith wrote:
>>
>> Eric,
>>
>> The behavior you get with bizarre integers and a crash is not the
>> behavior we want. We would like to detect these overflows
>> appropriately. If you can track through the error and determine the
>> location where the overflow occurs then we would gladly put in
>> additional checks and use of PetscInt64 to handle these things better.
>> So let us know the exact cause and we'll improve the code.
>>
>> Barry
>>
>>
>>
>>> On Nov 16, 2015, at 11:11 AM, Eric Chamberland
>>> <Eric.Chamberland at giref.ulaval.ca> wrote:
>>>
>>> On 16/11/15 10:42 AM, Matthew Knepley wrote:
>>>> Sometimes when we do not have exact counts, we need to overestimate
>>>> sizes. This is especially true
>>>> in sparse MatMat.
>>>
>>> Ok... so, to be sure, am I correct in saying that recompiling petsc with
>>> "--with-64-bit-indices" is the only solution to my problem?
>>>
>>> I mean, is there no other fix for this overestimation in a more recent
>>> release of petsc, such as storing the result in a "long int" instead?
>>>
>>> Thanks,
>>>
>>> Eric
>>>