[petsc-users] Bus Error

Barry Smith bsmith at petsc.dev
Mon Aug 24 15:21:17 CDT 2020



> On Aug 24, 2020, at 2:34 PM, Jed Brown <jed at jedbrown.org> wrote:
> 
> I'm thinking of something such as writing floating point data into the return address, which would be unaligned/garbage.

  OK, my patch will detect this. This is what I was talking about: corrupting the BLAS arguments, which are the addresses of arrays.

  Valgrind is by far the preferred approach.

  Barry

  Another feature we could add to the malloc checking: when a SEGV or BUS error is encountered and we catch it, we should run PetscMallocValidate() and check our memory for corruption, reporting any we find.
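
  The guard-word idea behind this kind of malloc checking can be sketched in plain C. This is a minimal illustration under stated assumptions and not PETSc's implementation: guard_malloc(), guard_verify(), guard_install_handlers(), GUARD_WORD and MAX_BLOCKS are hypothetical names, the block table has a fixed size, and freeing is omitted. Calling guard_verify() from a signal handler is not strictly async-signal-safe; it is used here only as a last diagnostic before exiting.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical guard-word allocator; all names are illustrative. */
#define GUARD_WORD 0xDEADBEEFCAFEF00DULL
#define MAX_BLOCKS 64

typedef struct {
  unsigned char *base; /* raw allocation, starts with the front guard */
  size_t         size; /* user-visible size in bytes */
} Block;

static Block blocks[MAX_BLOCKS];
static int   nblocks = 0;

/* Allocate size bytes with an 8-byte guard word before and after. */
void *guard_malloc(size_t size)
{
  const uint64_t guard = GUARD_WORD;
  unsigned char *base;

  if (nblocks >= MAX_BLOCKS) return NULL;
  base = malloc(size + 2 * sizeof(guard));
  if (!base) return NULL;
  memcpy(base, &guard, sizeof(guard));
  memcpy(base + sizeof(guard) + size, &guard, sizeof(guard));
  blocks[nblocks].base = base;
  blocks[nblocks].size = size;
  nblocks++;
  return base + sizeof(guard); /* user data sits between the guards */
}

/* Return the number of tracked blocks whose guards were overwritten. */
int guard_verify(void)
{
  const uint64_t guard = GUARD_WORD;
  int            bad   = 0;

  for (int i = 0; i < nblocks; i++) {
    const unsigned char *b = blocks[i].base;
    assert(b != NULL);
    if (memcmp(b, &guard, sizeof(guard)) ||
        memcmp(b + sizeof(guard) + blocks[i].size, &guard, sizeof(guard))) bad++;
  }
  return bad;
}

/* On SEGV/BUS, report whether any guard word was damaged, then exit. */
static void on_fault(int sig)
{
  static const char clean[] = "fault: heap guard words intact\n";
  static const char dirty[] = "fault: heap guard words overwritten\n";
  ssize_t           n;

  (void)sig;
  if (guard_verify()) n = write(2, dirty, sizeof(dirty) - 1);
  else n = write(2, clean, sizeof(clean) - 1);
  (void)n;
  _exit(1);
}

void guard_install_handlers(void)
{
  signal(SIGSEGV, on_fault);
  signal(SIGBUS, on_fault);
}
```

  Writing one double past the end of an array obtained from guard_malloc() overwrites the rear guard, so the next guard_verify() call counts that block as corrupted, even if the fault itself only shows up later.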



> 
> Reproducing under Valgrind would help a lot.  Perhaps it's possible to checkpoint such that the breakage can be reproduced more quickly?
> 
> Barry Smith <bsmith at petsc.dev> writes:
> 
>> https://en.wikipedia.org/wiki/Bus_error
>> 
>> But perhaps not true for Intel? 
>> 
>> 
>> 
>>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>> 
>>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith <bsmith at petsc.dev> wrote:
>>> 
>>> 
>>>> On Aug 24, 2020, at 12:39 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>> 
>>>> Barry Smith <bsmith at petsc.dev> writes:
>>>> 
>>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>>> 
>>>>>> Barry Smith <bsmith at petsc.dev> writes:
>>>>>> 
>>>>>>> So if a BLAS routine fails with SIGBUS, is it always an input error, namely improper double/complex alignment? Or some other very strange thing?
>>>>>> 
>>>>>> I would suspect memory corruption.
>>>>> 
>>>>> 
>>>>> Corruption meaning what specifically?
>>>>> 
>>>>> The crashing calls are to dgemv, which takes only double-precision arrays; regardless of what garbage is in those arrays, I don't think BUS errors can result. It doesn't take integer arrays whose corruption could result in bad indexing and then BUS errors.
>>>>> 
>>>>> So then it can only be corruption of the pointers passed in, correct?
>>>> 
>>>> Such as those pointers pointing into data on the stack with incorrect sizes.
>>> 
>>> But won't incorrect sizes "usually" lead to SEGV, not SIGBUS?
>>> 
>>> My rough understanding was that memory errors in the heap give SEGV and memory errors on the stack give SIGBUS. Is that not true?
>>> 
>>>   Matt
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/
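
On the SEGV versus SIGBUS question raised in the thread: the split is not really heap versus stack. One well-defined source of SIGBUS on POSIX systems is touching a mapped page that no longer has any file behind it, which involves neither. The sketch below is a rough illustration, assuming a POSIX system with a writable temporary directory; probe_mmap_fault() is a hypothetical helper name, not something from PETSc or BLAS.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf            jump_env;
static volatile sig_atomic_t caught = 0;

/* Record which signal arrived and jump back out of the faulting access. */
static void on_signal(int sig)
{
  caught = sig;
  siglongjmp(jump_env, 1);
}

/* Map one page of a temporary file, shrink the file to zero length, then
   touch the mapping. Returns the signal number that was raised, 0 if the
   access succeeded, or -1 if the setup failed. */
int probe_mmap_fault(void)
{
  struct sigaction sa, old_bus, old_segv;
  volatile char   *p;
  long             page = sysconf(_SC_PAGESIZE);
  FILE            *f    = tmpfile();
  int              fd;

  if (!f) return -1;
  if (page <= 0) { fclose(f); return -1; }
  fd = fileno(f);
  if (ftruncate(fd, page) != 0) { fclose(f); return -1; }
  p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) { fclose(f); return -1; }
  /* The mapping stays valid, but the file no longer backs the page. */
  if (ftruncate(fd, 0) != 0) { munmap((void *)p, (size_t)page); fclose(f); return -1; }

  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction(SIGBUS, &sa, &old_bus);
  sigaction(SIGSEGV, &sa, &old_segv);

  caught = 0;
  if (sigsetjmp(jump_env, 1) == 0) p[0] = 1; /* faulting store */

  /* Restore the previous handlers and release resources. */
  sigaction(SIGBUS, &old_bus, NULL);
  sigaction(SIGSEGV, &old_segv, NULL);
  munmap((void *)p, (size_t)page);
  fclose(f);
  assert(caught == 0 || caught == SIGBUS || caught == SIGSEGV);
  return (int)caught;
}
```

On Linux the probe is expected to report SIGBUS rather than SIGSEGV, since the address is mapped but has no backing storage. That is a different situation from a corrupted pointer passed to a BLAS routine, so it says nothing about the cause of the crash discussed here.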


