[petsc-users] Bus Error

Satish Balay balay at mcs.anl.gov
Mon Aug 24 12:40:05 CDT 2020


On Mon, 24 Aug 2020, Barry Smith wrote:

> 
> 
> > On Aug 24, 2020, at 12:31 PM, Jed Brown <jed at jedbrown.org> wrote:
> > 
> > Barry Smith <bsmith at petsc.dev> writes:
> > 
> > >>  So if a BLAS routine fails with SIGBUS, is it always an input error, namely improper double/complex alignment? Or some other very strange thing?
> > 
> > I would suspect memory corruption.
> 
> 
>   Corruption meaning what specifically?
> 
>   The routines crashing are dgemv, which take only double precision arrays; regardless of what garbage is in those arrays, I don't think BUS errors can result. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors.
> 
>   So then it can only be corruption of the pointers passed in, correct?

My wild guess here is that some hardware is misbehaving [under severe
load, overheating, or insufficient cooling]. Some errors should be
detected/corrected by ECC RAM, but perhaps not all failures get
detected?
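
To help rule out the software-side causes raised above (NULL or
misaligned pointers) before blaming hardware, one could check the array
arguments just before the dgemv call. The following is only a minimal
sketch: ptr_ok and dgemv_args_ok are hypothetical helper names, not part
of PETSc or BLAS, and using sizeof(double) as the required alignment is
an assumption. It cannot detect a pointer that is well-aligned but points
at freed or unmapped memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is non-NULL and aligned to
 * `align` bytes. `align` must be a nonzero power of two. */
static int ptr_ok(const void *p, size_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return p != NULL && ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: checks the three array arguments of a dgemv
 * call (matrix A, input vector x, output vector y). Assumes doubles
 * should be aligned to sizeof(double) bytes. */
static int dgemv_args_ok(const double *A, const double *x, const double *y)
{
  const size_t align = sizeof(double);
  return ptr_ok(A, align) && ptr_ok(x, align) && ptr_ok(y, align);
}
```

If such a check passes on every call and the SIGBUS still shows up only
intermittently, that would be more consistent with the hardware guess
than with a bad argument.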

Satish

