[petsc-users] Bus Error
Satish Balay
balay at mcs.anl.gov
Mon Aug 24 12:40:05 CDT 2020
On Mon, 24 Aug 2020, Barry Smith wrote:
>
>
> > On Aug 24, 2020, at 12:31 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> > Barry Smith <bsmith at petsc.dev> writes:
> >
> >> So if a BLAS routine fails with SIGBUS, is it always an input error, i.e. improper double/complex alignment? Or some other very strange thing?
> >
> > I would suspect memory corruption.
>
>
> Corruption meaning what specifically?
>
> The routines crashing are dgemv, which only takes double precision arrays; regardless of what garbage is in those arrays, I don't think bus errors can result. They don't take integer arrays whose corruption could result in bad indexing and then bus errors.
>
> So then it can only be corruption of the pointers passed in, correct?
My wild guess here is that some hardware is misbehaving [under severe
load/overheating/insufficient cooling]. Some errors should be
detected/corrected by ECC RAM, but perhaps not all failures get
detected?
Satish
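
[Editor's sketch, not from the thread.] The question above about
corrupted or misaligned pointers could be checked with a small guard
placed before the dgemv call. This is a minimal sketch under stated
assumptions: the helper names ptr_ok_for_double and
dgemv_args_look_sane are hypothetical and are not part of PETSc or
BLAS. It only catches NULL or misaligned pointers; it cannot detect a
dangling pointer that happens to be aligned, nor the hardware faults
suggested above.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PETSc or BLAS): returns 1 if p is
   non-NULL and suitably aligned for double, 0 otherwise. */
static int ptr_ok_for_double(const void *p)
{
    return p != NULL && ((uintptr_t)p % alignof(double)) == 0;
}

/* Hypothetical pre-call check for the three array arguments of a
   dgemv-style call: matrix a, input vector x, output vector y. */
static int dgemv_args_look_sane(const double *a, const double *x,
                                const double *y)
{
    return ptr_ok_for_double(a) && ptr_ok_for_double(x) &&
           ptr_ok_for_double(y);
}
```

If such a check passes right before the crash, pointer misalignment
becomes a less likely explanation, and memory corruption elsewhere or
a hardware problem remain candidates.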