[petsc-users] Bus Error

Mark Lohry mlohry at gmail.com
Wed Aug 12 06:52:22 CDT 2020


I'm getting seemingly random failures of late:
Caught signal number 7 BUS: Bus Error, possibly illegal memory access

Symptoms:
1) So far it only seems to happen on larger cases, 400-2000 cores.
2) It doesn't happen right away. This run was going happily for several
hours over several hundred time steps with no indication of bad health in
the numerics.
3) Total memory consumption at least seems to be within bounds, though
I'm not sure about individual processes. For example, Slurm here reported
Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node).
4) Running the same setup twice, it fails at different points.
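
On point 3, one way the per-process question could be checked is to have
each process report its own peak resident memory and compare it with an
even share of the node. The following is only a minimal sketch in Python,
not something from my actual code: the function names, the ranks-per-node
value, and the "even share" threshold are all hypothetical, and only the
180 GB/node figure comes from the Slurm report above. In a PETSc
application, PetscMemoryGetCurrentUsage() would be the more natural call
for the same purpose.

```python
import resource
import sys


def peak_rss_bytes():
    """Return the peak resident set size of the calling process, in bytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    return rss if sys.platform == "darwin" else rss * 1024


def over_budget(rss_bytes, node_bytes, ranks_per_node, slack=1.0):
    """True if one process exceeds `slack` times its even share of node memory.

    The even-share threshold is an illustrative assumption, not a Slurm rule.
    """
    return rss_bytes > slack * node_bytes / ranks_per_node


if __name__ == "__main__":
    node_bytes = 180 * 10**9   # 180 GB/node, taken from the Slurm report
    ranks_per_node = 40        # hypothetical value; use the real job layout
    rss = peak_rss_bytes()
    print(f"peak RSS: {rss / 2**30:.3f} GiB, "
          f"over even share: {over_budget(rss, node_bytes, ranks_per_node)}")
```

Logging this once per time step from every process would show whether a
single process grows steadily before the failure, which an aggregate
figure like the Slurm memory efficiency cannot reveal.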

Any suggestions on what to look for? This is a bit painful to work on, as I
can only reproduce it on large runs, and even then it's seemingly random.


Thanks,
Mark