[petsc-users] Bus Error

Matthew Knepley knepley at gmail.com
Wed Aug 12 07:46:06 CDT 2020


On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry <mlohry at gmail.com> wrote:

> I'm getting seemingly random failures of late:
> Caught signal number 7 BUS: Bus Error, possibly illegal memory access
>

The first thing I would do is run valgrind on as wide an array of tests as
you can. It often finds memory errors even in runs that appear to complete
without any problem.
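
As a starting point, here is a minimal sketch of a Python helper that
assembles an MPI launch line with every rank running under valgrind's
memcheck tool. The executable name ./solver, the rank count, the PETSc
option passed through, and mpiexec as the launcher are assumptions for
illustration only; on a Slurm cluster you would likely substitute srun.
The valgrind flags are standard memcheck options, and %p in the log file
name expands to the process ID so that each rank writes its own log.

```python
import shlex


def valgrind_mpi_command(executable, nprocs, program_args=(), launcher="mpiexec"):
    """Build an MPI launch line that runs every rank under valgrind memcheck.

    `launcher` and `executable` are placeholders; adjust them to your system.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    cmd = [
        launcher, "-n", str(nprocs),
        "valgrind",
        "--tool=memcheck",
        "-q",                          # print only error reports
        "--num-callers=20",            # deeper stack traces in reports
        "--track-origins=yes",         # trace where uninitialised values came from
        "--log-file=valgrind.log.%p",  # one log per process (%p = PID)
        executable,
    ]
    cmd.extend(program_args)
    return cmd


if __name__ == "__main__":
    # Hypothetical small test case: 4 ranks, 10 time steps.
    print(shlex.join(valgrind_mpi_command("./solver", 4, ["-ts_max_steps", "10"])))
```

Valgrind slows execution substantially, so it is usually more practical to
run this on smaller cases with fewer ranks. With -q, a log file that stays
empty should mean memcheck reported nothing for that rank.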

  Thanks,

     Matt


> Symptoms:
> 1) Seems to only happen (so far) on larger cases, 400-2000 cores
> 2) It doesn't happen right away -- this was running happily for several
> hours over several hundred time steps with no indication of bad health in
> the numerics
> 3) The total memory consumption, at least, seems to be within bounds,
> though I'm not sure about individual processes, e.g. Slurm here reported
> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node)
> 4) Running the same setup twice, it fails at different points
>
> Any suggestions on what to look for? This is a bit painful to work on as I
> can only reproduce it on large runs and then it's seemingly random.
>
>
> Thanks,
> Mark
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/