<div dir="ltr"><div dir="ltr">On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry <<a href="mailto:mlohry@gmail.com">mlohry@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I'm getting seemingly random failures of late:</div><div>Caught signal number 7 BUS: Bus Error, possibly illegal memory access</div></div></blockquote><div><br></div><div>The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems</div><div>on things that run completely fine.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Symptoms:</div><div>1) Seems to only happen (so far) on larger cases, 400-2000 cores</div><div>2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics</div><div>3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node)</div><div>4) running the same setup twice it fails at different points<br></div><div><br></div><div>Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random.</div><div><br></div><div><br></div><div>Thanks,</div><div>Mark<br></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>