<div dir="ltr"><div>Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older Boost version I was using: <a href="https://github.com/boostorg/serialization/issues/104">https://github.com/boostorg/serialization/issues/104</a>, although I don't think this was causing the issue).</div><div><br></div><div>-malloc_debug dumps quite a lot; this is supposed to be empty, right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts?<br></div><div><br></div><div><br></div><div><br></div><div>[ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c<br>[ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c<br>[ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c<br>[ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes 
PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]976 bytes ISGeneralSetIndices_General() line 578 in 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() 
line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c<br>[ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 
bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c<br>[ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c<br>[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c<br>[ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c<br>[ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c<br>[ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c<br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 12, 2020 at 1:46 PM Barry Smith <<a href="mailto:bsmith@petsc.dev">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><br></div>   Mark.<div><br></div><div>    When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks:</div><div> -malloc_debug</div><div><br></div><div> This </div><div><br></div><div>1) fills all malloc'ed memory with NaN, so if the code is using uninitialized memory it may be detected, and </div><div>2) checks the beginning and end of each allocated memory region for out-of-bounds writes at each malloc and free.</div><div><br></div><div>It will slow the code down a little, but generally not by a huge amount.</div><div><br></div><div>It is nowhere near as good as valgrind or other 
memory corruption tools but it has the advantage you can run it anywhere on any size job.</div><div><br></div><div><br></div><div>  Barry</div><div><br></div><div><br></div><div><br></div><div><br><div><br><blockquote type="cite"><div>On Aug 12, 2020, at 7:46 AM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:</div><br><div><div dir="ltr"><div dir="ltr">On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry <<a href="mailto:mlohry@gmail.com" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I'm getting seemingly random failures of late:</div><div>Caught signal number 7 BUS: Bus Error, possibly illegal memory access</div></div></blockquote><div><br></div><div>The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems</div><div>on things that run completely fine.</div><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Symptoms:</div><div>1) Seems to only happen (so far) on larger cases, 400-2000 cores</div><div>2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics</div><div>3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node)</div><div>4) running the same setup twice it fails at different points<br></div><div><br></div><div>Any suggestions on what to look for? 
This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random.</div><div><br></div><div><br></div><div>Thanks,</div><div>Mark<br></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>
</div></blockquote></div><br></div></div></blockquote></div>