[petsc-users] Bus Error
Barry Smith
bsmith at petsc.dev
Mon Aug 24 12:40:20 CDT 2020
Mark,
I have attached a patch file you can apply to PETSc with
patch -p1 < blascheck.patch
then build the debug version of PETSc and run your crashing problem.
It then checks every input double-precision array passed to the BLAS routines that are crashing in your code, on every call. If a pointer is not usable as a double-precision pointer, it will error out and print the argument number of the BLAS call along with the stack.
This may give us a bit more information about the problem than before. For example if there is memory corruption that changes one of the pointers used in the BLAS we will now know which one.
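To make the idea concrete, here is a minimal sketch of that kind of argument check. It is not the contents of blascheck.patch (which is attached, not shown here). The helper names is_aligned_for_double and first_bad_argument are made up for illustration, and the check only covers null and misaligned pointers, not memory that is unmapped or too small.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: true if p is non-null and its address is a multiple
// of alignof(double). This says nothing about whether the memory is mapped
// or large enough, only whether it could be read as a double on alignment
// grounds.
inline bool is_aligned_for_double(const void* p) {
    if (p == nullptr) return false;
    return reinterpret_cast<std::uintptr_t>(p) % alignof(double) == 0;
}

// Hypothetical helper: returns the 1-based position of the first pointer
// that fails the check, or 0 if every pointer passes.
inline int first_bad_argument(std::initializer_list<const void*> args) {
    int position = 1;
    for (const void* p : args) {
        if (!is_aligned_for_double(p)) return position;
        ++position;
    }
    return 0;
}
```

For example, first_bad_argument({A, x, y}) returns 0 when all three pointers pass and otherwise the position of the first one that does not, similar in spirit to the argument number the patched build is meant to report.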
Barry
> On Aug 24, 2020, at 10:15 AM, Mark Lohry <mlohry at gmail.com> wrote:
>
> Do you ever use regular malloc()? PETSc malloc aligns automatically, but the system one does not.
>
> Indirectly via new, yes.
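If it helps to check this directly, the following is a small sketch (C++17, standard library only, nothing from PETSc or Eigen) for measuring the alignment an allocation actually has and for requesting a larger one. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Largest power-of-two alignment (capped at `cap`) that the address of p
// satisfies. Useful for printing what an allocator actually returned.
inline std::size_t alignment_of_address(const void* p, std::size_t cap = 4096) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    std::size_t a = 1;
    while (a < cap && addr % (a * 2) == 0) a *= 2;
    return a;
}

// Allocate n doubles with an explicitly requested alignment using the
// C++17 aligned form of operator new. `alignment` must be a power of two.
inline double* allocate_aligned_doubles(std::size_t n, std::size_t alignment) {
    return static_cast<double*>(
        ::operator new(n * sizeof(double), std::align_val_t(alignment)));
}

// Matching deallocation; must be given the same alignment as the allocation.
inline void free_aligned_doubles(double* p, std::size_t alignment) {
    ::operator delete(p, std::align_val_t(alignment));
}
```

Calling alignment_of_address on a pointer obtained from plain new shows what the default allocator provided, while allocate_aligned_doubles(n, 64) requests 64-byte alignment explicitly.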
>
> On Mon, Aug 24, 2020 at 11:10 AM Matthew Knepley <knepley at gmail.com> wrote:
> On Mon, Aug 24, 2020 at 10:56 AM Mark Lohry <mlohry at gmail.com> wrote:
> Thanks Barry, I'll give -malloc_debug a shot.
>
> I know this is not necessarily a reasonable test, but if you run the exact same thing twice, does it crash at the same location in terms of iterations, or does it seem to crash "randomly" after a long time?
>
> Crashes after a different number of iterations, seemingly random.
>
>
> I understand the frustration with this kind of crash. It just shouldn't happen, because the same BLAS calls have been made in the same way thousands of times, and yet suddenly there is trouble that is very hard to debug.
>
> Eventually makes for a good war story.
>
> Thinking back, I have seen some disturbing memory behavior that I think traces back to my use of Eigen. For example, in the past when running my full test suite a particular case would fail with NaNs, but if I ran that case in isolation it passed. I wonder if some object isn't getting properly aligned and at some point some kind of corruption occurs?
>
> Do you ever use regular malloc()? PETSc malloc aligns automatically, but the system one does not.
>
> Thanks,
>
> Matt
>
> On Mon, Aug 24, 2020 at 10:35 AM Barry Smith <bsmith at petsc.dev> wrote:
>
> Mark,
>
> OK, I'd generally trust the stock BLAS over OpenBLAS not to fail.
>
> Since valgrind is not viable, have you tried running the bad case with -malloc_debug? It will be a little slower, but not too bad, and it can find some memory corruption issues.
>
> It might be useful to get the stack trace inside the BLAS to see exactly where it crashes. If you ./configure with debugging and use --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS with debugging, but just running a batch job still won't display the stack frames inside the BLAS call.
>
> We have an option, -on_error_attach_debugger, that is useful for longer many-rank runs: it attaches the debugger ONLY when the error is detected, but it may not play well with batch systems. If you can do your run on a non-batch system, it might be able, along with --download-fblaslapack or --download-f2cblaslapack, to get the exact stack frames. Then, in the debugger, look at the variables and addresses to try to determine how it could have gone wrong.
>
> I know this is not necessarily a reasonable test, but if you run the exact same thing twice, does it crash at the same location in terms of iterations, or does it seem to crash "randomly" after a long time?
>
> I understand the frustration with this kind of crash. It just shouldn't happen, because the same BLAS calls have been made in the same way thousands of times, and yet suddenly there is trouble that is very hard to debug.
>
> Barry
>
>
>
>
>> On Aug 24, 2020, at 9:15 AM, Mark Lohry <mlohry at gmail.com> wrote:
>>
>> valgrind: I ran a much smaller case and didn't see any issues in valgrind. I'm only seeing this bus error on several hundred cores a few hours wallclock in, so it might not be feasible to run that in valgrind.
>>
>> blas: I'm not entirely sure. It's the stock one in PUIAS Linux (a Red Hat derivative), libblas.so.3.4.2. I'm going to try with Intel's and, if that fails, use the OpenBLAS downloaded via PETSc and see if the problem goes away.
>>
>>
>>
>> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith <bsmith at petsc.dev> wrote:
>>
>> Mark,
>>
>> Can you run in valgrind?
>>
>> Exactly what BLAS are you using?
>>
>> Barry
>>
>>
>>> On Aug 24, 2020, at 7:54 AM, Mark Lohry <mlohry at gmail.com> wrote:
>>>
>>> I reran in debug mode and got a stack trace for this bus error. It looks like it's happening in BLASgemv; see the output pasted below. I did take care of the ISColoring leak mentioned previously, although that was a very small amount of data and I don't think it is relevant here.
>>>
>>> At this point it's happily run 222 timesteps prior to this, so I'm a little mystified. Any ideas?
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>> 222 TS dt 0.03 time 6.66
>>> 0 SNES Function norm 4.124287265556e+02
>>> 0 KSP Residual norm 4.124287265556e+02
>>> 1 KSP Residual norm 4.123248052318e+02
>>> 2 KSP Residual norm 4.123173350456e+02
>>> 3 KSP Residual norm 4.118769044110e+02
>>> 4 KSP Residual norm 4.094856150740e+02
>>> 5 KSP Residual norm 4.006000788078e+02
>>> 6 KSP Residual norm 3.787922969183e+02
>>> [clip]
>>> Linear solve converged due to CONVERGED_RTOL iterations 9
>>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00
>>> 2 SNES Function norm 3.173434863784e+00
>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
>>> 0 SNES Function norm 5.842010710080e+02
>>> 0 KSP Residual norm 5.842010710080e+02
>>> 1 KSP Residual norm 5.840526408234e+02
>>> 2 KSP Residual norm 5.840431857354e+02
>>> 3 KSP Residual norm 5.834351392302e+02
>>> 4 KSP Residual norm 5.800901047861e+02
>>> 5 KSP Residual norm 5.675562288567e+02
>>> 6 KSP Residual norm 5.366287895681e+02
>>> 7 KSP Residual norm 4.725811521866e+02
>>> [911]PETSC ERROR: ------------------------------------------------------------------------
>>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access
>>> [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [911]PETSC ERROR: likely location of problem given in stack below
>>> [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [911]PETSC ERROR: INSTEAD the line number of the start of the function
>>> [911]PETSC ERROR: is given.
>>> [911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c
>>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c
>>> [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>> [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c
>>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c
>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h
>>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c
>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>> [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c
>>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c
>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h
>>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
>>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c
>>> [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c
>>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c
>>> [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>> [911]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> [911]PETSC ERROR: Signal received
>>> [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020
>>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020
>>> [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8
>>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD
>>>
>>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry <mlohry at gmail.com> wrote:
>>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()?
>>>
>>> I have matching ISColoringGet/Restore here, and it's only used prior to the first iteration so at least it doesn't seem to be growing. At the bottom I pasted the malloc_view and malloc_debug output from running 1 time step.
>>>
>>> I'm sort of thinking this might be a red herring -- is it possible the rank 0 process is chewing up dramatically more memory than others, like with logging or something? Like I mentioned earlier the total memory usage is well under the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at every time step and try to get a big job going again.
>>>
>>>
>>>
>>> Are you using Fortran?
>>>
>>> C++
>>>
>>>
>>>
>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c
>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c
>>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c
>>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 719073280
>>> [0] Memory usage sorted by function
>>> [0] 6 192 DMCoarsenHookAdd()
>>> [0] 2 9984 DMCreate()
>>> [0] 2 128 DMCreate_Shell()
>>> [0] 2 64 DMDSEnlarge_Static()
>>> [0] 1 672 DMKSPCreate()
>>> [0] 3 96 DMRefineHookAdd()
>>> [0] 3 2064 DMSNESCreate()
>>> [0] 4 128 DMSubDomainHookAdd()
>>> [0] 1 768 DMTSCreate()
>>> [0] 2 96 ISColoringCreate()
>>> [0] 8 12608 ISColoringGetIS()
>>> [0] 1 307200 ISConcatenate()
>>> [0] 29 25984 ISCreate()
>>> [0] 25 400 ISCreate_General()
>>> [0] 4 64 ISCreate_Stride()
>>> [0] 20 338016 ISGeneralSetIndices_General()
>>> [0] 3 921600 ISGetIndices_Stride()
>>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic()
>>> [0] 1 6144 ISInvertPermutation_General()
>>> [0] 3 308576 ISLocalToGlobalMappingCreate()
>>> [0] 2 32 KSPConvergedDefaultCreate()
>>> [0] 2 2816 KSPCreate()
>>> [0] 1 224 KSPCreate_FGMRES()
>>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization()
>>> [0] 2 16032 KSPSetUp_FGMRES()
>>> [0] 4 16084160 KSPSetUp_GMRES()
>>> [0] 2 36864 MatColoringApply_SL()
>>> [0] 1 656 MatColoringCreate()
>>> [0] 6 17088 MatCreate()
>>> [0] 1 16 MatCreateMFFD_WP()
>>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ()
>>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ()
>>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private()
>>> [0] 2 1472 MatCreate_MFFD()
>>> [0] 1 416 MatCreate_SeqAIJ()
>>> [0] 3 864 MatCreate_SeqBAIJ()
>>> [0] 2 416 MatCreate_Shell()
>>> [0] 1 784 MatFDColoringCreate()
>>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack()
>>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ()
>>> [0] 3 42512 MatGetColumnIJ_SeqAIJ()
>>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color()
>>> [0] 1 6144 MatGetOrdering_Natural()
>>> [0] 2 36384 MatGetRowIJ_SeqAIJ()
>>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ()
>>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ()
>>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N()
>>> [0] 1 6144 MatMarkDiagonal_SeqAIJ()
>>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ()
>>> [0] 8 256 MatRegisterRootName()
>>> [0] 1 6160 MatSeqAIJCheckInode()
>>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ()
>>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ()
>>> [0] 13 576 MatSolverTypeRegister()
>>> [0] 1 16 PCASMCreateSubdomains()
>>> [0] 2 1664 PCCreate()
>>> [0] 1 160 PCCreate_ASM()
>>> [0] 1 192 PCCreate_ILU()
>>> [0] 5 307264 PCSetUp_ASM()
>>> [0] 2 416 PetscBTCreate()
>>> [0] 2 3216 PetscClassPerfLogCreate()
>>> [0] 2 1616 PetscClassRegLogCreate()
>>> [0] 2 32 PetscCommBuildTwoSided_Allreduce()
>>> [0] 2 64 PetscCommDuplicate()
>>> [0] 2 1888 PetscDSCreate()
>>> [0] 2 26416 PetscEventPerfLogCreate()
>>> [0] 2 158400 PetscEventPerfLogEnsureSize()
>>> [0] 2 1616 PetscEventRegLogCreate()
>>> [0] 2 9600 PetscEventRegLogRegister()
>>> [0] 8 102400 PetscFreeSpaceGet()
>>> [0] 474 15168 PetscFunctionListAdd_Private()
>>> [0] 2 528 PetscIntStackCreate()
>>> [0] 142 11360 PetscLayoutCreate()
>>> [0] 56 896 PetscLayoutSetUp()
>>> [0] 59 9440 PetscObjectComposedDataIncreaseReal()
>>> [0] 2 576 PetscObjectListAdd()
>>> [0] 33 768 PetscOptionsGetEList()
>>> [0] 1 16 PetscOptionsHelpPrintedCreate()
>>> [0] 1 32 PetscPushSignalHandler()
>>> [0] 7 6944 PetscSFCreate()
>>> [0] 3 432 PetscSFCreate_Basic()
>>> [0] 2 1472 PetscSFLinkCreate()
>>> [0] 11 1229040 PetscSFSetUpRanks()
>>> [0] 7 614512 PetscSFSetUp_Basic()
>>> [0] 4 20096 PetscSegBufferCreate()
>>> [0] 2 1488 PetscSplitReductionCreate()
>>> [0] 2 3008 PetscStageLogCreate()
>>> [0] 1148 23872 PetscStrallocpy()
>>> [0] 6 13056 PetscStrreplace()
>>> [0] 9 3456 PetscTableCreate()
>>> [0] 1 16 PetscViewerASCIIOpen()
>>> [0] 6 96 PetscViewerAndFormatCreate()
>>> [0] 1 752 PetscViewerCreate()
>>> [0] 1 96 PetscViewerCreate_ASCII()
>>> [0] 2 1424 SNESCreate()
>>> [0] 1 16 SNESCreate_NEWTONLS()
>>> [0] 1 1008 SNESLineSearchCreate()
>>> [0] 1 16 SNESLineSearchCreate_BT()
>>> [0] 16 1824 SNESMSRegister()
>>> [0] 46 9056 TSARKIMEXRegister()
>>> [0] 1 1264 TSAdaptCreate()
>>> [0] 8 384 TSBasicSymplecticRegister()
>>> [0] 1 2160 TSCreate()
>>> [0] 1 224 TSCreate_Theta()
>>> [0] 48 5968 TSGLEERegister()
>>> [0] 41 7728 TSRKRegister()
>>> [0] 89 14736 TSRosWRegister()
>>> [0] 71 110192 VecCreate()
>>> [0] 1 307200 VecCreateGhostWithArray()
>>> [0] 123 36874080 VecCreate_MPI_Private()
>>> [0] 7 4300800 VecCreate_Seq()
>>> [0] 8 256 VecCreate_Seq_Private()
>>> [0] 6 400 VecDuplicateVecs_Default()
>>> [0] 3 2352 VecScatterCreate()
>>> [0] 7 1843296 VecScatterSetUp_SF()
>>> [0] 126 2016 VecStashCreate_Private()
>>> [0] 1 3072 mapBlockColoringToJacobian()
>>>
>>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>> Yes, there are some PETSc objects or arrays that you are not freeing, so they are printed at the end of the run. For small runs this is harmless, but if new objects or memory are allocated at each iteration and not suitably freed, it will eventually add up.
>>>
>>> Run with -malloc_view on a small problem with, say, 2 iterations. It will print everything allocated and might be helpful.
>>>
>>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()?
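In C++, one way to make it hard to forget the Restore half of a Get/Restore pair is a small scope guard. This is a generic sketch that assumes nothing about PETSc: ScopeGuard is a hypothetical helper, and the callable passed to it stands in for whatever Restore call matches your Get.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: runs the stored callable exactly once, when the
// guard goes out of scope, including on early return or exception unwind.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> release)
        : release_(std::move(release)) {}

    ~ScopeGuard() {
        if (release_) release_();
    }

    // Non-copyable so the release action cannot run twice.
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> release_;
};
```

In real code the callable would invoke ISColoringRestoreIS() with arguments matching the earlier ISColoringGetIS(); check the manual page for your PETSc version for the exact signature.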
>>>
>>> It is also possible it is a leak in PETSc, but that is unlikely since we test for them.
>>>
>>> Are you using Fortran?
>>>
>>> Barry
>>>
>>>
>>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry <mlohry at gmail.com> wrote:
>>>>
>>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue).
>>>>
>>>> -malloc_debug dumps quite a lot; this is supposed to be empty, right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts?
>>>>
>>>>
>>>>
>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c
>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c
>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c
>>>>
>>>>
>>>>
>>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>>>
>>>> Mark,
>>>>
>>>> When valgrind is not feasible (as on many centrally controlled batch systems), you can run PETSc with an extra flag to do some memory error checks:
>>>> -malloc_debug
>>>>
>>>> This
>>>>
>>>> 1) fills all malloced memory with NaN, so code that uses uninitialized memory may be detected, and
>>>> 2) checks the beginning and end of each allocated memory region for out-of-bounds writes at each malloc and free.
>>>>
>>>> It will slow the code down a little bit, but generally not by a huge amount.
>>>>
>>>> It is nowhere near as good as valgrind or other memory corruption tools, but it has the advantage that you can run it anywhere on any size job.
>>>>
>>>>
>>>> Barry
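To make the two checks described for -malloc_debug concrete, here is a minimal C sketch of the general technique: poison fresh memory with NaN and place a guard word on each side of the payload, then verify the guards later. This is not PETSc's implementation; every name in it (dbg_alloc_doubles, dbg_check, dbg_free, DBG_GUARD) is hypothetical, and it handles only arrays of doubles.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; an illustration only, not PETSc code. */
#define DBG_GUARD 0xDEADBEEFCAFEF00DULL

typedef struct {
    size_t   nbytes;  /* payload size requested by the caller */
    uint64_t guard;   /* sentinel stored just before the payload */
} dbg_header;

/* Allocate room for n doubles, fill them with NaN, and stamp a guard
   word on each side of the payload. Returns NULL on failure. */
double *dbg_alloc_doubles(size_t n)
{
    const uint64_t guard = DBG_GUARD;
    if (n > (SIZE_MAX - sizeof(dbg_header) - sizeof guard) / sizeof(double))
        return NULL;  /* size computation would overflow */

    size_t nbytes = n * sizeof(double);
    unsigned char *raw = malloc(sizeof(dbg_header) + nbytes + sizeof guard);
    if (!raw) return NULL;

    dbg_header *h = (dbg_header *)raw;
    h->nbytes = nbytes;
    h->guard  = guard;

    double *payload = (double *)(raw + sizeof(dbg_header));
    for (size_t i = 0; i < n; i++)
        payload[i] = NAN;  /* reads of uninitialized entries propagate NaN */

    /* Back guard sits immediately after the last payload byte. */
    memcpy(raw + sizeof(dbg_header) + nbytes, &guard, sizeof guard);
    return payload;
}

/* Returns 0 if both guards are intact, 1 if the front guard was
   overwritten, 2 if the back guard was overwritten. */
int dbg_check(const double *payload)
{
    const unsigned char *raw =
        (const unsigned char *)payload - sizeof(dbg_header);
    const dbg_header *h = (const dbg_header *)raw;
    uint64_t tail;

    if (h->guard != DBG_GUARD) return 1;
    memcpy(&tail, raw + sizeof(dbg_header) + h->nbytes, sizeof tail);
    return tail == DBG_GUARD ? 0 : 2;
}

/* Check the guards one last time, release the block, and report
   the result of that final check. */
int dbg_free(double *payload)
{
    if (!payload) return 0;
    int status = dbg_check(payload);
    free((unsigned char *)payload - sizeof(dbg_header));
    return status;
}
```

With an array a of n doubles from dbg_alloc_doubles, a write one element past the end (a[n] = 1.0) overwrites the back guard, so dbg_check(a) then returns 2. A stray write that lands outside the guard words goes unnoticed, which is one reason this kind of check is weaker than valgrind.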
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> wrote:
>>>>>
>>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry <mlohry at gmail.com <mailto:mlohry at gmail.com>> wrote:
>>>>> I'm getting seemingly random failures of late:
>>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access
>>>>>
>>>>> The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems
>>>>> on things that run completely fine.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Matt
>>>>>
>>>>> Symptoms:
>>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores
>>>>> 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics
>>>>> 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node)
>>>>> 4) Running the same setup twice, it fails at different points
>>>>>
>>>>> Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>
>>
>
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>