[petsc-users] Double free in KSPDestroy

Florian Lindner mailinglists at xgm.de
Thu Jul 23 10:07:04 CDT 2015


On Wednesday, 22 July 2015, 13:05:57, you wrote:
> 
> > On Jul 22, 2015, at 11:33 AM, Florian Lindner <mailinglists at xgm.de> wrote:
> > 
> > On Tuesday, 21 July 2015, 18:32:02, you wrote:
> >> 
> >>  Try putting a breakpoint in KSPSetUp_GMRES and check the values of all the pointers immediately after the
> >> ierr = PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr);
> >> 
> >> then put your second breakpoint in KSPReset_GMRES and check all the pointers again just before the 
> >>> ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> >> 
> >> Of course the pointers should be the same; are they?
> > 
> > Num     Type           Disp Enb Address            What
> > 3       breakpoint     keep y   0x00007ffff6ff6cb5 in KSPReset_GMRES at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > 4       breakpoint     keep y   0x00007ffff6ff49a1 in KSPSetUp_GMRES at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:54
> > 
> > The pointer gmres is the same. Just one function call later, at mal.c:72 it crashes. The pointer that is freed is gmres->hh_origin which also hasn't changed.
> > 
> > What confuses me is that:
> > 
> > Breakpoint 3, KSPReset_GMRES (ksp=0xe904b0) at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > 258	  ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> > (gdb) print gmres->hh_origin
> > $24 = (PetscScalar *) 0xf10250
> > 
> > hh_origin is the first argument; I step into PetscFree5:
> > 
> > (gdb) s
> > PetscFreeAlign (ptr=0xf15aa0, line=258, func=0x7ffff753c4c8 <__func__.20306> "KSPReset_GMRES", file=0x7ffff753b8b0 "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c") at /home/florian/software/petsc/src/sys/memory/mal.c:54
> > 54	  if (!ptr) return 0;
> > (gdb) print ptr
> > $25 = (void *) 0xf15aa0
> > 
> > Why has the value changed? I expected gmres->hh_origin == ptr.
> 
>    Definitely a problem here.
> 
> > Could this be a sign of stack corruption at some earlier stage?
> 
>    Could be, but valgrind usually finds such things. 
> 
>    You can do the following: edit $PETSC_DIR/$PETSC_ARCH/include/petscconf.h and add the lines
> 
> #if !defined(PETSC_USE_MALLOC_COALESCED)
> #define PETSC_USE_MALLOC_COALESCED
> #endif
> 
> then run 
> 
> make gnumake in the $PETSC_DIR directory.  Then relink your program and try running it.

Sorry, no success. Same story. :-(

I have removed my small PETSc wrapper library from the code, so it's pure PETSc now. Everything PETSc-related (besides Init and Finalize) is done in that piece of code, and all PETSc objects are private, so nothing outside should be able to mess with them.

https://github.com/floli/precice/blob/petsc_debugging/src/mapping/PetRadialBasisFctMapping.hpp

in case you want to have a look... Maybe you can spot something evil I'm doing.

Thanks and best regards,
Florian


> 
>   Barry
> 
> 
> 
> 
> > 
> > I also tried to build petsc with clang in order to use its memory sanitizer, but without success. Same for precice.
> > 
> > 
> >> If so you can run in the debugger and check the values at some points between the creation and destruction to see where they get changed to bad values. Normally, of course, valgrind would be very helpful in finding exactly when things go bad.
> > 
> > What do you mean by changing to bad values? They are the same after PetscCalloc5 and before PetscFree5.
> > 
> > Best Regards,
> > Florian
> > 
> >>  I'm afraid I'm going to have to give up on building this stuff myself; too painful.
> > 
> > Sorry about that. 
> > 
> >> 
> >>  Barry
> >> 
> >> 
> >>> On Jul 21, 2015, at 8:54 AM, Florian Lindner <mailinglists at xgm.de> wrote:
> >>> 
> >>> Hey Barry,
> >>> 
> >>> were you able to reproduce the error?
> >>> 
> >>> I tried to set a breakpoint at
> >>> 
> >>> PetscErrorCode KSPReset_GMRES(KSP ksp)
> >>> {
> >>> KSP_GMRES      *gmres = (KSP_GMRES*)ksp->data;
> >>> PetscErrorCode ierr;
> >>> PetscInt       i;
> >>> 
> >>> PetscFunctionBegin;
> >>> /* Free the Hessenberg matrices */
> >>> ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> >>> 
> >>> in gmres.c, the last line produces the error...
> >>> 
> >>> Interestingly, this piece of code is traversed only once, so at least there is no double execution of the code that frees the pointer...
> >>> 
> >>> Best Regards,
> >>> Florian
> >>> 
> >>> 
> >>> On Thursday, 16 July 2015, 17:59:15, you wrote:
> >>>> 
> >>>> I am on a Mac; no idea what the 'lo' localhost loopback interface should be
> >>>> 
> >>>> $  ./pmpi B
> >>>> MPI rank 0 of 1
> >>>> [PRECICE] Run in coupling mode
> >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> >>>> Setting up master communication to coupling partner/s 
> >>>> (0)  [PRECICE] ERROR: Network "lo" not found for socket connection!
> >>>> Run finished at Thu Jul 16 17:50:39 2015
> >>>> Global runtime = 41ms / 0s
> >>>> 
> >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]     Avg[ms]   T%
> >>>> --------------------------------------------------------------------------------
> >>>> Properties from all Events, accumulated
> >>>> ---------------------------------------
> >>>> 
> >>>> Abort trap: 6
> >>>> ~/Src/prempi (master *=) arch-debug
> >>>> $  ./pmpi B
> >>>> MPI rank 0 of 1
> >>>> [PRECICE] Run in coupling mode
> >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> >>>> Setting up master communication to coupling partner/s 
> >>>> (0)  [PRECICE] ERROR: Network "localhost" not found for socket connection!
> >>>> Run finished at Thu Jul 16 17:50:52 2015
> >>>> Global runtime = 40ms / 0s
> >>>> 
> >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]     Avg[ms]   T%
> >>>> --------------------------------------------------------------------------------
> >>>> Properties from all Events, accumulated
> >>>> ---------------------------------------
> >>>> 
> >>>> Abort trap: 6
> >>>> ~/Src/prempi (master *=) arch-debug
> >>>> $ hostname
> >>>> Barrys-MacBook-Pro.local
> >>>> ~/Src/prempi (master *=) arch-debug
> >>>> $  ./pmpi B
> >>>> MPI rank 0 of 1
> >>>> [PRECICE] Run in coupling mode
> >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> >>>> Setting up master communication to coupling partner/s 
> >>>> (0)  [PRECICE] ERROR: Network "Barrys-MacBook-Pro.local" not found for socket connection!
> >>>> Run finished at Thu Jul 16 17:51:12 2015
> >>>> Global runtime = 39ms / 0s
> >>>> 
> >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]     Avg[ms]   T%
> >>>> --------------------------------------------------------------------------------
> >>>> Properties from all Events, accumulated
> >>>> ---------------------------------------
> >>>> 
> >>>> Abort trap: 6
> >>>> ~/Src/prempi (master *=) arch-debug
> >>>> $  ./pmpi B
> >>>> MPI rank 0 of 1
> >>>> [PRECICE] Run in coupling mode
> >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> >>>> Setting up master communication to coupling partner/s 
> >>>> (0)  [PRECICE] ERROR: Network "10.0.1.2" not found for socket connection!
> >>>> Run finished at Thu Jul 16 17:53:02 2015
> >>>> Global runtime = 42ms / 0s
> >>>> 
> >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]     Avg[ms]   T%
> >>>> --------------------------------------------------------------------------------
> >>>> Properties from all Events, accumulated
> >>>> ---------------------------------------
> >>>> 
> >>>> Abort trap: 6
> >>>> ~/Src/prempi (master *=) arch-debug
> >>>> 
> >>>>> On Jul 15, 2015, at 1:53 AM, Florian Lindner <mailinglists at xgm.de> wrote:
> >>>>> 
> >>>>> Hey
> >>>>> 
> >>>>> On Tuesday, 14 July 2015, 13:20:33, you wrote:
> >>>>>> 
> >>>>>> How to install Eigen? I tried brew install eigen but it didn't help.
> >>>>> 
> >>>>> You may need to set the CPLUS_INCLUDE_PATH to something like "/usr/include/eigen3"
> >>>>> The easiest way, however, is probably to download Eigen from http://bitbucket.org/eigen/eigen/get/3.2.5.tar.bz2 and move the Eigen folder from that archive to precice/src. 
> >>>>> 
> >>>>>> Also what about the PRECICE_MPI_ stuff. It sure doesn't point to anything valid.
> >>>>> 
> >>>>> You probably don't need to set it if you use an mpic++ or mpicxx compiler wrapper that takes care of that.
> >>>>> 
> >>>>> Thx,
> >>>>> Florian
> >>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> Barry
> >>>>>> 
> >>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ build=debug
> >>>>>> scons: Reading SConscript files ...
> >>>>>> 
> >>>>>> Build options ...
> >>>>>> (default)  builddir                  = build      Directory holding build files. ( /path/to/builddir )
> >>>>>> (default)  build                     = debug      Build type, either release or debug (release|debug)
> >>>>>> (modified) compiler                  = /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++   Compiler to use.
> >>>>>> (modified) mpi                       = True       Enables MPI-based communication and running coupling tests. (yes|no)
> >>>>>> (default)  sockets                   = True       Enables Socket-based communication. (yes|no)
> >>>>>> (modified) boost_inst                = True       Enable if Boost is available compiled and installed. (yes|no)
> >>>>>> (default)  spirit2                   = True       Used for parsing VRML file geometries and checkpointing. (yes|no)
> >>>>>> (modified) petsc                     = True       Enable use of the Petsc linear algebra library. (yes|no)
> >>>>>> (modified) python                    = False      Used for Python scripted solver actions. (yes|no)
> >>>>>> (default)  gprof                     = False      Used in detailed performance analysis. (yes|no)
> >>>>>> ... done
> >>>>>> 
> >>>>>> Environment variables used for this build ...
> >>>>>> (have to be defined by the user to configure build)
> >>>>>> (modified) PETSC_DIR                 = /Users/barrysmith/Src/PETSc
> >>>>>> (modified) PETSC_ARCH                = arch-debug
> >>>>>> (default)  PRECICE_BOOST_SYSTEM_LIB  = boost_system
> >>>>>> (default)  PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
> >>>>>> (default)  PRECICE_MPI_LIB_PATH      = /usr/lib/
> >>>>>> (default)  PRECICE_MPI_LIB           = mpich   
> >>>>>> (default)  PRECICE_MPI_INC_PATH      = /usr/include/mpich2
> >>>>>> (default)  PRECICE_PTHREAD_LIB_PATH  = /usr/lib
> >>>>>> (default)  PRECICE_PTHREAD_LIB       = pthread 
> >>>>>> (default)  PRECICE_PTHREAD_INC_PATH  = /usr/include
> >>>>>> ... done
> >>>>>> 
> >>>>>> Configuring build variables ...
> >>>>>> Checking whether the C++ compiler works... yes
> >>>>>> Checking for C library petsc... yes
> >>>>>> Checking for C++ header file Eigen/Dense... no
> >>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does not compile!
> >>>>>> $ brew install eigen
> >>>>>> ==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/eigen-3.2.3.yosemite.bottle.tar.gz
> >>>>>> ######################################################################## 100.0%
> >>>>>> ==> Pouring eigen-3.2.3.yosemite.bottle.tar.gz
> >>>>>> 🍺  /usr/local/Cellar/eigen/3.2.3: 361 files, 4.1M
> >>>>>> ~/Src/precice (develop=) arch-debug
> >>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ build=debug
> >>>>>> scons: Reading SConscript files ...
> >>>>>> 
> >>>>>> Build options ...
> >>>>>> (default)  builddir                  = build      Directory holding build files. ( /path/to/builddir )
> >>>>>> (default)  build                     = debug      Build type, either release or debug (release|debug)
> >>>>>> (modified) compiler                  = /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++   Compiler to use.
> >>>>>> (modified) mpi                       = True       Enables MPI-based communication and running coupling tests. (yes|no)
> >>>>>> (default)  sockets                   = True       Enables Socket-based communication. (yes|no)
> >>>>>> (modified) boost_inst                = True       Enable if Boost is available compiled and installed. (yes|no)
> >>>>>> (default)  spirit2                   = True       Used for parsing VRML file geometries and checkpointing. (yes|no)
> >>>>>> (modified) petsc                     = True       Enable use of the Petsc linear algebra library. (yes|no)
> >>>>>> (modified) python                    = False      Used for Python scripted solver actions. (yes|no)
> >>>>>> (default)  gprof                     = False      Used in detailed performance analysis. (yes|no)
> >>>>>> ... done
> >>>>>> 
> >>>>>> Environment variables used for this build ...
> >>>>>> (have to be defined by the user to configure build)
> >>>>>> (modified) PETSC_DIR                 = /Users/barrysmith/Src/PETSc
> >>>>>> (modified) PETSC_ARCH                = arch-debug
> >>>>>> (default)  PRECICE_BOOST_SYSTEM_LIB  = boost_system
> >>>>>> (default)  PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
> >>>>>> (default)  PRECICE_MPI_LIB_PATH      = /usr/lib/
> >>>>>> (default)  PRECICE_MPI_LIB           = mpich   
> >>>>>> (default)  PRECICE_MPI_INC_PATH      = /usr/include/mpich2
> >>>>>> (default)  PRECICE_PTHREAD_LIB_PATH  = /usr/lib
> >>>>>> (default)  PRECICE_PTHREAD_LIB       = pthread 
> >>>>>> (default)  PRECICE_PTHREAD_INC_PATH  = /usr/include
> >>>>>> ... done
> >>>>>> 
> >>>>>> Configuring build variables ...
> >>>>>> Checking whether the C++ compiler works... yes
> >>>>>> Checking for C library petsc... yes
> >>>>>> Checking for C++ header file Eigen/Dense... no
> >>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does not compile!
> >>>>>> ~/Src/precice (develop=) arch-debug
> >>>>>> 
> >>>>>> 
> >>>>>>> On Jul 14, 2015, at 2:14 AM, Florian Lindner <mailinglists at xgm.de> wrote:
> >>>>>>> 
> >>>>>>> Hello,
> >>>>>>> 
> >>>>>>> On Monday, 13 July 2015, 12:26:21, Barry Smith wrote:
> >>>>>>>> 
> >>>>>>>> Run under valgrind first, see if it gives any more details about the memory issue http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >>>>>>> 
> >>>>>>> I tried running it like that:
> >>>>>>> 
> >>>>>>> valgrind --tool=memcheck ./pmpi A -malloc off 
> >>>>>>> 
> >>>>>>> (pmpi is my application, no mpirun)
> >>>>>>> 
> >>>>>>> but it reported no errors at all.
> >>>>>>> 
> >>>>>>>> Can you send the code that produces this problem?
> >>>>>>> 
> >>>>>>> I was not able to isolate the problem, but you can of course have a look at our application:
> >>>>>>> 
> >>>>>>> git clone git at github.com:precice/precice.git
> >>>>>>> MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=mpic++ build=debug
> >>>>>>> 
> >>>>>>> The test client:
> >>>>>>> git clone git at github.com:floli/prempi.git
> >>>>>>> you need to adapt line 5 in SConstruct: preciceRoot
> >>>>>>> scons
> >>>>>>> 
> >>>>>>> Take one terminal run ./pmpi A, another to run ./pmpi B
> >>>>>>> 
> >>>>>>> Thanks for taking a look! Mail me if any problem with the build occurs.
> >>>>>>> 
> >>>>>>> Florian
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On Jul 13, 2015, at 10:56 AM, Florian Lindner <mailinglists at xgm.de> wrote:
> >>>>>>>>> 
> >>>>>>>>> Hello,
> >>>>>>>>> 
> >>>>>>>>> our PETSc application suffers from a memory error (double free or corruption).
> >>>>>>>>> 
> >>>>>>>>> The situation is like this:
> >>>>>>>>> 
> >>>>>>>>> A KSP is a private member of a C++ class. In its constructor I call KSPCreate. In between, it may happen that I call KSPReset. In the class's destructor I call KSPDestroy. That's where the memory error appears:
> >>>>>>>>> 
> >>>>>>>>> gdb backtrace:
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> #4  0x00007ffff490b8db in _int_free () from /usr/lib/libc.so.6
> >>>>>>>>> #5  0x00007ffff6188c9c in PetscFreeAlign (ptr=0xfcd990, line=258, func=0x7ffff753c4c8 <__func__.20304> "KSPReset_GMRES", file=0x7ffff753b8b0 "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c")
> >>>>>>>>> at /home/florian/software/petsc/src/sys/memory/mal.c:72
> >>>>>>>>> #6  0x00007ffff6ff6cdc in KSPReset_GMRES (ksp=0xf48470) at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> >>>>>>>>> #7  0x00007ffff70ad804 in KSPReset (ksp=0xf48470) at /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:885
> >>>>>>>>> #8  0x00007ffff70ae2e8 in KSPDestroy (ksp=0xeb89d8) at /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:933
> >>>>>>>>> 
> >>>>>>>>> #9  0x0000000000599b24 in precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:148
> >>>>>>>>> #10 0x0000000000599bc9 in precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:146
> >>>>>>>>> 
> >>>>>>>>> Complete backtrace at http://pastebin.com/ASjibeNF
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> Could it be a problem if objects set by KSPSetOperators are destroyed afterwards? I don't think so, since KSPReset is called before.
> >>>>>>>>> 
> >>>>>>>>> I've written a class (just a bunch of helper functions, no encapsulating wrapper) around Mat and Vec objects. Nothing fancy: the ctor calls MatCreate, the dtor MatDestroy; you can have a look at https://github.com/precice/precice/blob/develop/src/mapping/petnum.cpp / .hpp.
> >>>>>>>>> 
> >>>>>>>>> These objects are also members of the same class as the KSP, so their dtors are called after KSPDestroy.
> >>>>>>>>> 
> >>>>>>>>> What could cause the memory corruption here?
> >>>>>>>>> 
> >>>>>>>>> Thanks a lot,
> >>>>>>>>> Florian
> 

