[petsc-users] Double free in KSPDestroy

Matthew Knepley knepley at gmail.com
Thu Jul 23 10:22:46 CDT 2015


On Thu, Jul 23, 2015 at 10:07 AM, Florian Lindner <mailinglists at xgm.de>
wrote:

>
> On Wednesday, July 22, 2015, 13:05:57, you wrote:
> >
> > > On Jul 22, 2015, at 11:33 AM, Florian Lindner <mailinglists at xgm.de>
> wrote:
> > >
> > > On Tuesday, July 21, 2015, 18:32:02, you wrote:
> > >>
> > >>  Try putting a breakpoint in KSPSetUp_GMRES and check the values of
> all the pointers immediately after the
> > >> ierr =
> PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr);
> > >>
> > >> then put your second breakpoint in KSPReset_GMRES and check all the
> pointers again just before the
> > >>> ierr =
> PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> > >>
> > >> Of course the pointers should be the same, are they?
> > >
> > > Num     Type           Disp Enb Address            What
> > > 3       breakpoint     keep y   0x00007ffff6ff6cb5 in KSPReset_GMRES
> at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > > 4       breakpoint     keep y   0x00007ffff6ff49a1 in KSPSetUp_GMRES
> at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:54
> > >
> > > The pointer gmres is the same. Just one function call later, at
> mal.c:72 it crashes. The pointer that is freed is gmres->hh_origin which
> also hasn't changed.
> > >
> > > What confuses me is that:
> > >
> > > Breakpoint 3, KSPReset_GMRES (ksp=0xe904b0) at
> /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > > 258   ierr =
> PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> > > (gdb) print gmres->hh_origin
> > > $24 = (PetscScalar *) 0xf10250
> > >
> > > hh_origin is the first argument; I step into PetscFree5:
> > >
> > > (gdb) s
> > > PetscFreeAlign (ptr=0xf15aa0, line=258, func=0x7ffff753c4c8
> <__func__.20306> "KSPReset_GMRES", file=0x7ffff753b8b0
> "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c") at
> /home/florian/software/petsc/src/sys/memory/mal.c:54
> > > 54    if (!ptr) return 0;
> > > (gdb) print ptr
> > > $25 = (void *) 0xf15aa0
> > >
> > > Why has the value changed? I expect gmres->hh_origin == ptr.
> >
> >    Definitely a problem here.
> >
> > > Could this be a sign of stack corruption at some earlier stage?
> >
> >    Could be, but valgrind usually finds such things.
> >
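One way to catch the exact moment the pointer changes is a hardware watchpoint on the struct field itself; a sketch of the gdb commands, assuming the breakpoints and variable names from the transcript above:

```gdb
# After the KSPSetUp_GMRES breakpoint fires and gmres is valid:
(gdb) watch -l gmres->hh_origin   # location watchpoint on the field
(gdb) continue
# gdb now stops at the instruction that overwrites hh_origin,
# whether that is PETSc code or a stray write from elsewhere.
```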
> >    You can do the following: edit
> $PETSC_DIR/$PETSC_ARCH/include/petscconf.h and add the lines
> >
> > #if !defined(PETSC_USE_MALLOC_COALESCED)
> > #define PETSC_USE_MALLOC_COALESCED
> > #endif
> >
> > then run
> >
> > make gnumake in the $PETSC_DIR directory.  Then relink your program and
> try running it.
>
> Sorry, no success. Same story. :-(
>
> I have removed my small petsc wrapper lib from the code and it's pure
> petsc now. Everything PETSc-related (besides Init and Finalize) is done in
> that piece of code. Everything PETSc is private, so nothing outside should
> mess with it.
>
>
> https://github.com/floli/precice/blob/petsc_debugging/src/mapping/PetRadialBasisFctMapping.hpp
>
> if you want to have a look... Maybe you'll see something evil I'm doing.
>

I can't see anything by eye. Can you tell me how to run a small problem
which shows the error? I will go
through it until we find out what is going on.

  Thanks,

    Matt


> Many thanks,
> Florian
>
>
> >
> >   Barry
> >
> >
> >
> >
> > >
> > > I also tried to build petsc with clang to use its
> memory sanitizer, but without success. Same for precice.
> > >
> > >
> > >> If so you can run in the debugger and check the values at some points
> between the creation and destruction to see where they get changed to bad
> values. Normally, of course, valgrind would be very helpful in finding
> exactly when things go bad.
> > >
> > > What do you mean by changing to bad values? They are the same after Calloc
> and before PetscFree5.
> > >
> > > Best Regards,
> > > Florian
> > >
> > >>  I'm afraid I'm going to have to give up on building this stuff
> myself; too painful.
> > >
> > > Sorry about that.
> > >
> > >>
> > >>  Barry
> > >>
> > >>
> > >>> On Jul 21, 2015, at 8:54 AM, Florian Lindner <mailinglists at xgm.de>
> wrote:
> > >>>
> > >>> Hey Barry,
> > >>>
> > >>> were you able to reproduce the error?
> > >>>
> > >>> I tried to set a breakpoint at
> > >>>
> > >>> PetscErrorCode KSPReset_GMRES(KSP ksp)
> > >>> {
> > >>> KSP_GMRES      *gmres = (KSP_GMRES*)ksp->data;
> > >>> PetscErrorCode ierr;
> > >>> PetscInt       i;
> > >>>
> > >>> PetscFunctionBegin;
> > >>> /* Free the Hessenberg matrices */
> > >>> ierr =
> PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> > >>>
> > >>> in gmres.c, the last line produces the error...
> > >>>
> > >>> Interestingly, this piece of code is traversed only once, so at least
> there is no double call of the same code that frees the pointer...
> > >>>
> > >>> Best Regards,
> > >>> Florian
> > >>>
> > >>>
> > >>> On Thursday, July 16, 2015, 17:59:15, you wrote:
> > >>>>
> > >>>> I am on a Mac; no idea what the 'lo' localhost loopback should be
> > >>>>
> > >>>> $  ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0)  [PRECICE] ERROR: Network "lo" not found for socket connection!
> > >>>> Run finished at Thu Jul 16 17:50:39 2015
> > >>>> Global runtime = 41ms / 0s
> > >>>>
> > >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]
>  Avg[ms]   T%
> > >>>>
> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $  ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0)  [PRECICE] ERROR: Network "localhost" not found for socket
> connection!
> > >>>> Run finished at Thu Jul 16 17:50:52 2015
> > >>>> Global runtime = 40ms / 0s
> > >>>>
> > >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]
>  Avg[ms]   T%
> > >>>>
> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $ hostname
> > >>>> Barrys-MacBook-Pro.local
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $  ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0)  [PRECICE] ERROR: Network "Barrys-MacBook-Pro.local" not found
> for socket connection!
> > >>>> Run finished at Thu Jul 16 17:51:12 2015
> > >>>> Global runtime = 39ms / 0s
> > >>>>
> > >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]
>  Avg[ms]   T%
> > >>>>
> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $  ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0)  [PRECICE] ERROR: Network "10.0.1.2" not found for socket
> connection!
> > >>>> Run finished at Thu Jul 16 17:53:02 2015
> > >>>> Global runtime = 42ms / 0s
> > >>>>
> > >>>> Event                Count    Total[ms]     Max[ms]     Min[ms]
>  Avg[ms]   T%
> > >>>>
> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>>
> > >>>>> On Jul 15, 2015, at 1:53 AM, Florian Lindner <mailinglists at xgm.de>
> wrote:
> > >>>>>
> > >>>>> Hey
> > >>>>>
> > >>>>> On Tuesday, July 14, 2015, 13:20:33, you wrote:
> > >>>>>>
> > >>>>>> How to install Eigen? I tried brew install eigen but it didn't
> help.
> > >>>>>
> > >>>>> You may need to set CPLUS_INCLUDE_PATH to something like
> "/usr/include/eigen3".
> > >>>>> The easiest way, however, is probably to download Eigen from
> http://bitbucket.org/eigen/eigen/get/3.2.5.tar.bz2 and move the Eigen
> folder from that archive to precice/src.
> > >>>>>
> > >>>>>> Also what about the PRECICE_MPI_ stuff. It sure doesn't point to
> anything valid.
> > >>>>>
> > >>>>> You probably don't need to set it if you use an mpic++ or mpicxx
> compiler wrapper that takes care of that.
> > >>>>>
> > >>>>> Thx,
> > >>>>> Florian
> > >>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Barry
> > >>>>>>
> > >>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on
> mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++
> build=debug
> > >>>>>> scons: Reading SConscript files ...
> > >>>>>>
> > >>>>>> Build options ...
> > >>>>>> (default)  builddir                  = build      Directory
> holding build files. ( /path/to/builddir )
> > >>>>>> (default)  build                     = debug      Build type,
> either release or debug (release|debug)
> > >>>>>> (modified) compiler                  =
> /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++   Compiler to use.
> > >>>>>> (modified) mpi                       = True       Enables
> MPI-based communication and running coupling tests. (yes|no)
> > >>>>>> (default)  sockets                   = True       Enables
> Socket-based communication. (yes|no)
> > >>>>>> (modified) boost_inst                = True       Enable if Boost
> is available compiled and installed. (yes|no)
> > >>>>>> (default)  spirit2                   = True       Used for
> parsing VRML file geometries and checkpointing. (yes|no)
> > >>>>>> (modified) petsc                     = True       Enable use of
> the Petsc linear algebra library. (yes|no)
> > >>>>>> (modified) python                    = False      Used for Python
> scripted solver actions. (yes|no)
> > >>>>>> (default)  gprof                     = False      Used in
> detailed performance analysis. (yes|no)
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Environment variables used for this build ...
> > >>>>>> (have to be defined by the user to configure build)
> > >>>>>> (modified) PETSC_DIR                 = /Users/barrysmith/Src/PETSc
> > >>>>>> (modified) PETSC_ARCH                = arch-debug
> > >>>>>> (default)  PRECICE_BOOST_SYSTEM_LIB  = boost_system
> > >>>>>> (default)  PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
> > >>>>>> (default)  PRECICE_MPI_LIB_PATH      = /usr/lib/
> > >>>>>> (default)  PRECICE_MPI_LIB           = mpich
> > >>>>>> (default)  PRECICE_MPI_INC_PATH      = /usr/include/mpich2
> > >>>>>> (default)  PRECICE_PTHREAD_LIB_PATH  = /usr/lib
> > >>>>>> (default)  PRECICE_PTHREAD_LIB       = pthread
> > >>>>>> (default)  PRECICE_PTHREAD_INC_PATH  = /usr/include
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Configuring build variables ...
> > >>>>>> Checking whether the C++ compiler works... yes
> > >>>>>> Checking for C library petsc... yes
> > >>>>>> Checking for C++ header file Eigen/Dense... no
> > >>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does
> not compile!
> > >>>>>> $ brew install eigen
> > >>>>>> ==> Downloading
> https://downloads.sf.net/project/machomebrew/Bottles/eigen-3.2.3.yosemite.bottle.tar.gz
> > >>>>>>
> ########################################################################
> 100.0%
> > >>>>>> ==> Pouring eigen-3.2.3.yosemite.bottle.tar.gz
> > >>>>>> 🍺  /usr/local/Cellar/eigen/3.2.3: 361 files, 4.1M
> > >>>>>> ~/Src/precice (develop=) arch-debug
> > >>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on
> mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++
> build=debug
> > >>>>>> scons: Reading SConscript files ...
> > >>>>>>
> > >>>>>> Build options ...
> > >>>>>> (default)  builddir                  = build      Directory
> holding build files. ( /path/to/builddir )
> > >>>>>> (default)  build                     = debug      Build type,
> either release or debug (release|debug)
> > >>>>>> (modified) compiler                  =
> /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++   Compiler to use.
> > >>>>>> (modified) mpi                       = True       Enables
> MPI-based communication and running coupling tests. (yes|no)
> > >>>>>> (default)  sockets                   = True       Enables
> Socket-based communication. (yes|no)
> > >>>>>> (modified) boost_inst                = True       Enable if Boost
> is available compiled and installed. (yes|no)
> > >>>>>> (default)  spirit2                   = True       Used for
> parsing VRML file geometries and checkpointing. (yes|no)
> > >>>>>> (modified) petsc                     = True       Enable use of
> the Petsc linear algebra library. (yes|no)
> > >>>>>> (modified) python                    = False      Used for Python
> scripted solver actions. (yes|no)
> > >>>>>> (default)  gprof                     = False      Used in
> detailed performance analysis. (yes|no)
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Environment variables used for this build ...
> > >>>>>> (have to be defined by the user to configure build)
> > >>>>>> (modified) PETSC_DIR                 = /Users/barrysmith/Src/PETSc
> > >>>>>> (modified) PETSC_ARCH                = arch-debug
> > >>>>>> (default)  PRECICE_BOOST_SYSTEM_LIB  = boost_system
> > >>>>>> (default)  PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
> > >>>>>> (default)  PRECICE_MPI_LIB_PATH      = /usr/lib/
> > >>>>>> (default)  PRECICE_MPI_LIB           = mpich
> > >>>>>> (default)  PRECICE_MPI_INC_PATH      = /usr/include/mpich2
> > >>>>>> (default)  PRECICE_PTHREAD_LIB_PATH  = /usr/lib
> > >>>>>> (default)  PRECICE_PTHREAD_LIB       = pthread
> > >>>>>> (default)  PRECICE_PTHREAD_INC_PATH  = /usr/include
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Configuring build variables ...
> > >>>>>> Checking whether the C++ compiler works... yes
> > >>>>>> Checking for C library petsc... yes
> > >>>>>> Checking for C++ header file Eigen/Dense... no
> > >>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does
> not compile!
> > >>>>>> ~/Src/precice (develop=) arch-debug
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Jul 14, 2015, at 2:14 AM, Florian Lindner <
> mailinglists at xgm.de> wrote:
> > >>>>>>>
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>> On Monday, July 13, 2015, 12:26:21, Barry Smith wrote:
> > >>>>>>>>
> > >>>>>>>> Run under valgrind first, see if it gives any more details
> about the memory issue
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > >>>>>>>
> > >>>>>>> I tried running it like that:
> > >>>>>>>
> > >>>>>>> valgrind --tool=memcheck ./pmpi A -malloc off
> > >>>>>>>
> > >>>>>>> (pmpi is my application, no mpirun)
> > >>>>>>>
> > >>>>>>> but it reported no errors at all.
> > >>>>>>>
> > >>>>>>>> Can you send the code that produces this problem?
> > >>>>>>>
> > >>>>>>> I was not able to isolate the problem; you can of course have a
> look at our application:
> > >>>>>>>
> > >>>>>>> git clone git at github.com:precice/precice.git
> > >>>>>>> MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on
> mpi=on compiler=mpic++ build=debug
> > >>>>>>>
> > >>>>>>> The test client:
> > >>>>>>> git clone git at github.com:floli/prempi.git
> > >>>>>>> you need to adapt line 5 in SConstruct: preciceRoot
> > >>>>>>> scons
> > >>>>>>>
> > >>>>>>> Take one terminal run ./pmpi A, another to run ./pmpi B
> > >>>>>>>
> > >>>>>>> Thanks for taking a look! Mail me if any problem with the build
> occurs.
> > >>>>>>>
> > >>>>>>> Florian
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Jul 13, 2015, at 10:56 AM, Florian Lindner <
> mailinglists at xgm.de> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hello,
> > >>>>>>>>>
> > >>>>>>>>> our petsc application suffers from a memory error (double free
> or corruption).
> > >>>>>>>>>
> > >>>>>>>>> The situation is like this:
> > >>>>>>>>>
> > >>>>>>>>> A KSP is a private member of a C++ class. In its constructor I
> call KSPCreate. In between, it may happen that I call KSPReset. In the class's
> destructor I call KSPDestroy. That's where the memory error appears:
> > >>>>>>>>>
> > >>>>>>>>> gdb backtrace:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> #4  0x00007ffff490b8db in _int_free () from /usr/lib/libc.so.6
> > >>>>>>>>> #5  0x00007ffff6188c9c in PetscFreeAlign (ptr=0xfcd990,
> line=258, func=0x7ffff753c4c8 <__func__.20304> "KSPReset_GMRES",
> file=0x7ffff753b8b0
> "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c")
> > >>>>>>>>> at /home/florian/software/petsc/src/sys/memory/mal.c:72
> > >>>>>>>>> #6  0x00007ffff6ff6cdc in KSPReset_GMRES (ksp=0xf48470) at
> /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > >>>>>>>>> #7  0x00007ffff70ad804 in KSPReset (ksp=0xf48470) at
> /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:885
> > >>>>>>>>> #8  0x00007ffff70ae2e8 in KSPDestroy (ksp=0xeb89d8) at
> /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:933
> > >>>>>>>>>
> > >>>>>>>>> #9  0x0000000000599b24 in
> precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping
> (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:148
> > >>>>>>>>> #10 0x0000000000599bc9 in
> precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping
> (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:146
> > >>>>>>>>>
> > >>>>>>>>> Complete backtrace at http://pastebin.com/ASjibeNF
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Could it be a problem that objects set by KSPSetOperators are
> destroyed afterwards? I don't think so, since KSPReset is called beforehand.
> > >>>>>>>>>
> > >>>>>>>>> I've wrapped a class (just a bunch of helper functions, not an
> encapsulating wrapper) around Mat and Vec objects. Nothing fancy: the ctor
> calls MatCreate, the dtor MatDestroy; you can have a look at
> https://github.com/precice/precice/blob/develop/src/mapping/petnum.cpp /
> .hpp.
> > >>>>>>>>>
> > >>>>>>>>> These objects are also members of the same class as the KSP, so
> their dtors are called after KSPDestroy.
> > >>>>>>>>>
> > >>>>>>>>> What could cause the memory corruption here?
> > >>>>>>>>>
> > >>>>>>>>> Thanks a lot,
> > >>>>>>>>> Florian
> >
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener