[petsc-dev] Did someone fucking break bfort?

Satish Balay balay at mcs.anl.gov
Fri Dec 25 14:11:46 CST 2009


Are you sure bfort got rebuilt with the change? I can't reproduce this valgrind output.

>>>>>>.
asterix:/home/balay/tmp/spetsc/src/mat/utils>valgrind --tool=memcheck bfort -dir `pwd`/ftn-auto -mnative -ansi -nomsgs -noprofile -anyname -mapptr -mpi -mpi2 -ferr -ptrprefix Petsc -ptr64 PETSC_USE_POINTER_CONVERSION -fcaps PETSC_HAVE_FORTRAN_CAPS -fuscore PETSC_HAVE_FORTRAN_UNDERSCORE -f90mod_skip_header matio.c convert.c gcreate.c freespace.c getcolv.c ptap.c compressedrow.c matstash.c multequal.c axpy.c freespace.h zerodiag.c matstashspace.c
==2289== Memcheck, a memory error detector
==2289== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==2289== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==2289== Command: bfort -dir /home/balay/tmp/spetsc/src/mat/utils/ftn-auto -mnative -ansi -nomsgs -noprofile -anyname -mapptr -mpi -mpi2 -ferr -ptrprefix Petsc -ptr64 PETSC_USE_POINTER_CONVERSION -fcaps PETSC_HAVE_FORTRAN_CAPS -fuscore PETSC_HAVE_FORTRAN_UNDERSCORE -f90mod_skip_header matio.c convert.c gcreate.c freespace.c getcolv.c ptap.c compressedrow.c matstash.c multequal.c axpy.c freespace.h zerodiag.c matstashspace.c
==2289== 
==2289== 
==2289== HEAP SUMMARY:
==2289==     in use at exit: 5,093 bytes in 50 blocks
==2289==   total heap usage: 71 allocs, 21 frees, 17,021 bytes allocated
==2289== 
==2289== LEAK SUMMARY:
==2289==    definitely lost: 5,093 bytes in 50 blocks
==2289==    indirectly lost: 0 bytes in 0 blocks
==2289==      possibly lost: 0 bytes in 0 blocks
==2289==    still reachable: 0 bytes in 0 blocks
==2289==         suppressed: 0 bytes in 0 blocks
==2289== Rerun with --leak-check=full to see details of leaked memory
==2289== 
==2289== For counts of detected and suppressed errors, rerun with: -v
==2289== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)
asterix:/home/balay/tmp/spetsc/src/mat/utils>
<<<<<<<<<<

I'll send the bug report to Bill later today, since the problem is toggled
by the following change:

http://petsc.cs.iit.edu/petsc/externalpackages/sowing-1.1.11/rev/e591c037e500

But regarding your machine: something changed on it a couple of days
back, and that is what is triggering this issue. You haven't mentioned
how I can reproduce it ['stack smashing detected' message etc.].
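
To make the heap-versus-stack distinction concrete, here is a minimal
sketch. It is not code from bfort; the helper names dup_name and
copy_bounded are hypothetical. The first helper allocates exactly
strlen(p) + 1 bytes, which is all strcpy needs, so padding the allocation
(the +10 / +100 experiments) can only mask an overrun that happens
somewhere else. The second copies into a fixed-size buffer with an
explicit bound; an unbounded copy into that kind of local array is what
-fstack-protector reports as 'stack smashing detected'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from bfort): heap copy of a NUL-terminated string.
   strlen(p) + 1 bytes is exactly enough, including the trailing '\0'.
   Returns NULL if the allocation fails; the caller frees the result. */
char *dup_name(const char *p)
{
    size_t n;
    char *s;

    assert(p != NULL);
    n = strlen(p) + 1;
    s = (char *)malloc(n);
    if (s != NULL)
        memcpy(s, p, n);
    return s;
}

/* Hypothetical helper: copy src into a fixed-size buffer, truncating if needed.
   Returns 1 if the whole string fit, 0 if it was truncated (or dstsize is 0).
   dst is always NUL-terminated when dstsize > 0. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    if (dstsize == 0)
        return 0;
    len = strlen(src);
    if (len >= dstsize) {
        memcpy(dst, src, dstsize - 1);   /* keep room for the terminator */
        dst[dstsize - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1);           /* copies the '\0' as well */
    return 1;
}
```

A caller would check the return value of copy_bounded and treat 0 as "the
token was longer than the buffer", rather than letting the copy run past
the end of the array.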

Satish

On Fri, 25 Dec 2009, Matthew Knepley wrote:

> Here is the valgrind for your +100 fix:
> 
> knepley at khan:/PETSc3/petsc/petsc-dev/src/mat/utils$ valgrind
> /PETSc3/petsc/petsc-dev/linux-gnu-cxx-debug/bin/bfort -dir
> /PETSc3/petsc/petsc-dev/src/mat/utils/ftn-auto -mnative -ansi -nomsgs
> -noprofile -anyname -mapptr -mpi -mpi2 -ferr -ptrprefix Petsc -ptr64
> PETSC_USE_POINTER_CONVERSION -fcaps PETSC_HAVE_FORTRAN_CAPS -fuscore
> PETSC_HAVE_FORTRAN_UNDERSCORE -f90mod_skip_header matio.c convert.c
> gcreate.c freespace.c getcolv.c ptap.c compressedrow.c matstash.c
> multequal.c axpy.c freespace.h zerodiag.c matstashspace.c
> ==20868== Memcheck, a memory error detector.
> ==20868== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
> ==20868== Using LibVEX rev 1804, a library for dynamic binary translation.
> ==20868== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
> ==20868== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation
> framework.
> ==20868== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
> ==20868== For more details, rerun with: -v
> ==20868==
> ==20868== Conditional jump or move depends on uninitialised value(s)
> ==20868==    at 0x804C2BB: PrintBody (bfort.c:1362)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Conditional jump or move depends on uninitialised value(s)
> ==20868==    at 0x804C293: PrintBody (bfort.c:1363)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Conditional jump or move depends on uninitialised value(s)
> ==20868==    at 0x804C5E7: PrintBody (bfort.c:1384)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Conditional jump or move depends on uninitialised value(s)
> ==20868==    at 0x804C396: PrintBody (bfort.c:1385)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Conditional jump or move depends on uninitialised value(s)
> ==20868==    at 0x804C3CE: PrintBody (bfort.c:1387)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Conditional jump or move depends on uninitialised value(s)
> ==20868==    at 0x804C3E9: PrintBody (bfort.c:1387)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Conditional jump or move depends on uninitialised value(s)
> ==20868==    at 0x804C589: PrintBody (bfort.c:1406)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Use of uninitialised value of size 4
> ==20868==    at 0x40239D8: strlen (mc_replace_strmem.c:242)
> ==20868==    by 0x4198127: fputs (in /lib/tls/i686/cmov/libc-2.7.so)
> ==20868==    by 0x804C5BE: PrintBody (bfort.c:1408)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== Invalid read of size 1
> ==20868==    at 0x40239D8: strlen (mc_replace_strmem.c:242)
> ==20868==    by 0x4198127: fputs (in /lib/tls/i686/cmov/libc-2.7.so)
> ==20868==    by 0x804C5BE: PrintBody (bfort.c:1408)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==20868==
> ==20868== Process terminating with default action of signal 11 (SIGSEGV)
> ==20868==  Access not within mapped region at address 0x0
> ==20868==    at 0x40239D8: strlen (mc_replace_strmem.c:242)
> ==20868==    by 0x4198127: fputs (in /lib/tls/i686/cmov/libc-2.7.so)
> ==20868==    by 0x804C5BE: PrintBody (bfort.c:1408)
> ==20868==    by 0x804A622: OutputRoutine (bfort.c:575)
> ==20868==    by 0x804A0B2: main (bfort.c:475)
> ==20868==
> ==20868== ERROR SUMMARY: 72 errors from 9 contexts (suppressed: 17 from 1)
> ==20868== malloc/free: in use at exit: 1,056 bytes in 3 blocks.
> ==20868== malloc/free: 6 allocs, 3 frees, 2,112 bytes allocated.
> ==20868== For counts of detected errors, rerun with: -v
> ==20868== searching for pointers to 3 not-freed blocks.
> ==20868== checked 243,864 bytes.
> ==20868==
> ==20868== LEAK SUMMARY:
> ==20868==    definitely lost: 0 bytes in 0 blocks.
> ==20868==      possibly lost: 0 bytes in 0 blocks.
> ==20868==    still reachable: 1,056 bytes in 3 blocks.
> ==20868==         suppressed: 0 bytes in 0 blocks.
> ==20868== Rerun with --leak-check=full to see details of leaked memory.
> Segmentation fault
> 
> The problem is that argument lists are just not parsed correctly for
> gcreate.c. You can send that to Bill.
> 
>   Matt
> 
> On Fri, Dec 25, 2009 at 10:55 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> 
> > One more thing. If I remove this patch from sowing - the valgrind log
> > is clean.
> >
> >
> > http://petsc.cs.iit.edu/petsc/externalpackages/sowing-1.1.11/rev/e591c037e500
> >
> > Perhaps you can find the bug in this change. If not - I'll send a bug
> > report to Bill.
> >
> > Satish
> >
> > On Fri, 25 Dec 2009, Satish Balay wrote:
> >
> > > Can you send me the valgrind.log - with the patch applied to the
> > > unmodified sowing-1.1.11-a.tar.gz?
> > >
> > > Also the command you are using to generate this log?
> > >
> > >
> > > I've used the following:
> > > > valgrind --tool=memcheck -q --log-file=valgrind.log bfort -dir `pwd`/ftn-auto -ansi -nomsgs -noprofile -anyname -mapptr -mpi -mpi2 -ferr -ptrprefix Petsc -ptr64 PETSC_USE_POINTER_CONVERSION -fcaps PETSC_HAVE_FORTRAN_CAPS -fuscore PETSC_HAVE_FORTRAN_UNDERSCORE matrix.c
> > >
> > > Satish
> > >
> > > On Fri, 25 Dec 2009, Matthew Knepley wrote:
> > >
> > > > Valgrind is not clean for me with the change.
> > > >
> > > >   Matt
> > > >
> > > > > On Fri, Dec 25, 2009 at 10:28 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> > > >
> > > > > Well - normally the first step with detecting the bugs is to report
> > > > > them to the author - and ask for a fix..
> > > > >
> > > > > Satish
> > > > >
> > > > > On Fri, 25 Dec 2009, Matthew Knepley wrote:
> > > > >
> > > > > > > I can try, but I still think replacement is the only real
> > > > > > > alternative. This is not able to be debugged, or you would not
> > > > > > > recommend sticking in random numbers in malloc() and I would be
> > > > > > > able to see where an SEGV occurs with gdb.
> > > > > >
> > > > > >   Matt
> > > > > >
> > > > > > > On Fri, Dec 25, 2009 at 10:16 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> > > > > >
> > > > > > > > BTW: What linux are you using? ubuntu version? i686 or x86_64? etc...
> > > > > > >
> > > > > > > also try:
> > > > > > >
> > > > > > > arg->name     = (char *)MALLOC( strlen(p) + 100 );
> > > > > > >
> > > > > > > satish
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 25 Dec 2009, Satish Balay wrote:
> > > > > > >
> > > > > > > > Did my suggested change not work for you?
> > > > > > > >
> > > > > > > > Satish
> > > > > > > >
> > > > > > > > On Thu, 24 Dec 2009, Matthew Knepley wrote:
> > > > > > > >
> > > > > > > > > > I spent a bunch of time on this today. This shit is hopelessly
> > > > > > > > > > broken. It sucks completely.
> > > > > > > > > > I cannot get it to run, nor see why it is causing stack overruns
> > > > > > > > > > and SEGVs. If anyone does not think it is hopeless, speak up now.
> > > > > > > > > > This is a complete fucking embarrassment.
> > > > > > > > >
> > > > > > > > >    Matt
> > > > > > > > >
> > > > > > > > > > On Mon, Dec 21, 2009 at 4:42 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > > This does not make any sense to me because it would be a heap
> > > > > > > > > > > violation, not a stack smash.
> > > > > > > > > >
> > > > > > > > > >   Matt
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 21, 2009 at 4:30 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> > > > > > > > > >
> > > > > > > > > > >> [I don't know the correct fix for this - but ] The following
> > > > > > > > > > >> change is getting rid of valgrind messages for me. Maybe you can
> > > > > > > > > > >> use this, build sowing separately - and continue..
> > > > > > > > > >>
> > > > > > > > > >> Satish
> > > > > > > > > >>
> > > > > > > > > >> ----------
> > > > > > > > > >>
> > > > > > > > > >> diff -r dbe25084c0e4 src/bfort/bfort.c
> > > > > > > > > >> --- a/src/bfort/bfort.c Mon Dec 15 22:20:58 2008 -0600
> > > > > > > > > >> +++ b/src/bfort/bfort.c Mon Dec 21 16:29:09 2009 -0600
> > > > > > > > > >> @@ -2157,7 +2157,7 @@
> > > > > > > > > >>
> > > > > > > > > >>     /* Current token is name */
> > > > > > > > > >>     arg->has_star = (nstar > 0);
> > > > > > > > > >> -    arg->name     = (char *)MALLOC( strlen(p) + 1 );
> > > > > > > > > >> +    arg->name     = (char *)MALLOC( strlen(p) + 10 );
> > > > > > > > > >>     strcpy( arg->name, p );
> > > > > > > > > >>
> > > > > > > > > >>     /* We can't output the name just yet, because if it is
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On Mon, 21 Dec 2009, Matthew Knepley wrote:
> > > > > > > > > >>
> > > > > > > > > > >> > The problem appears to be in OutputRoutine() in bfort.c, but that
> > > > > > > > > > >> > code is impossible to debug. I can't see where something is getting
> > > > > > > > > > >> > overwritten, and it looks like the check only happens when the
> > > > > > > > > > >> > routine returns. bfort is such crap.
> > > > > > > > > >> >
> > > > > > > > > >> >   Matt
> > > > > > > > > >> >
> > > > > > > > > > >> > On Mon, Dec 21, 2009 at 3:25 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > > > > > > > > >> >
> > > > > > > > > > >> > > On Mon, Dec 21, 2009 at 3:21 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > >> On Mon, 21 Dec 2009, Lisandro Dalcín wrote:
> > > > > > > > > >> > >>
> > > > > > > > > > >> > >> > On Mon, Dec 21, 2009 at 5:37 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > > > > > > > > >> > >> > >
> > > > > > > > > > >> > >> > > It says there is a stack smash and no other info. This is
> > > > > > > > > > >> > >> > > completely fucking my development right now.
> > > > > > > > > >> > >> > >
> > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > Any chance bfort was built with the -fstack-protector flag? This
> > > > > > > > > > >> > >> > failure could be signaling an actual old bug in bfort... I would
> > > > > > > > > > >> > >> > re-build bfort with debug and re-run under valgrind...
> > > > > > > > > >> > >>
> > > > > > > > > >> > >> That must be it.
> > > > > > > > > >> > >>
> > > > > > > > > > >> > >> I just ran my build [which is without -fstack-protector] - and
> > > > > > > > > > >> > >> valgrind does flag a bunch of things with bfort.
> > > > > > > > > >> > >>
> > > > > > > > > >> > >
> > > > > > > > > >> > > 1) That flag is nowhere in my build.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2) Something changed
> > > > > > > > > >> > >
> > > > > > > > > >> > >   Matt
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > > >> > >> I normally install sowing separately and have it in my PATH - so
> > > > > > > > > > >> > >> that it doesn't have to be rebuilt each time I build petsc.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> I guess we should sync up [our patches] with latest sowing and make
> > > > > > > > > > >> > >> sure it's valgrind clean as well.
> > > > > > > > > >> > >>
> > > > > > > > > >> > >> Satish
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > --
> > > > > > > > > > >> > > What most experimenters take for granted before they begin their
> > > > > > > > > > >> > > experiments is infinitely more interesting than any results to which
> > > > > > > > > > >> > > their experiments lead.
> > > > > > > > > >> > > -- Norbert Wiener
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > > What most experimenters take for granted before they begin their
> > > > > > > > > > > experiments is infinitely more interesting than any results to which
> > > > > > > > > > > their experiments lead.
> > > > > > > > > > -- Norbert Wiener
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> 
> 
> 
> 

