[petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs

Mark Adams mfadams at lbl.gov
Thu Nov 12 14:16:46 CST 2015


There is a valgrind for El Capitan now and I have it.  It runs perfectly
clean.
Thanks,
Mark

On Thu, Nov 12, 2015 at 11:44 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>    Thanks. I don't get any valgrind issues with this file, so I have to
> conclude that the valgrind issues all come from that damn NERSC machine.
>
>     I highly recommend running the application code on some Linux machine
> that is suitably valgrind clean to determine if there are any memory
> corruption issues in the application code. It is insane to try to debug
> application codes on damn NERSC machines directly.
>
>    Barry
>
> > On Nov 12, 2015, at 9:35 AM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> >
> >
> > On Wed, Nov 11, 2015 at 6:14 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Hmm, you absolutely must be using an options file; otherwise it would
> > never be doing all the stuff it is doing inside PetscOptionsInsertFile()!
> >
> >
> > Yes, here it is:
> >
> > -log_summary
> > #-help
> > -options_left false
> > -damping 1.15
> > -fp_trap
> > #-on_error_attach_debugger /usr/local/bin/gdb
> > #-on_error_attach_debugger /Users/markadams/homebrew/bin/gdb
> > #-start_in_debugger /Users/markadams/homebrew/bin/gdb
> > -debugger_nodes 1
> > #-malloc_debug
> > #-malloc_dump
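The options file above mixes active options with lines that are commented out by a leading `#`. As a hedged illustration only (a hypothetical helper, not PETSc's own `PetscOptionsInsertFile()` logic), the sketch below lists which options such a file would actually set. It assumes, as the commented-out lines suggest, that `#` starts a comment running to the end of the line, and that a token not starting with `-` is the value of the preceding option, so negative numeric values are not handled.

```python
def parse_options_text(text):
    """Return {option_name: value_or_None} for the active options in `text`.

    Hypothetical sketch: '#' starts a comment, option names start with '-',
    and the following token (if it does not start with '-') is the value.
    """
    options = {}
    for raw_line in text.splitlines():
        # Drop everything from '#' onward, so '#-help' yields no tokens.
        line = raw_line.split("#", 1)[0].strip()
        tokens = line.split()
        i = 0
        while i < len(tokens):
            name = tokens[i]
            if not name.startswith("-"):
                raise ValueError(f"expected an option name, got {name!r}")
            value = None
            # Simplification: a token starting with '-' is always treated as
            # the next option name, never as a (negative) value.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
                value = tokens[i + 1]
                i += 1
            options[name.lstrip("-")] = value
            i += 1
    return options


if __name__ == "__main__":
    sample = "-log_summary\n#-help\n-options_left false\n-damping 1.15\n-fp_trap\n"
    print(parse_options_text(sample))
```

For the file quoted above, this reports only `log_summary`, `options_left`, `damping`, `fp_trap` and `debugger_nodes` as set; the debugger and `-malloc_*` lines are ignored.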
> >
> >
> >    Please send me the options file.
> >
> >   Barry
> >
> > Most of the reports are due to vendor crimes, but it is possible that the
> > PetscTokenFind() code has a memory issue, though I don't see how.
> >
> >   Seriously, the NERSC people should be pressuring Cray to have valgrind-clean
> > code; this is disgraceful.
> >
> >
> > Conditional jump or move depends on uninitialised value(s)
> > ==2948==    at 0x542EC7: PetscTokenFind (str.c:965)
> > ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> > ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> > ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> > ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> > ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> > ==2948==
> > ==2948== Use of uninitialised value of size 8
> > ==2948==    at 0x542ECD: PetscTokenFind (str.c:965)
> > ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> > ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> > ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> > ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> > ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> > ==2948==
> > ==2948== Conditional jump or move depends on uninitialised value(s)
> > ==2948==    at 0x542F04: PetscTokenFind (str.c:966)
> > ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> > ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> > ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> > ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> > ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> > ==2948==
> > ==2948== Use of uninitialised value of size 8
> > ==2948==    at 0x542F0E: PetscTokenFind (str.c:967)
> > ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> > ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> > ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> > ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> > ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> > ==2948==
> > ==2948== Use of uninitialised value of size 8
> > ==2948==    at 0x542F77: PetscTokenFind (str.c:973)
> > ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> > ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> > ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> > ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> > ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> > ==2948==
> > ==2948== Use of uninitialised value of size 8
> > ==2948==    at 0x542F2D: PetscTokenFind (str.c:968)
> > ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> > ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> > ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> > ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> > ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
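When a run produces megabytes of memcheck output, grouping the records by error message and innermost frame shows quickly whether they all point at one spot, as the records above all point into `PetscTokenFind`. A minimal sketch, assuming the default memcheck text layout shown above (a `==PID==` prefix, one message line, then `at`/`by` frames, records separated by a bare `==PID==` line); `summarize` and the log file name in the usage comment are hypothetical, not part of valgrind or PETSc:

```python
import re
from collections import Counter

# The "==PID==" prefix that memcheck puts at the start of each report line.
PREFIX = re.compile(r"^==\d+==\s?")
# A stack frame such as "at 0x542EC7: PetscTokenFind (str.c:965)".
FRAME = re.compile(r"^(at|by) 0x[0-9A-Fa-f]+: (\S+) \(([^)]*)\)")


def summarize(log_text):
    """Count memcheck records by (message, innermost function, location)."""
    counts = Counter()
    message = None
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw).strip()
        if not line:
            message = None  # a bare "==PID==" line ends the current record
            continue
        frame = FRAME.match(line)
        if frame is None:
            if message is None:
                message = line  # first non-frame line is the error message
        elif frame.group(1) == "at" and message is not None:
            # Only the innermost ("at") frame is used as the grouping key.
            counts[(message, frame.group(2), frame.group(3))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_memcheck.py valgrind.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for (msg, func, loc), n in summarize(f.read()).most_common():
                print(f"{n:6d}  {func} ({loc}): {msg}")
```

Keying on the innermost frame collapses the repeated `PetscInitialize`/`main` call chain, so a long log reduces to a handful of distinct source locations.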
> >
> > > On Nov 11, 2015, at 3:38 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > These are the only PETSc params that I used:
> > >
> > > -log_summary
> > > -options_left false
> > > -fp_trap
> > >
> > > I last updated about 3 weeks ago, and I am on a branch. I can redo this
> > > with a current master. My repo seems to have been polluted:
> > >
> > > 13:35 edison12 master> ~/petsc$ git status
> > > # On branch master
> > > # Your branch is ahead of 'origin/master' by 262 commits.
> > > #
> > > nothing to commit (working directory clean)
> > >
> > > I trust this is OK, but let me know if you would like me to clone a
> > > fresh repo.
> > >
> > > Mark
> > >
> > >
> > >
> > > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >   Thanks
> > >
> > >    Do you use a petscrc file, or any file with PETSc options in it, for
> > > the run?
> > >
> > >   Thanks. Please send me the exact PETSc commit you built from, so I
> > > can see the line numbers in our source when things go bad.
> > >
> > >    Barry
> > >
> > > > On Nov 11, 2015, at 7:36 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > >
> > > >   Please send me the full output. This is nuts, and once we understand
> > > > it better it should be reported to NERSC as something to be fixed.
> > > > When I pay $60 million in taxes to a computing center, I expect
> > > > something that works fine for free on my laptop to also work there.
> > > >
> > > >   Barry
> > > >
> > > > > On Nov 10, 2015, at 7:51 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > >
> > > > > I ran an 8-processor job of a small code on Edison for a short run
> > > > > (just a linear solve) and got 37 Mb of output!
> > > > >
> > > > > Here is a 'Petsc' grep.
> > > > >
> > > > > Perhaps we should build an ignore file for things that we believe
> > > > > are false positives.
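Valgrind already supports such an ignore file natively: a suppression file passed with `--suppressions=<file>`, and `--gen-suppressions=all` makes memcheck print a ready-made entry after each error. As a sketch only, the hypothetical helper below builds one entry in that format from a memcheck message and its call stack, mapping the two messages seen in this thread to the memcheck suppression kinds `Memcheck:Cond` and `Memcheck:Value8`. Entries should only be added for reports believed to be false positives, since a suppression also hides genuine errors with the same stack.

```python
# Memcheck suppression kinds for the two messages that appear in this thread.
KIND_BY_MESSAGE = {
    "Conditional jump or move depends on uninitialised value(s)": "Memcheck:Cond",
    "Use of uninitialised value of size 8": "Memcheck:Value8",
}


def suppression_entry(name, message, functions):
    """Return one suppression block matching a stack of function names.

    `functions` is listed innermost frame first, in the same order as the
    "at"/"by" lines of the memcheck report.
    """
    kind = KIND_BY_MESSAGE[message]  # KeyError for messages not mapped here
    lines = ["{", f"   {name}", f"   {kind}"]
    lines += [f"   fun:{func}" for func in functions]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical entry for the PetscTokenFind reports quoted in this thread.
    print(suppression_entry(
        "petsc_tokenfind_cond",
        "Conditional jump or move depends on uninitialised value(s)",
        ["PetscTokenFind", "PetscOptionsInsertString", "PetscOptionsInsertFile"],
    ), end="")
```

Entries produced this way can be concatenated into one file and passed to valgrind with `--suppressions=`; starting from the `--gen-suppressions=all` output and trimming it by hand achieves the same without any script.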
> > > > >
> > > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > >
> > > > >   I am more optimistic about valgrind than Mark. I try valgrind first,
> > > > > and if that fails to be helpful, then I use the debugger. Valgrind has
> > > > > the advantage that it finds the FIRST place where something is wrong,
> > > > > while in the debugger you arrive rather late, at the crash.
> > > > >
> > > > >   Valgrind should not be noisy; if it is, then the
> > > > > applications/libraries should be cleaned up so that they are valgrind
> > > > > clean, and then valgrind is useful.
> > > > >
> > > > >   Barry
> > > > >
> > > > >
> > > > >
> > > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > > >
> > > > > > BTW, I think that our advice for a SEGV should be to use a debugger.
> > > > > > DDT or TotalView, and gdb if need be, will get you right to the source
> > > > > > code and will get 90% of bugs diagnosed. Valgrind is noisy and
> > > > > > cumbersome to use but can diagnose 90% of the other 10%.
> > > > > >
> > > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov <davydden at gmail.com> wrote:
> > > > > > Hi Jose,
> > > > > >
> > > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman <jroman at dsic.upv.es> wrote:
> > > > > > >
> > > > > > > I am answering the SLEPc-related questions:
> > > > > > > - Having a different number of iterations when changing the
> > > > > > > number of processes is normal.
> > > > > > The changes in iterations I mentioned are for different
> > > > > > preconditioners but the same number of MPI processes.
> > > > > >
> > > > > >
> > > > > > > - Yes, if you do not destroy the EPS solver, then the
> > > > > > > preconditioner would be reused.
> > > > > > >
> > > > > > > Regarding the segmentation fault, I have no clue. I am not sure if
> > > > > > > this is related to GAMG or not. Maybe running under valgrind could
> > > > > > > provide more information.
> > > > > > I will try that.
> > > > > >
> > > > > > Denis.
> > > > > >
> > > > >
> > > > >
> > > > > <petsc_val.gz>
> > > >
> > > >
> > > > <outval.gz>
> > >
> > >
> >
> >
>
>