[petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs

Mark Adams mfadams at lbl.gov
Thu Nov 12 09:35:41 CST 2015


On Wed, Nov 11, 2015 at 6:14 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   Hmm, you absolutely must be using an options file; otherwise it would
> never be doing all the stuff it is doing inside PetscOptionsInsertFile()!
>
>
Yes, here it is:

-log_summary
#-help
-options_left false
-damping 1.15
-fp_trap
#-on_error_attach_debugger /usr/local/bin/gdb
#-on_error_attach_debugger /Users/markadams/homebrew/bin/gdb
#-start_in_debugger /Users/markadams/homebrew/bin/gdb
-debugger_nodes 1
#-malloc_debug
#-malloc_dump
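An options file like this one lists options (each with an optional value), and the `#` lines are comments, which is how the commented-out debugger and malloc options above are meant to be read. The C sketch below is hypothetical. It is not PETSc's PetscOptionsInsertFile or PetscTokenFind, and `collect_option_keys` and `MAX_NAME` are illustrative names. It only shows how such a file breaks into option keys, assuming that `#` starts a comment running to the end of the line and that a key is a token beginning with `-` followed by a letter.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_NAME 64 /* hypothetical limit on stored key length */

/* Hypothetical sketch (not PETSc's implementation): collect the option keys
   from an in-memory options file. A key is a whitespace-delimited token that
   starts with '-' followed by a letter, so "-1" would be treated as a value.
   Everything from '#' to the end of the line is ignored as a comment.
   Returns the number of keys found; at most max are stored in names. */
static int collect_option_keys(const char *text, char names[][MAX_NAME], int max)
{
    assert(text != NULL && names != NULL && max >= 0);
    int found = 0;
    const char *p = text;
    while (*p != '\0') {
        /* Skip whitespace between tokens, including newlines. */
        if (isspace((unsigned char)*p)) { ++p; continue; }
        /* A comment runs to the end of the current line. */
        if (*p == '#') {
            while (*p != '\0' && *p != '\n') ++p;
            continue;
        }
        /* Measure the token: stop at whitespace, '#', or the terminator. */
        size_t len = 0;
        while (p[len] != '\0' && !isspace((unsigned char)p[len]) && p[len] != '#') ++len;
        if (p[0] == '-' && len > 1 && isalpha((unsigned char)p[1])) {
            if (found < max) {
                size_t n = len < MAX_NAME - 1 ? len : MAX_NAME - 1;
                memcpy(names[found], p, n);
                names[found][n] = '\0';
            }
            ++found;
        }
        p += len; /* values such as "false" or "1.15" are skipped here */
    }
    return found;
}
```

Run on the file above, this sketch would report five active keys (`-log_summary`, `-options_left`, `-damping`, `-fp_trap`, `-debugger_nodes`). Values are skipped rather than paired with their keys to keep the example short.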



>    Please send me the options file.
>
>   Barry
>
> Most of the reports are due to vendor crimes, but it is possible that the
> PetscTokenFind() code has a memory issue, though I don't see how.
>
>   Seriously, the NERSC people should be pressuring Cray to have valgrind-clean
> code; this is disgraceful.
>
>
> Conditional jump or move depends on uninitialised value(s)
> ==2948==    at 0x542EC7: PetscTokenFind (str.c:965)
> ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> ==2948==
> ==2948== Use of uninitialised value of size 8
> ==2948==    at 0x542ECD: PetscTokenFind (str.c:965)
> ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> ==2948==
> ==2948== Conditional jump or move depends on uninitialised value(s)
> ==2948==    at 0x542F04: PetscTokenFind (str.c:966)
> ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> ==2948==
> ==2948== Use of uninitialised value of size 8
> ==2948==    at 0x542F0E: PetscTokenFind (str.c:967)
> ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> ==2948==
> ==2948== Use of uninitialised value of size 8
> ==2948==    at 0x542F77: PetscTokenFind (str.c:973)
> ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
> ==2948==
> ==2948== Use of uninitialised value of size 8
> ==2948==    at 0x542F2D: PetscTokenFind (str.c:968)
> ==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
> ==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
> ==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
> ==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
> ==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
>
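Valgrind's "Conditional jump or move depends on uninitialised value(s)" and "Use of uninitialised value" reports mean that a branch or an address computation used bytes that were allocated but never written. For a tokenizer, one common way to get there is scanning for a terminator in a buffer that was only partly filled. This thread does not establish whether that is what happens in PetscTokenFind. The C sketch below is a hypothetical illustration of the defensive pattern: terminate the buffer at the number of bytes actually read, then tokenize in place and stop at the terminator. None of its names (`read_all_terminated`, `next_token`, `is_blank`) come from PETSc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read a whole stream into a heap buffer and always
   NUL-terminate it at the number of bytes actually read, so later scans never
   reach allocated-but-unwritten bytes (what valgrind reports as uninitialised).
   Returns NULL on allocation failure; the caller frees the buffer. */
static char *read_all_terminated(FILE *fp, size_t *out_len)
{
    assert(fp != NULL);
    size_t cap = 256, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;
    for (;;) {
        if (len + 1 >= cap) {            /* always keep room for the '\0' */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) { free(buf); return NULL; }
            buf = grown;
            cap *= 2;
        }
        size_t got = fread(buf + len, 1, cap - 1 - len, fp);
        len += got;
        if (got == 0) break;             /* EOF or read error */
    }
    buf[len] = '\0';                     /* terminate at bytes read, not at cap */
    if (out_len != NULL) *out_len = len;
    return buf;
}

static int is_blank(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

/* strtok_r-style tokenizer: returns the next whitespace-delimited token at
   *cursor, overwriting the delimiter after it with '\0'. It never looks past
   the string's terminator; returns NULL when no tokens remain. */
static char *next_token(char **cursor)
{
    char *p = *cursor;
    while (is_blank(*p)) ++p;
    if (*p == '\0') { *cursor = p; return NULL; }
    char *start = p;
    while (*p != '\0' && !is_blank(*p)) ++p;
    if (*p != '\0') *p++ = '\0';         /* split, then step past the split */
    *cursor = p;
    return start;
}
```

A typical loop over a file's tokens would be `char *cur = buf; for (char *t; (t = next_token(&cur)) != NULL;) puts(t);`. Because every scan stops at the single terminator written at `buf[len]`, no branch ever depends on the unused tail of the allocation.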
> > On Nov 11, 2015, at 3:38 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > These are the only PETSc params that I used:
> >
> > -log_summary
> > -options_left false
> > -fp_trap
> >
> > I last updated about 3 weeks ago and I am on a branch. I can redo this
> > with a current master. My repo seems to have been polluted:
> >
> > 13:35 edison12 master> ~/petsc$ git status
> > # On branch master
> > # Your branch is ahead of 'origin/master' by 262 commits.
> > #
> > nothing to commit (working directory clean)
> >
> > I trust this is OK, but let me know if you would like me to clone a fresh repo.
> >
> > Mark
> >
> >
> >
> > On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Thanks
> >
> >    Do you use a petscrc file or any file with PETSc options in it for the run?
> >
> >   Thanks. Please send me the exact PETSc commit you built from so I can see the line numbers in our source when things go bad.
> >
> >    Barry
> >
> > > On Nov 11, 2015, at 7:36 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > >
> > >
> > > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >   Please send me the full output. This is nuts, and once we understand it better it should be reported to NERSC as something to be fixed. When I pay $60 million in taxes to a computing center, I expect something that works fine for free on my laptop to work there too.
> > >
> > >   Barry
> > >
> > > > On Nov 10, 2015, at 7:51 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > >
> > > > I ran an 8-processor job on Edison of a small code for a short run (just a linear solve) and got 37 MB of output!
> > > >
> > > > Here is a 'Petsc' grep.
> > > >
> > > > Perhaps we should build an ignore file for things that we believe are false positives.
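Valgrind's own form of an "ignore file" is a suppression file passed with `--suppressions=<file>`, and running with `--gen-suppressions=all` prints a ready-made suppression entry after each error. Before writing suppressions for a 37 MB log, it can help to measure which functions dominate it, in the spirit of the 'Petsc' grep above. The C sketch below is a hypothetical helper that is not part of PETSc or valgrind. It counts error records whose stack frames mention a given function, assuming the usual `==PID==` line prefix and that each record starts with an unindented message line followed by indented `at`/`by` frames.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer just past the "==PID==" prefix that
   valgrind puts on each line (and one following space, if present), or NULL
   if the line does not carry that prefix. */
static const char *after_pid_prefix(const char *line)
{
    if (strncmp(line, "==", 2) != 0) return NULL;
    const char *close = strstr(line + 2, "==");
    if (close == NULL) return NULL;
    const char *body = close + 2;
    if (*body == ' ') ++body;
    return body;
}

/* Count valgrind error records whose stack frames mention func.
   A record starts at an unindented message line ("Conditional jump ...");
   its frames are the indented "at"/"by" lines that follow. Lines without the
   "==PID==" prefix (e.g. wrapped paths) are ignored. The log is split in
   place with strtok, so it must be a writable buffer. */
static int count_records_mentioning(char *vg_log, const char *func)
{
    assert(vg_log != NULL && func != NULL);
    int count = 0, in_record = 0, matched = 0;
    for (char *line = strtok(vg_log, "\n"); line != NULL; line = strtok(NULL, "\n")) {
        const char *body = after_pid_prefix(line);
        if (body == NULL || *body == '\0') continue;   /* not a record line */
        if (*body != ' ') {                            /* new record header */
            if (in_record && matched) ++count;
            in_record = 1;
            matched = 0;
        } else if (in_record && strstr(body, func) != NULL) {
            matched = 1;                               /* a frame mentions func */
        }
    }
    if (in_record && matched) ++count;                 /* close the last record */
    return count;
}
```

Counting whole records rather than matching lines avoids double-counting a record whose stack mentions the same function in several frames. Once the dominant sources are known, the matching `--gen-suppressions=all` entries can be collected into a suppression file.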
> > > >
> > > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > >
> > > >   I am more optimistic about valgrind than Mark. I first try valgrind, and if that fails to be helpful, then I use the debugger. Valgrind has the advantage that it finds the FIRST place where something goes wrong, while the debugger only shows you the crash, which is often well after that point.
> > > >
> > > >   Valgrind should not be noisy; if it is, then the applications/libraries should be cleaned up so that they are valgrind-clean, and then valgrind is useful.
> > > >
> > > >   Barry
> > > >
> > > >
> > > >
> > > > > On Nov 3, 2015, at 7:47 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > > >
> > > > > BTW, I think that our advice for a segv is to use a debugger. DDT or Totalview, and gdb if need be, will get you right to the source code and will diagnose 90% of bugs. Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%.
> > > > >
> > > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov <davydden at gmail.com> wrote:
> > > > > Hi Jose,
> > > > >
> > > > > > On 3 Nov 2015, at 12:20, Jose E. Roman <jroman at dsic.upv.es> wrote:
> > > > > >
> > > > > > I am answering the SLEPc-related questions:
> > > > > > - Having a different number of iterations when changing the number of processes is normal.
> > > > > The changes in iterations I mentioned are for different preconditioners, but the same number of MPI processes.
> > > > >
> > > > >
> > > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused.
> > > > > >
> > > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information.
> > > > > Will try that.
> > > > >
> > > > > Denis.
> > > > >
> > > >
> > > >
> > > > <petsc_val.gz>
> > >
> > >
> > > <outval.gz>
> >
> >
>
>

