[petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs

Barry Smith bsmith at mcs.anl.gov
Wed Nov 11 17:14:53 CST 2015


  Hmm, you absolutely must be using an options file; otherwise it would never be doing all the work it is doing inside PetscOptionsInsertFile()!

   Please send me the options file.

  Barry

Most of the reports are due to vendor crimes, but it is possible that the PetscTokenFind() code has a memory issue, though I don't see how.

  Seriously, the NERSC people should be pressuring Cray to have valgrind-clean code; this is disgraceful.


Conditional jump or move depends on uninitialised value(s)
==2948==    at 0x542EC7: PetscTokenFind (str.c:965)
==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
==2948== 
==2948== Use of uninitialised value of size 8
==2948==    at 0x542ECD: PetscTokenFind (str.c:965)
==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
==2948== 
==2948== Conditional jump or move depends on uninitialised value(s)
==2948==    at 0x542F04: PetscTokenFind (str.c:966)
==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
==2948== 
==2948== Use of uninitialised value of size 8
==2948==    at 0x542F0E: PetscTokenFind (str.c:967)
==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
==2948== 
==2948== Use of uninitialised value of size 8
==2948==    at 0x542F77: PetscTokenFind (str.c:973)
==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
==2948== 
==2948== Use of uninitialised value of size 8
==2948==    at 0x542F2D: PetscTokenFind (str.c:968)
==2948==    by 0x4F00B9: PetscOptionsInsertString (options.c:390)
==2948==    by 0x4F2F7B: PetscOptionsInsertFile (options.c:590)
==2948==    by 0x4F4ED7: PetscOptionsInsert (options.c:721)
==2948==    by 0x51A629: PetscInitialize (pinit.c:859)
==2948==    by 0x47B98D: main (in /global/u2/m/madams/hpsr/src/hpsr.arch-xc30-dbg-intel.ex)
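Mark's idea later in the thread of an ignore file for suspected false positives corresponds to valgrind's suppression files, which are passed with `--suppressions=<file>`; valgrind can also print ready-made entries itself with `--gen-suppressions=all`. As a complement, here is a minimal Python sketch that triages memcheck text in the format shown above. It collapses reports that share the same message and function stack, keeps only stacks that mention a keyword such as `Petsc` (a scripted version of a 'Petsc' grep), and formats one suppression entry per group. It is not part of PETSc or valgrind: `parse_records`, `group_records`, `suppression_kind` and `to_suppression` are hypothetical helper names. The sketch assumes every log line carries the `==PID==` prefix and only maps the two message types seen here to suppression kinds (`Memcheck:Cond` and `Memcheck:ValueN`).

```python
# Hypothetical helpers for triaging valgrind memcheck output;
# not part of PETSc or valgrind.
import re

# A stack frame line, e.g. "==2948==    at 0x542EC7: PetscTokenFind (str.c:965)".
# Group 1 is the function name ("main" for "main (in /path/to/exe)").
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+)")

# The first line of a report: "==PID==", exactly one space, then the message.
# Frame lines are indented further, so they never match this pattern.
HEADER_RE = re.compile(r"^==\d+== (\S.*)$")


def parse_records(log_text):
    """Split memcheck text into (message, [function, ...]) records."""
    records = []
    for line in log_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            records.append((header.group(1).strip(), []))
            continue
        frame = FRAME_RE.match(line)
        if frame and records:
            records[-1][1].append(frame.group(1))
    # Banner and summary lines have no stack; keep only actual error reports.
    return [(message, frames) for message, frames in records if frames]


def group_records(records, keyword="Petsc"):
    """Count identical (message, stack) reports whose stack mentions keyword."""
    counts = {}  # dicts preserve insertion order, so groups stay in first-seen order
    for message, frames in records:
        if any(keyword in name for name in frames):
            key = (message, tuple(frames))
            counts[key] = counts.get(key, 0) + 1
    return counts


def suppression_kind(message):
    """Map the two message types seen in this log to a suppression kind."""
    if message.startswith("Conditional jump or move depends on uninitialised value"):
        return "Memcheck:Cond"
    size = re.match(r"Use of uninitialised value of size (\d+)", message)
    if size:
        return "Memcheck:Value" + size.group(1)
    return None  # other error types are not handled in this sketch


def to_suppression(name, message, frames, depth=4):
    """Format a valgrind suppression entry from the innermost `depth` frames."""
    kind = suppression_kind(message)
    if kind is None:
        return None
    body = [name, kind] + ["fun:" + function for function in frames[:depth]]
    return "{\n" + "".join("   " + line + "\n" for line in body) + "}"
```

A possible use is `group_records(parse_records(open("valgrind.log").read()))` on a saved log (the file name is illustrative). In the log above all six reports share one function stack, so they reduce to two (message, stack) groups, one `Cond` and one `Value8`. Grouping on function names rather than instruction addresses is a deliberate choice: reports from different lines of the same function fold together, at the cost of possibly hiding a distinct bug in that function. Each group therefore still needs a human decision before it goes into a suppression file.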

> On Nov 11, 2015, at 3:38 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> These are the only PETSc params that I used:
> 
> -log_summary
> -options_left false
> -fp_trap
> 
> I last updated about 3 weeks ago and I am on a branch.  I can redo this with a current master.  My repo seems to have been polluted:
> 
> 13:35 edison12 master> ~/petsc$ git status
> # On branch master
> # Your branch is ahead of 'origin/master' by 262 commits.
> #
> nothing to commit (working directory clean)
> 
> I trust this is OK, but let me know if you would like me to clone a fresh repo.
> 
> Mark
> 
> 
> 
> On Wed, Nov 11, 2015 at 11:21 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>   Thanks
> 
>    Do you use a petscrc file, or any file with PETSc options in it, for the run?
> 
>   Thanks; please send me the exact PETSc commit you built from so I can see the line numbers in our source where things go bad.
> 
>    Barry
> 
> > On Nov 11, 2015, at 7:36 AM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> >
> >
> > On Tue, Nov 10, 2015 at 11:15 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Please send me the full output. This is nuts and, once we understand it better, should be reported to NERSC as something to be fixed. When I pay $60 million in taxes to a computing center, I expect something that works fine for free on my laptop to work there too.
> >
> >   Barry
> >
> > > On Nov 10, 2015, at 7:51 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > I ran an 8-processor job on Edison of a small code for a short run (just a linear solve) and got 37 MB of output!
> > >
> > > Here is a 'Petsc' grep.
> > >
> > > Perhaps we should build an ignore file for things that we believe are false positives.
> > >
> > > On Tue, Nov 3, 2015 at 11:55 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >   I am more optimistic about valgrind than Mark. I first try valgrind, and if that is not helpful, then I use the debugger. valgrind has the advantage that it finds the FIRST place where something goes wrong, while in the debugger you only arrive at the crash, which is kind of late.
> > >
> > >   Valgrind should not be noisy; if it is, then the applications/libraries should be cleaned up so that they are valgrind-clean, and then valgrind is useful.
> > >
> > >   Barry
> > >
> > >
> > >
> > > > On Nov 3, 2015, at 7:47 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > >
> > > > BTW, I think that our advice for a segv is to use a debugger.  DDT or TotalView, and gdb if need be, will get you right to the source code and will get 90% of bugs diagnosed.  Valgrind is noisy and cumbersome to use but can diagnose 90% of the other 10%.
> > > >
> > > > On Tue, Nov 3, 2015 at 7:32 AM, Denis Davydov <davydden at gmail.com> wrote:
> > > > Hi Jose,
> > > >
> > > > > On 3 Nov 2015, at 12:20, Jose E. Roman <jroman at dsic.upv.es> wrote:
> > > > >
> > > > > I am answering the SLEPc-related questions:
> > > > > - Having a different number of iterations when changing the number of processes is normal.
> > > > The changes in iterations I mentioned are for different preconditioners, but the same number of MPI processes.
> > > >
> > > >
> > > > > - Yes, if you do not destroy the EPS solver, then the preconditioner would be reused.
> > > > >
> > > > > Regarding the segmentation fault, I have no clue. Not sure if this is related to GAMG or not. Maybe running under valgrind could provide more information.
> > > > Will try that.
> > > >
> > > > Denis.
> > > >
> > >
> > >
> > > <petsc_val.gz>
> >
> >
> > <outval.gz>
> 
> 


