[petsc-dev] TS error with optimized build

Mark Adams mfadams at lbl.gov
Thu Dec 15 20:06:58 CST 2016


Damn, it runs pretty clean, and the code runs, even with an optimized build.
The one valgrind warning (output appended) goes away with -malloc_debug
-malloc_dump. The code below is the source of that warning; it is new
multi-species code that I just added. Am I making a mistake here?

  ierr = DMGetDS(ctx->dm, &prob);CHKERRQ(ierr);
  for (ii=0;ii<ctx->num_species;ii++) {
    char  buf[256];
    if (ii==0) ierr = PetscSNPrintf(buf, 256, "e");
    else ierr = PetscSNPrintf(buf, 256, "i%D", ii);
    CHKERRQ(ierr);
    /* Setup Discretization - FEM */
    ierr = PetscFECreateDefault(ctx->dm, dim, 1, ctx->simplex, NULL, PETSC_DECIDE, &ctx->fe[ii]);CHKERRQ(ierr);
    ierr = LandFEChangeQuadrature(ctx->fe[ii]);CHKERRQ(ierr);
    ierr = PetscObjectSetName((PetscObject) ctx->fe[ii], buf);CHKERRQ(ierr);
    ierr = PetscDSSetDiscretization(prob, ii, (PetscObject) ctx->fe[ii]);CHKERRQ(ierr); /* this is where Nf gets set */
  }
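A minimal, self-contained sketch of just the field-naming step of the loop above ("e" for species 0, then "i1", "i2", ...), written in plain C with snprintf rather than PetscSNPrintf. The helper name species_name is hypothetical and not part of PETSc or the original code. Plain C needs %d here; in PETSc of this era %D is the PetscInt format that PetscSNPrintf understands, which matters when PETSc is configured with --with-64-bit-indices. The sketch also checks snprintf's return value for truncation, which a 256-byte buffer will not hit for names this short but which costs nothing to test.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the field name for species ii into buf,
 * "e" for the electrons (ii == 0) and "i<ii>" for each ion species.
 * Returns 0 on success, -1 on an encoding error or if the name did not fit. */
static int species_name(char *buf, size_t len, int ii)
{
  int n;
  assert(buf != NULL && len > 0);
  if (ii == 0) n = snprintf(buf, len, "e");
  else         n = snprintf(buf, len, "i%d", ii);
  /* snprintf returns the length it wanted to write (excluding the '\0');
   * a value >= len means the output was truncated. */
  if (n < 0 || (size_t)n >= len) return -1;
  return 0;
}
```

For example, species_name(buf, sizeof(buf), 1) leaves "i1" in buf and returns 0.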

Thanks,

/sandbox/adams/petsc/arch-linux2-c-debug/bin/mpiexec -n 1 valgrind \
  --dsymutil=yes --leak-check=no --gen-suppressions=no --num-callers=20 \
  --error-limit=no ./landaufem -dm_refine 2 -snes_rtol 5.e-8 -snes_stol 5.e-8 \
  -ts_type theta -ts_theta_theta 0.5 -ts_theta_endpoint -theta .05 \
  -pc_type lu -petscspace_order 2 -mass_petscspace_order 2 \
  -petscspace_poly_tensor 1 -mass_petscspace_poly_tensor 1 -dt 1.e-3 \
  -tsteps 1 -verbose 2 -num_species 2 -snes_monitor -plot_file_prefix t
==26477== Memcheck, a memory error detector
==26477== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==26477== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==26477== Command: ./landaufem -dm_refine 2 -snes_rtol 5.e-8 -snes_stol 5.e-8 -ts_type theta -ts_theta_theta 0.5 -ts_theta_endpoint -theta .05 -pc_type lu -petscspace_order 2 -mass_petscspace_order 2 -petscspace_poly_tensor 1 -mass_petscspace_poly_tensor 1 -dt 1.e-3 -tsteps 1 -verbose 2 -num_species 2 -snes_monitor -plot_file_prefix t
==26477==
==26477== Thread 9:
==26477== Invalid read of size 16
==26477==    at 0x9376634: dswap_k_SANDYBRIDGE (in /usr/lib/openblas-base/libblas.so.3)
==26477==    by 0x835AE60: ??? (in /usr/lib/openblas-base/libblas.so.3)
==26477==    by 0x835B14C: ??? (in /usr/lib/openblas-base/libblas.so.3)
==26477==    by 0xA100183: start_thread (pthread_create.c:312)
==26477==    by 0x751837C: clone (clone.S:111)
==26477==  Address 0x93bece50 is 640 bytes inside a block of size 648 alloc'd
==26477==    at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26477==    by 0x4ED5959: PetscMallocAlign (mal.c:28)
==26477==    by 0x5CD12B0: PetscFESetUp_Basic (dtfe.c:3847)
==26477==    by 0x5CC69ED: PetscFESetUp (dtfe.c:3094)
==26477==    by 0x5CEBB70: PetscFECreateDefault (dtfe.c:6606)
==26477==    by 0x40DEB1: main (main.c:775)
==26477==
 *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
 *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
 *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
 *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
  0) species 0: momentum= -3.53244532544986e-02, energy= 1.51054814051742e-02
  0) species 1: momentum=  7.06489065089972e-02, energy= 3.02109628103485e-02
          0) Total momentum=  3.53244532544986e-02, energy= 4.53164442155227e-02 (m_i/m_e = 2.)
    0 SNES Function norm 4.438513482798e+01
    1 SNES Function norm 1.644115258674e+00
    2 SNES Function norm 1.246811010563e-01
    3 SNES Function norm 1.109117454301e-02
    4 SNES Function norm 1.006853809120e-03
    5 SNES Function norm 9.334138459020e-05
    6 SNES Function norm 8.543152498171e-06
    7 SNES Function norm 7.831310487414e-07
  1) species 0: momentum= -3.18205425912776e-02, energy= 1.57071588338379e-02
  1) species 1: momentum=  6.71449958176972e-02, energy= 2.96092852424931e-02
          1) Total momentum=  3.53244532264196e-02, energy= 4.53164440763310e-02 (m_i/m_e = 2.)
1 TS time steps, 32 cells, Nq=9 (288 IPs), T=0.001
==26477==
==26477== HEAP SUMMARY:
==26477==     in use at exit: 0 bytes in 0 blocks
==26477==   total heap usage: 8,618 allocs, 8,556 frees, 6,754,155 bytes allocated
==26477==
==26477== For a detailed leak analysis, rerun with: --leak-check=full
==26477==
==26477== For counts of detected and suppressed errors, rerun with: -v
==26477== ERROR SUMMARY: 12 errors from 1 contexts (suppressed: 0 from 0)
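Reading the valgrind report above: the invalid 16-byte read starts 640 bytes into a 648-byte block allocated by PetscFESetUp_Basic (called from PetscFECreateDefault at main.c:775), so it covers bytes 640 through 655 and its last 8 bytes lie past the end of the block. Since dswap swaps double-precision vectors, 16 bytes is two doubles. The tiny C helper below (bytes_past_end is a hypothetical name, not part of valgrind or PETSc) just spells out that arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* For a read of `size` bytes that starts `offset` bytes into a heap block of
 * `block` bytes (the numbers valgrind prints), return how many bytes of the
 * read fall beyond the end of the block; 0 means the read is fully inside. */
static size_t bytes_past_end(size_t offset, size_t size, size_t block)
{
  size_t end = offset + size;   /* one past the last byte read */
  assert(offset <= block);      /* the start address is reported as inside the block */
  return end > block ? end - block : 0;
}
```

With the numbers from this run, bytes_past_end(640, 16, 648) is 8.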


On Thu, Dec 15, 2016 at 8:10 PM, Mark Adams <mfadams at lbl.gov> wrote:

> Great, thanks, all installed and configuring.
>
> Humm, and I forgot to do PETSC_DIR=$PWD ...
>
> On Thu, Dec 15, 2016 at 6:43 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>>    Mark,
>>
>>     My records indicate you have accounts at MCS, at least we are paying
>> for them :-). You should be able to access them with
>>
>> ssh adams@login.mcs.anl.gov
>>
>> then
>>
>> ssh cg
>>
>> then
>>
>> cd /sandbox/
>> mkdir  adams
>> cd adams
>> git clone git@bitbucket.org:petsc/petsc.git
>> cd petsc
>> ./configure --download-mpich
>>
>> If you have never set up your ssh key for login, you need to do so at
>> accounts.mcs.anl.gov (note that you cannot ssh with a password; you need
>> to set up the ssh key).
>>
>> Barry
>>
>> > On Dec 15, 2016, at 5:32 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> >
>> > OK, that was useful. I don't have access to a Linux machine right now.
>> > I must have an uninitialized variable (that the compiler did not catch).
>> > Thanks,
>> >
>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> > [0]PETSC ERROR:
>> > [0]PETSC ERROR: SNESSolve has not converged due to Nan or Inf norm
>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-2475-g4fbd04f  GIT Date: 2016-12-11 10:34:33 -0500
>> > [0]PETSC ERROR: /global/u2/m/madams/landaufem/Plex/./landaufem on a arch-xc30-opt64-intel named nid00012 by madams Thu Dec 15 15:31:15 2016
>> > [0]PETSC ERROR: Configure options COPTFLAGS="-fast -no-ipo -g" CXXOPTFLAGS="-fast -no-ipo -g" FOPTFLAGS="-fast -no-ipo -g" --download-hypre --download-parmetis --download-metis --download-p4est --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.16/INTEL/15.0 --with-ssl=0 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-shared-libraries=0 --with-x=0 --with-mpiexec=srun LIBS=-lstdc++ --with-64-bit-indices PETSC_ARCH=arch-xc30-opt64-intel
>> > [0]PETSC ERROR: #1 SNESSolve_NEWTONLS() line 186 in /global/u2/m/madams/petsc/src/snes/impls/ls/ls.c
>> > TSSolve failed
>> > 0 TS time steps, 528 cells, Nq=7 (3696 IPs), T=0.
>> > [0]PETSC ERROR: #2 SNESSolve() line 4128 in /global/u2/m/madams/petsc/src/snes/interface/snes.c
>> > [0]PETSC ERROR: #3 TS_SNESSolve() line 189 in /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c
>> >
>> >
>> > On Thu, Dec 15, 2016 at 6:24 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> >    It is useful to run with valgrind when you have errors, even on a
>> > completely different machine, because it will detect memory corruption.
>> > That is why I currently do my valgrind runs on a Linux machine.
>> >
>> >    Assuming the prefix is correct, -snes_monitor should print an initial
>> > residual norm before doing any solves, so it is curious that you got no
>> > output. You can run with -snes_error_if_not_converged
>> > -ksp_error_if_not_converged to try to make it stop with an error as soon
>> > as a problem is detected.
>> >
>> >    Barry
>> >
>> >
>> > > On Dec 15, 2016, at 4:56 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> > >
>> > > I have a code that works on my Mac but fails on both a Cray XC30 and a
>> > > KNL unless the code is built in debug mode. I get this error message
>> > > plus the -info output. I am using -snes_monitor but get no output. This
>> > > code was working and then I added a new feature; it does seem to fail
>> > > when this new feature is used.
>> > >
>> > > And, alas, I do not have a functioning valgrind right now.
>> > >
>> > > I will start toggling optimization flags, but any thoughts would be
>> > > welcome.
>> > >
>> > > Mark
>> > >
>> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4290 X 4290; storage space: 0 unneeded,132612 used
>> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 50
>> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 4290) < 0.6. Do not use CompressedRow routines.
>> > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374784
>> > > [0] DMGetDMKSP(): Creating new DMKSP
>> > > [0] TSAdaptCheckStage(): Step=0, nonlinear solve failures 1 greater than current TS allowed, stopping solve
>> > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> > > [0]PETSC ERROR:
>> > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery
>> > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> > > [0]PETSC ERROR: Petsc Development GIT revision: unknown  GIT Date: unknown
>> > > [0]PETSC ERROR: /global/u2/m/madams/landaufem/Plex/./landaufem on a arch-cori-knl-opt64-novector-intel named nid09355 by madams Thu Dec 15 14:25:00 2016
>> > > [0]PETSC ERROR: TSSolve failed
>> > > 1 TS time steps, 512 cells, Nq=9 (4608 IPs), T=0.
>> > > Configure options COPTFLAGS="  -g -O1 -fp-model fast -qopt-report=5 -hcpu=mic-knl -no-simd" CXXOPTFLAGS="-g -O1 -fp-model fast -qopt-report=5 -hcpu=mic-knl -no-simd" FOPTFLAGS="  -g -O1 -fp-model fast -qopt-report=5 -hcpu=mic-knl -no-simd" --download-metis=1 --download-parmetis=1 --with-blas-lapack-dir=/usr/common/software/intel/compilers_and_libraries_2016.3.210/linux/mkl --with-cc=mpiicc --with-cxx=mpiicpc --with-debugging=0 --with-fc=mpiifort --with-mpiexec=srun --with-batch=0 --with-memalign=64 --with-64-bit-indices PETSC_ARCH=arch-cori-knl-opt64-novector-intel --with-openmp=0 --download-p4est=0
>> > > [0]PETSC ERROR: #1 TSStep() line 3972 in /global/u2/m/madams/petsc/src/ts/interface/ts.c
>> > > [0]PETSC ERROR: #2 TSSolve() line 4218 in /global/u2/m/madams/petsc/src/ts/interface/ts.c
>> > >
>> >
>> >
>>
>>
>
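Both optimized-build failures quoted above end in a failed nonlinear solve, one of them explicitly with "Nan or Inf norm", and the suspected cause is an uninitialized variable. One generic way to narrow down where a non-finite value first appears is to scan intermediate arrays after each stage of the residual evaluation, as in this minimal plain-C sketch (first_nonfinite is a hypothetical helper, not a PETSc function). One caveat, stated as an assumption for the Intel flags: with GCC, -ffast-math implies -ffinite-math-only, which lets the compiler assume no NaN or Inf values exist and optimize a check like isfinite away. The failing builds above use -fast and -fp-model fast, so it would be worth confirming that such a check survives those flags, or compiling it separately without them.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical debugging helper: return the index of the first entry of x
 * that is NaN or +/-Inf, or -1 if all n entries are finite. */
static long first_nonfinite(const double *x, size_t n)
{
  assert(x != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (!isfinite(x[i])) return (long)i;  /* isfinite is a C99 <math.h> macro */
  }
  return -1;
}
```

Calling it on each intermediate array in turn, the first stage that returns a non-negative index is where the bad value was introduced.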

