[petsc-dev] TS error with optimized build

Mark Adams mfadams at lbl.gov
Thu Dec 15 20:18:14 CST 2016


 I found the bug. The Linux compiler did give me a warning about a
possibly uninitialized variable that is used for normalization and has a
slightly convoluted initialization. Apparently some compilers
reorganized this loop, so values were normalized with an uninitialized
variable.
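
A minimal sketch of this class of bug, not the actual landaufem code: the
names normalize_sum, w, and norm are hypothetical. The comment shows one way
an accumulator can end up conditionally initialized inside a loop; the
function shows the plain fix of initializing it unconditionally before the
loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical buggy pattern (do not use): `norm` is only assigned on a
 * branch inside the loop, so a compiler may warn that it is possibly
 * uninitialized, and a reordered loop may read it before it is set:
 *
 *   double norm;
 *   for (i = start; i < n; i++) {
 *     if (i == 0) norm = w[i];   // never executed when start > 0
 *     else        norm += w[i];
 *   }
 *   for (i = 0; i < n; i++) w[i] /= norm;
 */

/* Fixed version: normalize w[0..n-1] so the entries sum to 1.
   Returns the sum that was used as the normalization factor. */
double normalize_sum(double *w, size_t n)
{
  double norm = 0.0; /* initialized unconditionally, before any use */
  size_t i;

  assert(n > 0);
  for (i = 0; i < n; i++) norm += w[i];
  for (i = 0; i < n; i++) w[i] /= norm;
  return norm;
}
```

With GCC or Clang, building with -Wall is one way to get a warning for the
commented pattern; whether it fires can depend on the optimization level.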

Anyway, thanks again for getting me set up on cg.
Mark

On Thu, Dec 15, 2016 at 9:06 PM, Mark Adams <mfadams at lbl.gov> wrote:

> Damn, it runs pretty clean, and the code runs, even with an optimized
> build. There is one valgrind warning, which goes away with -malloc_debug
> -malloc_dump (output appended). The code below is the source of the
> valgrind warning. This is new code with multi-species that I just added.
> Am I making a mistake here?
>
>   ierr = DMGetDS(ctx->dm, &prob);CHKERRQ(ierr);
>   for (ii=0;ii<ctx->num_species;ii++) {
>     char  buf[256];
>     if (ii==0) ierr = PetscSNPrintf(buf, 256, "e");
>     else ierr = PetscSNPrintf(buf, 256, "i%D", ii);
>     CHKERRQ(ierr);
>     /* Setup Discretization - FEM */
>     ierr = PetscFECreateDefault(ctx->dm, dim, 1, ctx->simplex, NULL, PETSC_DECIDE, &ctx->fe[ii]);CHKERRQ(ierr);
>     ierr = LandFEChangeQuadrature(ctx->fe[ii]);CHKERRQ(ierr);
>     ierr = PetscObjectSetName((PetscObject) ctx->fe[ii], buf);CHKERRQ(ierr);
>     ierr = PetscDSSetDiscretization(prob, ii, (PetscObject) ctx->fe[ii]);CHKERRQ(ierr); /* this is where Nf gets set */
>   }
>
> Thanks,
>
> /sandbox/adams/petsc/arch-linux2-c-debug/bin/mpiexec -n 1 valgrind
> --dsymutil=yes --leak-check=no --gen-suppressions=no --num-callers=20
> --error-limit=no ./landaufem -dm_refine 2 -snes_rtol 5.e-8 -snes_stol 5.e-8
> -ts_type theta -ts_theta_theta 0.5 -ts_theta_endpoint -theta .05 -pc_type
> lu -petscspace_order 2 -mass_petscspace_order 2 -petscspace_poly_tensor 1
> -mass_petscspace_poly_tensor 1 -dt 1.e-3 -tsteps 1 -verbose 2 -num_species
> 2 -snes_monitor -plot_file_prefix t
> ==26477== Memcheck, a memory error detector
> ==26477== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
> ==26477== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright
> info
> ==26477== Command: ./landaufem -dm_refine 2 -snes_rtol 5.e-8 -snes_stol
> 5.e-8 -ts_type theta -ts_theta_theta 0.5 -ts_theta_endpoint -theta .05
> -pc_type lu -petscspace_order 2 -mass_petscspace_order 2
> -petscspace_poly_tensor 1 -mass_petscspace_poly_tensor 1 -dt 1.e-3 -tsteps
> 1 -verbose 2 -num_species 2 -snes_monitor -plot_file_prefix t
> ==26477==
> ==26477== Thread 9:
> ==26477== Invalid read of size 16
> ==26477==    at 0x9376634: dswap_k_SANDYBRIDGE (in /usr/lib/openblas-base/libblas.so.3)
> ==26477==    by 0x835AE60: ??? (in /usr/lib/openblas-base/libblas.so.3)
> ==26477==    by 0x835B14C: ??? (in /usr/lib/openblas-base/libblas.so.3)
> ==26477==    by 0xA100183: start_thread (pthread_create.c:312)
> ==26477==    by 0x751837C: clone (clone.S:111)
> ==26477==  Address 0x93bece50 is 640 bytes inside a block of size 648 alloc'd
> ==26477==    at 0x4C2D110: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==26477==    by 0x4ED5959: PetscMallocAlign (mal.c:28)
> ==26477==    by 0x5CD12B0: PetscFESetUp_Basic (dtfe.c:3847)
> ==26477==    by 0x5CC69ED: PetscFESetUp (dtfe.c:3094)
> ==26477==    by 0x5CEBB70: PetscFECreateDefault (dtfe.c:6606)
> ==26477==    by 0x40DEB1: main (main.c:775)
> ==26477==
>  *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
>  *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
>  *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
>  *** LandFEChangeQuadrature No hack tensor=1 Nq=9 dim=2 ***
>   0) species 0: momentum= -3.53244532544986e-02, energy=  1.51054814051742e-02
>   0) species 1: momentum=  7.06489065089972e-02, energy=  3.02109628103485e-02
>           0) Total momentum=  3.53244532544986e-02, energy=  4.53164442155227e-02 (m_i/m_e = 2.)
>     0 SNES Function norm 4.438513482798e+01
>     1 SNES Function norm 1.644115258674e+00
>     2 SNES Function norm 1.246811010563e-01
>     3 SNES Function norm 1.109117454301e-02
>     4 SNES Function norm 1.006853809120e-03
>     5 SNES Function norm 9.334138459020e-05
>     6 SNES Function norm 8.543152498171e-06
>     7 SNES Function norm 7.831310487414e-07
>   1) species 0: momentum= -3.18205425912776e-02, energy=  1.57071588338379e-02
>   1) species 1: momentum=  6.71449958176972e-02, energy=  2.96092852424931e-02
>           1) Total momentum=  3.53244532264196e-02, energy=  4.53164440763310e-02 (m_i/m_e = 2.)
> 1 TS time steps, 32 cells, Nq=9 (288 IPs), T=0.001
> ==26477==
> ==26477== HEAP SUMMARY:
> ==26477==     in use at exit: 0 bytes in 0 blocks
> ==26477==   total heap usage: 8,618 allocs, 8,556 frees, 6,754,155 bytes
> allocated
> ==26477==
> ==26477== For a detailed leak analysis, rerun with: --leak-check=full
> ==26477==
> ==26477== For counts of detected and suppressed errors, rerun with: -v
> ==26477== ERROR SUMMARY: 12 errors from 1 contexts (suppressed: 0 from 0)
>
>
> On Thu, Dec 15, 2016 at 8:10 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
>> Great, thanks, all installed and configuring.
>>
>> Humm, and I forgot to do PETSC_DIR=$PWD ...
>>
>> On Thu, Dec 15, 2016 at 6:43 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>>
>>>    Mark,
>>>
>>>     My records indicate you have accounts at MCS, at least we are paying
>>> for them :-). You should be able to access them
>>> with
>>>
>>> ssh adams at login.mcs.anl.gov
>>>
>>> then
>>>
>>> ssh cg
>>>
>>> then
>>>
>>> cd /sandbox/
>>> mkdir  adams
>>> cd adams
>>> git clone git at bitbucket.org:petsc/petsc.git
>>> cd petsc
>>> ./configure --download-mpich
>>>
>>> If you have never set up your ssh key for login, you need to do so at
>>> accounts.mcs.anl.gov (note that you cannot ssh with a password; you
>>> must set the ssh key).
>>>
>>> Barry
>>>
>>> > On Dec 15, 2016, at 5:32 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>> >
>>> > OK, that was useful. I don't have access to a Linux machine right now.
>>> I must have an uninitialized variable (that the compiler did not catch).
>>> > Thanks,
>>> >
>>> > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > [0]PETSC ERROR:
>>> > [0]PETSC ERROR: SNESSolve has not converged due to Nan or Inf norm
>>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-2475-g4fbd04f
>>> GIT Date: 2016-12-11 10:34:33 -0500
>>> > [0]PETSC ERROR: /global/u2/m/madams/landaufem/Plex/./landaufem on a
>>> arch-xc30-opt64-intel named nid00012 by madams Thu Dec 15 15:31:15 2016
>>> > [0]PETSC ERROR: Configure options COPTFLAGS="-fast -no-ipo -g"
>>> CXXOPTFLAGS="-fast -no-ipo -g" FOPTFLAGS="-fast -no-ipo -g"
>>> --download-hypre --download-parmetis --download-metis --download-p4est
>>> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.16/INTEL/15.0 --with-ssl=0
>>> --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC
>>> --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn
>>> --with-fortranlib-autodetect=0 --with-shared-libraries=0 --with-x=0
>>> --with-mpiexec=srun LIBS=-lstdc++ --with-64-bit-indices
>>> PETSC_ARCH=arch-xc30-opt64-intel
>>> > [0]PETSC ERROR: #1 SNESSolve_NEWTONLS() line 186 in
>>> /global/u2/m/madams/petsc/src/snes/impls/ls/ls.c
>>> > TSSolve failed
>>> > 0 TS time steps, 528 cells, Nq=7 (3696 IPs), T=0.
>>> > [0]PETSC ERROR: #2 SNESSolve() line 4128 in
>>> /global/u2/m/madams/petsc/src/snes/interface/snes.c
>>> > [0]PETSC ERROR: #3 TS_SNESSolve() line 189 in
>>> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c
>>> >
>>> >
>>> > On Thu, Dec 15, 2016 at 6:24 PM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>> >
>>> >    It is useful to run with valgrind, even on a completely different
>>> machine, when you have errors, because it will detect any memory corruption.
>>> So currently I do my valgrind runs on a Linux machine.
>>> >
>>> >    Assuming the prefix is correct, -snes_monitor should print an
>>> initial residual norm before doing any solves, so it is curious that you got
>>> no output. You can run with -snes_error_if_not_converged
>>> -ksp_error_if_not_converged to try to get it to report as soon as a problem
>>> is detected.
>>> >
>>> >    Barry
>>> >
>>> >
>>> > > On Dec 15, 2016, at 4:56 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>> > >
>>> > > I have a code that works on my Mac, but it fails on both a Cray XC30
>>> and a KNL unless the code is built with debugging. I get this error message +
>>> -info output. I am using -snes_monitor but get no output. This code was
>>> working before I added a new feature, and it does seem to fail when this new
>>> feature is used.
>>> > >
>>> > > And, alas, I do not have a functioning valgrind right now.
>>> > >
>>> > > I will start toggling optimization flags but any thoughts would be
>>> welcome.
>>> > >
>>> > > Mark
>>> > >
>>> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 4290 X 4290; storage
>>> space: 0 unneeded,132612 used
>>> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 50
>>> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 4290) < 0.6. Do not use CompressedRow routines.
>>> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
>>> 1140850689 -2080374784
>>> > > [0] DMGetDMKSP(): Creating new DMKSP
>>> > > [0] TSAdaptCheckStage(): Step=0, nonlinear solve failures 1 greater
>>> than current TS allowed, stopping solve
>>> > > [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> > > [0]PETSC ERROR:
>>> > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE,
>>> increase -ts_max_snes_failures or make negative to attempt recovery
>>> > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> > > [0]PETSC ERROR: Petsc Development GIT revision: unknown  GIT Date:
>>> unknown
>>> > > [0]PETSC ERROR: /global/u2/m/madams/landaufem/Plex/./landaufem on a
>>> arch-cori-knl-opt64-novector-intel named nid09355 by madams Thu Dec 15
>>> 14:25:00 2016
>>> > > [0]PETSC ERROR: TSSolve failed
>>> > > 1 TS time steps, 512 cells, Nq=9 (4608 IPs), T=0.
>>> > > Configure options COPTFLAGS="  -g -O1 -fp-model fast -qopt-report=5
>>> -hcpu=mic-knl -no-simd" CXXOPTFLAGS="-g -O1 -fp-model fast -qopt-report=5
>>> -hcpu=mic-knl -no-simd" FOPTFLAGS="  -g -O1 -fp-model fast -qopt-report=5
>>> -hcpu=mic-knl -no-simd" --download-metis=1 --download-parmetis=1
>>> --with-blas-lapack-dir=/usr/common/software/intel/compilers_and_libraries_2016.3.210/linux/mkl --with-cc=mpiicc --with-cxx=mpiicpc
>>> --with-debugging=0 --with-fc=mpiifort --with-mpiexec=srun --with-batch=0
>>> --with-memalign=64 --with-64-bit-indices PETSC_ARCH=arch-cori-knl-opt64-novector-intel
>>> --with-openmp=0 --download-p4est=0
>>> > > [0]PETSC ERROR: #1 TSStep() line 3972 in
>>> /global/u2/m/madams/petsc/src/ts/interface/ts.c
>>> > > [0]PETSC ERROR: #2 TSSolve() line 4218 in
>>> /global/u2/m/madams/petsc/src/ts/interface/ts.c
>>> > >
>>> >
>>> >
>>>
>>>
>>
>