[petsc-users] Tracking a NaN in FVM gradient computation

Jed Brown jed at jedbrown.org
Sun Jun 20 22:59:19 CDT 2021


Looks like this is the relevant code.

      for (d = 0; d < dim; ++d) {
        if (cgrad[0]) cgrad[0][pd*dim+d] += fg->grad[0][d] * delta;
        if (cgrad[1]) cgrad[1][pd*dim+d] -= fg->grad[1][d] * delta;
      }

I ran in a debugger and found that the operand was already NaN when this line executed:

Thread 1 "ex11" received signal SIGFPE, Arithmetic exception.
0x00007ffff6c88e31 in DMPlexReconstructGradients_Internal (dm=0x555555b21230, fvm=0x5555558cc4b0, fStart=25, fEnd=41, faceGeometry=0x555555e3b3a0, cellGeometry=0x555555e54560, locX=0x555555edb7f0, grad=0x555555ee6ce0) at src/dm/impls/plex/plexfvm.c:111
111             if (cgrad[0]) cgrad[0][pd*dim+d] += fg->grad[0][d] * delta;
(gdb) p fg->grad[0][d]
$1 = nan(0x000000002)
(gdb) p d
$2 = 0


That indicates memory corruption: with -fp_trap enabled, if the NaN had been computed in an earlier step, we would have trapped there, so this value must have been read from memory rather than produced by arithmetic. Indeed, I see Valgrind errors. I'm adding Toby, who developed much of this and might be able to debug more efficiently. It would be useful to file an issue on GitLab.

$ valgrind mpich-clang/tests/ts/tutorials/ex11 -ufv_use_amr -fp_trap
==599057== Memcheck, a memory error detector
==599057== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==599057== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==599057== Command: mpich-clang/tests/ts/tutorials/ex11 -ufv_use_amr -fp_trap
==599057==
==599057== Conditional jump or move depends on uninitialised value(s)
==599057==    at 0x122A31: adaptToleranceFVM (ex11.c:1608)
==599057==    by 0x11A89B: main (ex11.c:1927)
==599057==
==599057== Conditional jump or move depends on uninitialised value(s)
==599057==    at 0x122A77: adaptToleranceFVM (ex11.c:1609)
==599057==    by 0x11A89B: main (ex11.c:1927)
==599057==
==599057== Conditional jump or move depends on uninitialised value(s)
==599057==    at 0x4F3B293: VecTaggerComputeIS_FromBoxes (tagger.c:465)
==599057==    by 0x4F3AB34: VecTaggerComputeIS (tagger.c:422)
==599057==    by 0x122E91: adaptToleranceFVM (ex11.c:1619)
==599057==    by 0x11A89B: main (ex11.c:1927)
==599057==
==599057== Conditional jump or move depends on uninitialised value(s)
==599057==    at 0x4F3B293: VecTaggerComputeIS_FromBoxes (tagger.c:465)
==599057==    by 0x4F3AB34: VecTaggerComputeIS (tagger.c:422)
==599057==    by 0x122F23: adaptToleranceFVM (ex11.c:1620)
==599057==    by 0x11A89B: main (ex11.c:1927)
==599057==
==599057== Conditional jump or move depends on uninitialised value(s)
==599057==    at 0x4F3B2CB: VecTaggerComputeIS_FromBoxes (tagger.c:472)
==599057==    by 0x4F3AB34: VecTaggerComputeIS (tagger.c:422)
==599057==    by 0x122F23: adaptToleranceFVM (ex11.c:1620)
==599057==    by 0x11A89B: main (ex11.c:1927)
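
To narrow down where the bad value enters, one option is to scan the face-gradient entries for non-finite values just before the accumulation at plexfvm.c:111. The following is only a sketch: first_nonfinite is a hypothetical helper, not a PETSc function, and it assumes a real-valued, double-precision build so that the entries can be read as double.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of PETSc): return the index of the first
   non-finite entry (NaN or +/-Inf) in a[0..n), or -1 if all are finite. */
static ptrdiff_t first_nonfinite(const double *a, size_t n)
{
  assert(a != NULL || n == 0);
  for (size_t i = 0; i < n; ++i) {
    /* isfinite() is false for NaN and for +/-Inf */
    if (!isfinite(a[i])) return (ptrdiff_t)i;
  }
  return -1;
}
```

Called on fg->grad[0] and fg->grad[1] with n = dim for each face, a check like this would report the first face whose stored gradient is already bad, which may help separate a corrupted input from a problem in the loop itself.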

"Ellen M. Price" <ellen.price at cfa.harvard.edu> writes:

> Hi there PETSc,
>
> I am working my way through ex11.c and have encountered a problem. On
> the first pass through mesh adaption, the gradient computation in PETSc
> triggers a NaN, even though none of the input data are NaN.
>
> To reproduce:
>
> PETSc v3.15 with latest MPICH, no external libraries, debugging on
> Compile ex11.c using GCC 9.3 on Ubuntu 20.04
> Run as: ./ex11 -ufv_use_amr -fp_trap
>
> Running under GDB shows that the offending line is:
> src/dm/impls/plex/plexfvm.c:111
> This originates from the call to DMPlexReconstructGradientsFVM.
>
> I only noticed this because I was trying to resolve a discontinuity in
> my own initial condition for a Sod shock tube problem, but then I found
> that it occurs in the example without any modification, too.
>
> Is this somehow intended? If not, what steps can one take to make sure
> the gradients are actual numeric values? It doesn't make sense to me
> that they would be undefined on the first step when nothing else is NaN.
> Or is it that AMR in ex11 doesn't quite work yet?
>
> Thanks,
> Ellen Price

