[petsc-users] strange segv

Mark Adams mfadams at lbl.gov
Sat May 29 19:46:42 CDT 2021


On Sat, May 29, 2021 at 7:48 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>    I don't see why it is not running the Kokkos check. Here is the Kokkos
> rule, right below the CUDA rule that apparently is running.
>
> check_build:
>         -@echo "Running check examples to verify correct installation"
>         -@echo "Using PETSC_DIR=${PETSC_DIR} and PETSC_ARCH=${PETSC_ARCH}"
>         +@cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} clean-legacy
>         +@cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} testex19
>         +@if [ "${HYPRE_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] && [ "${PETSC_SCALAR}" = "real" ]; then \
>           cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_hypre; \
>          fi;
>         +@if [ "${CUDA_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] && [ "${PETSC_SCALAR}" = "real" ]; then \
>           cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_cuda; \
>          fi;
>         +@if [ "${KOKKOS_KERNELS_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] && [ "${PETSC_SCALAR}" = "real" ] && [ "${PETSC_PRECISION}" = "double" ] && [ "${MPI_IS_MPIUNI}" = "0" ]; then \
>           cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex3k_kokkos; \
>          fi;
>
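To see which of those guards is tripping here, I can grep the configured
make variables and run the guarded target by hand; a quick sketch, assuming
the usual petscvariables location under the arch directory:

    # which of the gating variables are actually set?
    grep -E 'KOKKOS_KERNELS_LIB|PETSC_SCALAR|PETSC_PRECISION|MPI_IS_MPIUNI' \
        ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables

    # run the Kokkos check target directly, bypassing the conditionals
    cd ${PETSC_DIR}/src/snes/tutorials
    make PETSC_DIR=${PETSC_DIR} PETSC_ARCH=${PETSC_ARCH} \
         DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex3k_kokkos

If KOKKOS_KERNELS_LIB comes back empty, that alone would explain the rule
being skipped silently.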
>   Regarding the debugging: with one MPI rank (or even more) under GDB, it
> will trap the error and show the exact line of source code where the error
> occurred, and you can poke around at variables to see if they look corrupt
> or wrong (for example, a crazy address in a pointer). I don't know why your
> debugger is not giving more useful information.
>
>
This is what I did (in DDT). It stopped at the function call and the data
looked fine. I stepped into the call but never got inside it: the signal
handler was called and I was dead.
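As a cross-check on DDT, PETSc itself can put each rank under GDB; a
minimal sketch with ex19 (the particular solver options don't matter for
the trap, so I omit them):

    cd ${PETSC_DIR}/src/snes/tutorials
    # spawn one xterm per rank with gdb already attached
    mpiexec -n 1 ./ex19 -start_in_debugger gdb
    # or attach gdb only when PETSc's signal handler catches something
    mpiexec -n 1 ./ex19 -on_error_attach_debugger gdb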
Maybe I did something in my branch. I can't see what, but I'll keep probing.
Thanks,


>   Barry
>
>
> > On May 29, 2021, at 2:16 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > I am running on Summit with Kokkos-CUDA and I am getting a segv that
> looks like some sort of compile/link mismatch. I also have a user with a
> C++ code that is getting strange segvs when calling MatSetValues with CUDA
> (I know MatSetValues is not a cuSPARSE method, but that is the report that
> I have). I have no idea if these are related, but they both involve C -- C++
> calls ...
> >
> > I started with a clean build (attached) and I ran in DDT. DDT stopped at
> the call in plexland.c to the Kokkos Landau operator. I stepped into this
> function and then took this screenshot of the stack, with the Kokkos call
> and the PETSc signal handler.
> >
> > Make check does not seem to be running Kokkos tests:
> >
> > 15:02 adams/landau-mass-opt *= /gpfs/alpine/csc314/scratch/adams/petsc$
> make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
> PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 check
> > Running check examples to verify correct installation
> > Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and
> PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10
> > C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> > C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI
> processes
> > C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> > Completed test examples
> >
> > Also, this morning I ran another branch that had not been rebased on
> main as recently as this branch (adams/landau-mass-opt).
> >
> > Any ideas?
> > <make.log><configure.log><Screen Shot 2021-05-29 at 2.51.00 PM.png>
>
>
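PS: since this smells like a compile/link mismatch, I will also sanity-check
that the test binary resolves against the PETSc and Kokkos libraries I just
built rather than stale ones (the ex3k binary name and build target are my
assumption, read off the runex3k_kokkos rule above):

    cd ${PETSC_DIR}/src/snes/tutorials
    make PETSC_DIR=${PETSC_DIR} PETSC_ARCH=${PETSC_ARCH} ex3k
    ldd ./ex3k | grep -Ei 'petsc|kokkos'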