[petsc-users] HDF5 DM and VecView with MPI :: Crash

Thibault Bridel-Bertomeu thibault.bridelbertomeu at gmail.com
Tue Jul 6 01:51:25 CDT 2021


Hello Barry,

Thank you for your answer. And sorry I forgot those important details ...

Here is the complete error message for a DMView:

[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[1]PETSC ERROR: The EXACT line numbers in the error traceback are not available.
[1]PETSC ERROR: instead the line number of the start of the function is given.
[1]PETSC ERROR: #1 H5Dcreate2() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:690
[1]PETSC ERROR: #2 VecView_MPI_HDF5() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:594
[1]PETSC ERROR: #3 VecView_MPI() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:787
[1]PETSC ERROR: #4 VecView() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/interface/vector.c:576
[1]PETSC ERROR: #5 DMPlexCoordinatesView_HDF5_Internal() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:560
[1]PETSC ERROR: #6 DMPlexView_HDF5_Internal() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:802
[1]PETSC ERROR: #7 DMView_Plex() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plex.c:1366
[1]PETSC ERROR: #8 DMView() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/interface/dm.c:954
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.15.1-558-g07f732cb94  GIT Date: 2021-07-04 15:58:55 +0000
[1]PETSC ERROR: /ccc/work/cont001/ocre/bridelbert/EULERIAN2D/bin/eulerian2D on a  named r1login by bridelbert Mon Jul  5 18:45:41 2021
[1]PETSC ERROR: Configure options --with-clean=1
--prefix=/ccc/work/cont001/ocre/bridelbert/05-PETSC/build_uns3D_inti
--with-make-np=8 --with-windows-graphics=0 --with-debugging=1
--download-mpich-shared=0 --with-x=0 --with-pthread=0 --with-valgrind=0
--PETSC_ARCH=INTI_UNS3D
--with-fc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpifort
--with-cc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicc
--with-cxx=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicxx
--with-openmp=0
--download-sowing=/ccc/work/cont001/ocre/bridelbert/v1.1.26-p2.tar.gz
--download-metis=/ccc/work/cont001/ocre/bridelbert/git.metis.tar.gz
--download-parmetis=/ccc/work/cont001/ocre/bridelbert/git.parmetis.tar.gz
--download-fblaslapack=/ccc/work/cont001/ocre/bridelbert/git.fblaslapack.tar.gz
--with-cmake-dir=/ccc/products/cmake-3.13.3/system/default
--download-hdf5=/ccc/work/cont001/ocre/bridelbert/hdf5-1.12.0.tar.bz2
--download-netcdf=/ccc/work/cont001/ocre/bridelbert/netcdf-4.5.0.tar.gz
--download-pnetcdf=/ccc/work/cont001/ocre/bridelbert/pnetcdf-1.12.1.tar.gz
--download-exodusii=/ccc/work/cont001/ocre/bridelbert/v2021-01-20.tar.gz
--download-zlib=/ccc/work/cont001/ocre/bridelbert/zlib-1.2.11.tar.gz

[1]PETSC ERROR: #1 User provided function() at unknown file:0
[1]PETSC ERROR: Checking the memory for corruption.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

Here is the complete error message for a VecView:

[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[1]PETSC ERROR: The EXACT line numbers in the error traceback are not available.
[1]PETSC ERROR: instead the line number of the start of the function is given.
[1]PETSC ERROR: #1 H5Dcreate2() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:690
[1]PETSC ERROR: #2 VecView_MPI_HDF5() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:594
[1]PETSC ERROR: #3 VecView_MPI() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:787
[1]PETSC ERROR: #4 VecView_Plex_Local_HDF5_Internal() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:132
[1]PETSC ERROR: #5 VecView_Plex_HDF5_Internal() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:247
[1]PETSC ERROR: #6 VecView_Plex() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plex.c:391
[1]PETSC ERROR: #7 VecView() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/interface/vector.c:576
[1]PETSC ERROR: #8 ourmonitor() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/ts/interface/ftn-custom/ztsf.c:129
[1]PETSC ERROR: #9 TSMonitor() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/ts/interface/tsmon.c:31
[1]PETSC ERROR: #10 TSSolve() at /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/ts/interface/ts.c:3858
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.15.1-558-g07f732cb94  GIT Date: 2021-07-04 15:58:55 +0000
[1]PETSC ERROR: /ccc/work/cont001/ocre/bridelbert/EULERIAN2D/bin/eulerian2D on a  named r1login by bridelbert Tue Jul  6 08:46:43 2021
[1]PETSC ERROR: Configure options --with-clean=1
--prefix=/ccc/work/cont001/ocre/bridelbert/05-PETSC/build_uns3D_inti
--with-make-np=8 --with-windows-graphics=0 --with-debugging=1
--download-mpich-shared=0 --with-x=0 --with-pthread=0 --with-valgrind=0
--PETSC_ARCH=INTI_UNS3D
--with-fc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpifort
--with-cc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicc
--with-cxx=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicxx
--with-openmp=0
--download-sowing=/ccc/work/cont001/ocre/bridelbert/v1.1.26-p2.tar.gz
--download-metis=/ccc/work/cont001/ocre/bridelbert/git.metis.tar.gz
--download-parmetis=/ccc/work/cont001/ocre/bridelbert/git.parmetis.tar.gz
--download-fblaslapack=/ccc/work/cont001/ocre/bridelbert/git.fblaslapack.tar.gz
--with-cmake-dir=/ccc/products/cmake-3.13.3/system/default
--download-hdf5=/ccc/work/cont001/ocre/bridelbert/hdf5-1.12.0.tar.bz2
--download-netcdf=/ccc/work/cont001/ocre/bridelbert/netcdf-4.5.0.tar.gz
--download-pnetcdf=/ccc/work/cont001/ocre/bridelbert/pnetcdf-1.12.1.tar.gz
--download-exodusii=/ccc/work/cont001/ocre/bridelbert/v2021-01-20.tar.gz
--download-zlib=/ccc/work/cont001/ocre/bridelbert/zlib-1.2.11.tar.gz

[1]PETSC ERROR: #1 User provided function() at unknown file:0
[1]PETSC ERROR: Checking the memory for corruption.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

I am currently on the "main" branch, with HEAD at commit
07f732cb949ae259de817d126d140b8fa08e2d25.
I actually have the same issue with the "master" branch; that's why I switched
to "main", hoping something might have been fixed in the meantime.

I cannot provide you with an MWE yet, unfortunately, because the code is part
of a bigger solver and I have to extract the workflow from it. I'll work on it
so you have everything you need.
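
In the meantime, here is roughly the shape of the reduced test I intend to
extract. This is only a sketch of the intended workflow, not code taken from
the solver: the program name is made up, the mesh is the attached sod.msh, and
the DMPlexCreateFromFile call is written against the 3.15 release interface,
so depending on the exact PETSc version its argument list may need adjusting:

      program debug_dmview_hdf5
#include <petsc/finclude/petsc.h>
      use petsc
      implicit none

      DM             dm
      PetscViewer    hdf5Viewer
      PetscErrorCode ierr

      call PetscInitialize(PETSC_NULL_CHARACTER, ierr); CHKERRA(ierr)

      ! Load the attached Gmsh mesh and interpolate it
      ! (argument list follows the 3.15 release interface)
      call DMPlexCreateFromFile(PETSC_COMM_WORLD, "sod.msh", PETSC_TRUE, dm, ierr); CHKERRA(ierr)

      ! Same viewer sequence as in the solver
      call PetscViewerHDF5Open(PETSC_COMM_WORLD, "debug_initmesh.h5", FILE_MODE_WRITE, hdf5Viewer, ierr); CHKERRA(ierr)
      call PetscViewerPushFormat(hdf5Viewer, PETSC_VIEWER_HDF5_XDMF, ierr); CHKERRA(ierr)
      call DMView(dm, hdf5Viewer, ierr); CHKERRA(ierr)
      call PetscViewerPopFormat(hdf5Viewer, ierr); CHKERRA(ierr)
      call PetscViewerDestroy(hdf5Viewer, ierr); CHKERRA(ierr)

      call DMDestroy(dm, ierr); CHKERRA(ierr)
      call PetscFinalize(ierr)
      end program debug_dmview_hdf5

I would run it in parallel (e.g. mpirun -n 2 ./debug_dmview_hdf5), since the
error is reported on rank 1, and I will also try your suggestion of writing a
plain Vec through the same HDF5 viewer to check whether the parallel HDF5
layer works on its own.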

Thanks !!

Thibault


On Tue, Jul 6, 2021 at 01:08, Barry Smith <bsmith at petsc.dev> wrote:

>
>    Please send the error message that is printed to the screen.
>
>     Also please send the exact PETSc version you are using.   If possible
> also a code that reproduces the problem.
>
>    Can you view other simpler things with HDF5? Like say just a vector?
>
>    Barry
>
>
>
> On Jul 5, 2021, at 11:50 AM, Thibault Bridel-Bertomeu <
> thibault.bridelbertomeu at gmail.com> wrote:
>
> Dear all,
>
> I keep having this error on one of the supercomputers I have access to:
>
> [1]PETSC ERROR: The EXACT line numbers in the error traceback are not
> available.
> [1]PETSC ERROR: instead the line number of the start of the function is
> given.
> [1]PETSC ERROR: #1 H5Dcreate2() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:690
> [1]PETSC ERROR: #2 VecView_MPI_HDF5() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:594
> [1]PETSC ERROR: #3 VecView_MPI() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:787
> [1]PETSC ERROR: #4 VecView() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/interface/vector.c:576
> [1]PETSC ERROR: #5 DMPlexCoordinatesView_HDF5_Internal() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:560
> [1]PETSC ERROR: #6 DMPlexView_HDF5_Internal() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:802
> [1]PETSC ERROR: #7 DMView_Plex() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plex.c:1366
> [1]PETSC ERROR: #8 DMView() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/interface/dm.c:954
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
>
> The configure options are as follows:
>
> [1]PETSC ERROR: Configure options --with-clean=1
> --prefix=/ccc/work/cont001/ocre/bridelbert/05-PETSC/build_uns3D_inti
> --with-make-np=8 --with-windows-graphics=0 --with-debugging=1
> --download-mpich-shared=0 --with-x=0 --with-pthread=0 --with-valgrind=0
> --PETSC_ARCH=INTI_UNS3D
> --with-fc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpifort
> --with-cc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicc
> --with-cxx=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicxx
> --with-openmp=0
> --download-sowing=/ccc/work/cont001/ocre/bridelbert/v1.1.26-p2.tar.gz
> --download-metis=/ccc/work/cont001/ocre/bridelbert/git.metis.tar.gz
> --download-parmetis=/ccc/work/cont001/ocre/bridelbert/git.parmetis.tar.gz
> --download-fblaslapack=/ccc/work/cont001/ocre/bridelbert/git.fblaslapack.tar.gz
> --with-cmake-dir=/ccc/products/cmake-3.13.3/system/default
> --download-hdf5=/ccc/work/cont001/ocre/bridelbert/hdf5-1.12.0.tar.bz2
> --download-netcdf=/ccc/work/cont001/ocre/bridelbert/netcdf-4.5.0.tar.gz
> --download-pnetcdf=/ccc/work/cont001/ocre/bridelbert/pnetcdf-1.12.1.tar.gz
> --download-exodusii=/ccc/work/cont001/ocre/bridelbert/v2021-01-20.tar.gz
> --download-zlib=/ccc/work/cont001/ocre/bridelbert/zlib-1.2.11.tar.gz
>
> The piece of code that is responsible is this one:
>
>                 call PetscViewerHDF5Open(PETSC_COMM_WORLD, "debug_initmesh.h5", FILE_MODE_WRITE, hdf5Viewer, ierr); CHKERRA(ierr)
>                 call PetscViewerPushFormat(hdf5Viewer, PETSC_VIEWER_HDF5_XDMF, ierr); CHKERRA(ierr)
>                 call DMView(dm, hdf5Viewer, ierr); CHKERRA(ierr)
>                 call PetscViewerPopFormat(hdf5Viewer, ierr); CHKERRA(ierr)
>                 call PetscViewerDestroy(hdf5Viewer, ierr); CHKERRA(ierr)
>
> I tried with gcc, the Intel compiler, OpenMPI 2.x.x and OpenMPI 4.x.x ... the
> same problem every time ... can anyone please advise? It's starting to drive
> me quite crazy ... x(
>
> Thank you !!!
>
> Thibault
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sod.msh
Type: application/octet-stream
Size: 2007782 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210706/b4f62079/attachment-0001.obj>

