[petsc-users] HDF5 DM and VecView with MPI :: Crash
Thibault Bridel-Bertomeu
thibault.bridelbertomeu at gmail.com
Tue Jul 6 01:51:25 CDT 2021
Hello Barry,
Thank you for your answer, and sorry I forgot those important details.
Here is the complete error message for a DMView:
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see
https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: --------------------- Stack Frames
------------------------------------
[1]PETSC ERROR: The EXACT line numbers in the error traceback are not
available.
[1]PETSC ERROR: instead the line number of the start of the function is
given.
[1]PETSC ERROR: #1 H5Dcreate2() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:690
[1]PETSC ERROR: #2 VecView_MPI_HDF5() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:594
[1]PETSC ERROR: #3 VecView_MPI() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:787
[1]PETSC ERROR: #4 VecView() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/interface/vector.c:576
[1]PETSC ERROR: #5 DMPlexCoordinatesView_HDF5_Internal() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:560
[1]PETSC ERROR: #6 DMPlexView_HDF5_Internal() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:802
[1]PETSC ERROR: #7 DMView_Plex() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plex.c:1366
[1]PETSC ERROR: #8 DMView() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/interface/dm.c:954
[1]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.15.1-558-g07f732cb94 GIT
Date: 2021-07-04 15:58:55 +0000
[1]PETSC ERROR: /ccc/work/cont001/ocre/bridelbert/EULERIAN2D/bin/eulerian2D
on a named r1login by bridelbert Mon Jul 5 18:45:41 2021
[1]PETSC ERROR: Configure options --with-clean=1
--prefix=/ccc/work/cont001/ocre/bridelbert/05-PETSC/build_uns3D_inti
--with-make-np=8 --with-windows-graphics=0 --with-debugging=1
--download-mpich-shared=0 --with-x=0 --with-pthread=0 --with-valgrind=0
--PETSC_ARCH=INTI_UNS3D
--with-fc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpifort
--with-cc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicc
--with-cxx=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicxx
--with-openmp=0
--download-sowing=/ccc/work/cont001/ocre/bridelbert/v1.1.26-p2.tar.gz
--download-metis=/ccc/work/cont001/ocre/bridelbert/git.metis.tar.gz
--download-parmetis=/ccc/work/cont001/ocre/bridelbert/git.parmetis.tar.gz
--download-fblaslapack=/ccc/work/cont001/ocre/bridelbert/git.fblaslapack.tar.gz
--with-cmake-dir=/ccc/products/cmake-3.13.3/system/default
--download-hdf5=/ccc/work/cont001/ocre/bridelbert/hdf5-1.12.0.tar.bz2
--download-netcdf=/ccc/work/cont001/ocre/bridelbert/netcdf-4.5.0.tar.gz
--download-pnetcdf=/ccc/work/cont001/ocre/bridelbert/pnetcdf-1.12.1.tar.gz
--download-exodusii=/ccc/work/cont001/ocre/bridelbert/v2021-01-20.tar.gz
--download-zlib=/ccc/work/cont001/ocre/bridelbert/zlib-1.2.11.tar.gz
[1]PETSC ERROR: #1 User provided function() at unknown file:0
[1]PETSC ERROR: Checking the memory for corruption.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Here is the complete error message for a VecView:
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see
https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: --------------------- Stack Frames
------------------------------------
[1]PETSC ERROR: The EXACT line numbers in the error traceback are not
available.
[1]PETSC ERROR: instead the line number of the start of the function is
given.
[1]PETSC ERROR: #1 H5Dcreate2() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:690
[1]PETSC ERROR: #2 VecView_MPI_HDF5() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:594
[1]PETSC ERROR: #3 VecView_MPI() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:787
[1]PETSC ERROR: #4 VecView_Plex_Local_HDF5_Internal() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:132
[1]PETSC ERROR: #5 VecView_Plex_HDF5_Internal() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:247
[1]PETSC ERROR: #6 VecView_Plex() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plex.c:391
[1]PETSC ERROR: #7 VecView() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/interface/vector.c:576
[1]PETSC ERROR: #8 ourmonitor() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/ts/interface/ftn-custom/ztsf.c:129
[1]PETSC ERROR: #9 TSMonitor() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/ts/interface/tsmon.c:31
[1]PETSC ERROR: #10 TSSolve() at
/ccc/work/cont001/ocre/bridelbert/05-PETSC/src/ts/interface/ts.c:3858
[1]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.15.1-558-g07f732cb94 GIT
Date: 2021-07-04 15:58:55 +0000
[1]PETSC ERROR: /ccc/work/cont001/ocre/bridelbert/EULERIAN2D/bin/eulerian2D
on a named r1login by bridelbert Tue Jul 6 08:46:43 2021
[1]PETSC ERROR: Configure options --with-clean=1
--prefix=/ccc/work/cont001/ocre/bridelbert/05-PETSC/build_uns3D_inti
--with-make-np=8 --with-windows-graphics=0 --with-debugging=1
--download-mpich-shared=0 --with-x=0 --with-pthread=0 --with-valgrind=0
--PETSC_ARCH=INTI_UNS3D
--with-fc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpifort
--with-cc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicc
--with-cxx=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicxx
--with-openmp=0
--download-sowing=/ccc/work/cont001/ocre/bridelbert/v1.1.26-p2.tar.gz
--download-metis=/ccc/work/cont001/ocre/bridelbert/git.metis.tar.gz
--download-parmetis=/ccc/work/cont001/ocre/bridelbert/git.parmetis.tar.gz
--download-fblaslapack=/ccc/work/cont001/ocre/bridelbert/git.fblaslapack.tar.gz
--with-cmake-dir=/ccc/products/cmake-3.13.3/system/default
--download-hdf5=/ccc/work/cont001/ocre/bridelbert/hdf5-1.12.0.tar.bz2
--download-netcdf=/ccc/work/cont001/ocre/bridelbert/netcdf-4.5.0.tar.gz
--download-pnetcdf=/ccc/work/cont001/ocre/bridelbert/pnetcdf-1.12.1.tar.gz
--download-exodusii=/ccc/work/cont001/ocre/bridelbert/v2021-01-20.tar.gz
--download-zlib=/ccc/work/cont001/ocre/bridelbert/zlib-1.2.11.tar.gz
[1]PETSC ERROR: #1 User provided function() at unknown file:0
[1]PETSC ERROR: Checking the memory for corruption.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
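For context, the VecView in this second trace is issued from a custom TS
monitor registered with TSMonitorSet. It boils down to something like the
sketch below (a rough sketch only: the routine name, file name and context
argument are illustrative, not the exact code from the solver):

      subroutine MonitorSolutionHDF5(ts, step, time, u, dummy, ierr)
#include <petsc/finclude/petsc.h>
      use petsc
      implicit none
      TS             :: ts
      PetscInt       :: step
      PetscReal      :: time
      Vec            :: u          ! solution Vec coming from the DMPlex
      PetscInt       :: dummy      ! placeholder user context
      PetscErrorCode :: ierr
      PetscViewer    :: hdf5Viewer

      ! u comes from the DMPlex (e.g. DMCreateGlobalVector), so this VecView
      ! dispatches to VecView_Plex as in the traceback above
      call PetscViewerHDF5Open(PETSC_COMM_WORLD, "debug_sol.h5", FILE_MODE_WRITE, hdf5Viewer, ierr); CHKERRQ(ierr)
      call PetscViewerPushFormat(hdf5Viewer, PETSC_VIEWER_HDF5_XDMF, ierr); CHKERRQ(ierr)
      call VecView(u, hdf5Viewer, ierr); CHKERRQ(ierr)
      call PetscViewerPopFormat(hdf5Viewer, ierr); CHKERRQ(ierr)
      call PetscViewerDestroy(hdf5Viewer, ierr); CHKERRQ(ierr)
      end subroutine MonitorSolutionHDF5
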
I am currently on the "main" branch, with HEAD at commit
07f732cb949ae259de817d126d140b8fa08e2d25.
I actually have the same issue with the "master" branch; that is why I went
with "main", hoping something might have been fixed in the meantime.
Unfortunately, I cannot provide you with a MWE yet, because the code is part
of a bigger solver and I have to extract the workflow from it. I'll work on
it so you have everything you need.
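In the meantime, here is roughly what the standalone test will look like (a
minimal sketch only: it assumes the mesh is read through DMSetFromOptions,
and the program name, file names and option names are illustrative rather
than the exact workflow of the solver):

      program mwe_dmview_hdf5
#include <petsc/finclude/petsc.h>
      use petsc
      implicit none
      DM             :: dm
      PetscViewer    :: hdf5Viewer
      PetscErrorCode :: ierr

      call PetscInitialize(PETSC_NULL_CHARACTER, ierr); CHKERRA(ierr)

      ! Build the DMPlex from run-time options (e.g. -dm_plex_filename <mesh>)
      call DMCreate(PETSC_COMM_WORLD, dm, ierr); CHKERRA(ierr)
      call DMSetType(dm, DMPLEX, ierr); CHKERRA(ierr)
      call DMSetFromOptions(dm, ierr); CHKERRA(ierr)

      ! Same viewer sequence as in the solver: HDF5 file with XDMF format
      call PetscViewerHDF5Open(PETSC_COMM_WORLD, "debug_initmesh.h5", FILE_MODE_WRITE, hdf5Viewer, ierr); CHKERRA(ierr)
      call PetscViewerPushFormat(hdf5Viewer, PETSC_VIEWER_HDF5_XDMF, ierr); CHKERRA(ierr)
      call DMView(dm, hdf5Viewer, ierr); CHKERRA(ierr)
      call PetscViewerPopFormat(hdf5Viewer, ierr); CHKERRA(ierr)
      call PetscViewerDestroy(hdf5Viewer, ierr); CHKERRA(ierr)

      call DMDestroy(dm, ierr); CHKERRA(ierr)
      call PetscFinalize(ierr)
      end program mwe_dmview_hdf5

Built against the same PETSc installation, I would run it with something like
"mpirun -np 2 ./mwe_dmview_hdf5 -dm_plex_filename my_mesh.msh" to match the
parallel runs that crash.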
Thanks !!
Thibault
On Tue, Jul 6, 2021 at 01:08, Barry Smith <bsmith at petsc.dev> wrote:
>
> Please send the error message that is printed to the screen.
>
> Also please send the exact PETSc version you are using. If possible,
> also send a code that reproduces the problem.
>
> Can you view other simpler things with HDF5? Like say just a vector?
>
> Barry
>
>
>
> On Jul 5, 2021, at 11:50 AM, Thibault Bridel-Bertomeu <
> thibault.bridelbertomeu at gmail.com> wrote:
>
> Dear all,
>
> I keep having this error on one of the supercomputers I have access to:
>
> [1]PETSC ERROR: The EXACT line numbers in the error traceback are not
> available.
> [1]PETSC ERROR: instead the line number of the start of the function is
> given.
> [1]PETSC ERROR: #1 H5Dcreate2() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:690
> [1]PETSC ERROR: #2 VecView_MPI_HDF5() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:594
> [1]PETSC ERROR: #3 VecView_MPI() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/impls/mpi/pdvec.c:787
> [1]PETSC ERROR: #4 VecView() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/vec/vec/interface/vector.c:576
> [1]PETSC ERROR: #5 DMPlexCoordinatesView_HDF5_Internal() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:560
> [1]PETSC ERROR: #6 DMPlexView_HDF5_Internal() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plexhdf5.c:802
> [1]PETSC ERROR: #7 DMView_Plex() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/impls/plex/plex.c:1366
> [1]PETSC ERROR: #8 DMView() at
> /ccc/work/cont001/ocre/bridelbert/05-PETSC/src/dm/interface/dm.c:954
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
>
> The configure options are as follows:
>
> [1]PETSC ERROR: Configure options --with-clean=1
> --prefix=/ccc/work/cont001/ocre/bridelbert/05-PETSC/build_uns3D_inti
> --with-make-np=8 --with-windows-graphics=0 --with-debugging=1
> --download-mpich-shared=0 --with-x=0 --with-pthread=0 --with-valgrind=0
> --PETSC_ARCH=INTI_UNS3D
> --with-fc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpifort
> --with-cc=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicc
> --with-cxx=/ccc/products/openmpi-4.0.3/gcc--8.3.0/default/bin/mpicxx
> --with-openmp=0
> --download-sowing=/ccc/work/cont001/ocre/bridelbert/v1.1.26-p2.tar.gz
> --download-metis=/ccc/work/cont001/ocre/bridelbert/git.metis.tar.gz
> --download-parmetis=/ccc/work/cont001/ocre/bridelbert/git.parmetis.tar.gz
> --download-fblaslapack=/ccc/work/cont001/ocre/bridelbert/git.fblaslapack.tar.gz
> --with-cmake-dir=/ccc/products/cmake-3.13.3/system/default
> --download-hdf5=/ccc/work/cont001/ocre/bridelbert/hdf5-1.12.0.tar.bz2
> --download-netcdf=/ccc/work/cont001/ocre/bridelbert/netcdf-4.5.0.tar.gz
> --download-pnetcdf=/ccc/work/cont001/ocre/bridelbert/pnetcdf-1.12.1.tar.gz
> --download-exodusii=/ccc/work/cont001/ocre/bridelbert/v2021-01-20.tar.gz
> --download-zlib=/ccc/work/cont001/ocre/bridelbert/zlib-1.2.11.tar.gz
>
> The piece of code that is responsible is this one:
>
> call PetscViewerHDF5Open(PETSC_COMM_WORLD, "debug_initmesh.h5", FILE_MODE_WRITE, hdf5Viewer, ierr); CHKERRA(ierr)
> call PetscViewerPushFormat(hdf5Viewer, PETSC_VIEWER_HDF5_XDMF, ierr); CHKERRA(ierr)
> call DMView(dm, hdf5Viewer, ierr); CHKERRA(ierr)
> call PetscViewerPopFormat(hdf5Viewer, ierr); CHKERRA(ierr)
> call PetscViewerDestroy(hdf5Viewer, ierr); CHKERRA(ierr)
>
> I tried with gcc and the Intel compiler, with OpenMPI 2.x.x and OpenMPI
> 4.x.x ... same problem every time ... can anyone please advise? It's
> starting to drive me quite crazy ... x(
>
> Thank you !!!
>
> Thibault
>
>
>