[petsc-users] Parallel writing in HDF5-1.12.0 when some processors have no data to write

Danyang Su danyang.su at gmail.com
Sun Jun 14 01:31:30 CDT 2020


Hi Jed,

I cannot reproduce the same problem in C (attached example). The C code runs fine with HDF5 1.12.0, so I am still not sure why the Fortran version fails.
Anyway, I will use HDF5-1.10.6 with PETSc-3.13 for the moment.
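
For reference, the C version boils down to the pattern below: every rank joins the collective H5Dwrite, and the rank with nothing to write passes an empty selection via H5Sselect_none. This is only a minimal sketch along the lines of the HDF Group coll_test.c example, with error checking omitted:

#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    int mpi_rank, mpi_size;
    hsize_t dimsf[2] = {8, 5};      /* dataset dimensions (nx, ny)    */
    hsize_t count[2], offset[2];    /* hyperslab selection parameters */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

    /* File access property list with parallel I/O access. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file_id = H5Fcreate("SDS_row.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);

    /* Create the dataset with default property lists, then close the filespace. */
    hid_t filespace = H5Screate_simple(2, dimsf, NULL);
    hid_t dset_id = H5Dcreate2(file_id, "IntArray", H5T_NATIVE_INT, filespace,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Sclose(filespace);

    /* Rank 3 contributes no rows; the others each write a block of rows. */
    count[0]  = (mpi_rank == 3) ? 0 : dimsf[0] / (hsize_t)mpi_size;
    count[1]  = dimsf[1];
    offset[0] = (hsize_t)mpi_rank * count[0];
    offset[1] = 0;

    hid_t memspace = H5Screate_simple(2, count, NULL);
    filespace = H5Dget_space(dset_id);
    if (mpi_rank == 3) {
        /* Empty selection: this rank joins the collective call with no data. */
        H5Sselect_none(memspace);
        H5Sselect_none(filespace);
    } else {
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
    }

    /* Local buffer; never dereferenced on the rank with the empty selection. */
    size_t nlocal = (size_t)(count[0] * count[1]);
    int *data = malloc((nlocal ? nlocal : 1) * sizeof(int));
    for (size_t i = 0; i < nlocal; i++)
        data[i] = mpi_rank + 10;

    /* Collective dataset write: every rank must make the call. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, dxpl, data);

    free(data);
    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset_id);
    H5Fclose(file_id);
    MPI_Finalize();
    return 0;
}

It can be built with the parallel compiler wrapper (e.g. 'h5pcc -o coll_test coll_test.c') and run with 'mpiexec -n 4 ./coll_test', the same way as the Fortran test.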

Thanks,

Danyang

On 2020-06-13, 1:39 PM, "Jed Brown" <jed at jedbrown.org> wrote:

    Can you reproduce in C?  You're missing three extra arguments that exist in the Fortran interface.

    https://support.hdfgroup.org/HDF5/doc/RM/RM_H5D.html#Dataset-Create
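
For comparison, the C dataset-create call spells out those three property lists explicitly; a minimal sketch (the helper name is just illustrative):

#include <hdf5.h>

/* H5Dcreate2 requires the link-creation, dataset-creation, and dataset-access
   property lists that h5dcreate_f exposes only as optional trailing arguments. */
static hid_t create_int_dataset(hid_t file_id, const char *name, hid_t filespace)
{
    return H5Dcreate2(file_id, name, H5T_NATIVE_INT, filespace,
                      H5P_DEFAULT,   /* lcpl_id: link creation    */
                      H5P_DEFAULT,   /* dcpl_id: dataset creation */
                      H5P_DEFAULT);  /* dapl_id: dataset access   */
}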

    Danyang Su <danyang.su at gmail.com> writes:

    > Hi Jed,
    >
    > Attached is the example for your test.  
    >
    > This example uses H5Sselect_none to tell the H5Dwrite call that there will be no data. The 4th process still HAS to participate because we are in collective mode.
    > The code is ported and modified from the C example at https://support.hdfgroup.org/ftp/HDF5/examples/parallel/coll_test.c
    >
    > The compile flags in the makefile are the same as those used in my own code.
    >
    > To compile the code, please run 'make all'
    > To test the code, please run 'mpiexec -n 4 ./hdf5_zero_data'. Any process count larger than 4 should also reproduce the problem.
    >
    > The code may crash on HDF5 1.12.0 but works fine on HDF5 1.10.x.
    >
    > The following platforms have been tested:
    >   Macos-Mojave + GNU-8.2 + HDF5-1.12.0 -> Works fine
    >   Ubuntu-16.04 + GNU-5.4 + HDF5-1.12.0 -> Crashes
    >   Ubuntu-16.04 + GNU-7.5 + HDF5-1.12.0 -> Crashes
    >   Ubuntu-16.04 + GNU-5.4 + HDF5-1.10.x -> Works fine
    >   Centos-7 + Intel2018 + HDF5-1.12.0 -> Works fine
    >
    > Possible error when code crashes
    >      At line 6686 of file H5_gen.F90
    >      Fortran runtime error: Index '1' of dimension 1 of array 'buf' above upper bound of 0
    >
    > Thanks,
    >
    > Danyang
    >
    > On 2020-06-12, 6:05 AM, "Jed Brown" <jed at jedbrown.org> wrote:
    >
    >     Danyang Su <danyang.su at gmail.com> writes:
    >
    >     > Hi Jed,
    >     >
    >     > Thanks for your double check. 
    >     >
    >     > The HDF5 1.10.6 version also works, but the 1.12.x versions stop working.
    >
    >     I'd suggest making a reduced test case in order to submit a bug report.
    >
    >     This was the relevant change in PETSc for hdf5-1.12.
    >
    >     https://gitlab.com/petsc/petsc/commit/806daeb7de397195b5132278177f4d5553f9f612
    >
    >     > Attached is the code section where I have problem.
    >     >
    >     >     !c write the dataset collectively
    >     >     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    >     >     !!!! CODE CRASHES HERE IF SOME PROCESSORS HAVE NO DATA TO WRITE!!!!
    >     >     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    >     >     call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, dataset, hdf5_dsize,   &
    >     >                     hdf5_ierr, file_space_id=filespace,                &
    >     >                     mem_space_id=memspace, xfer_prp = xlist_id)
    >     >
    >     > Please let me know if there is something wrong in the code that causes the problem.
    >
    > !c
    > !c This example uses H5Sselect_none to tell the H5Dwrite call that
    > !c there will be no data. The 4th process still has to participate
    > !c since we are in collective mode.
    > !c 
    > !c The code is ported and modified based on the C example 
    > !c from https://support.hdfgroup.org/ftp/HDF5/examples/parallel/coll_test.c
    > !c by Danyang Su on June 12, 2020.
    > !c
    > !c To compile the code, please run 'make all'
    > !c To test the code, please run 'mpiexec -n 4 ./hdf5_zero_data'
    > !c
    > !c IMPORTANT NOTE
    > !c The code may crash on HDF5 1.12.0 but works fine on HDF5 1.10.x.
    > !c 
    > !c The following platforms have been tested:
    > !c Macos-Mojave + GNU-8.2 + HDF5-1.12.0 -> Works fine
    > !c Ubuntu-16.04 + GNU-5.4 + HDF5-1.12.0 -> Crashes
    > !c Ubuntu-16.04 + GNU-7.5 + HDF5-1.12.0 -> Crashes
    > !c Ubuntu-16.04 + GNU-5.4 + HDF5-1.10.x -> Works fine
    > !c Centos-7 + Intel2018 + HDF5-1.12.0 -> Works fine
    > !c
    > !c Possible error when code crashes
    > !c     At line 6686 of file H5_gen.F90
    > !c     Fortran runtime error: Index '1' of dimension 1 of array 'buf' above upper bound of 0
    > !c 
    >
    > program hdf5_zero_data
    >
    >
    > #include <petsc/finclude/petscsys.h>
    >
    >       use petscsys
    >       use hdf5
    >
    >       implicit none     
    >
    >       character(len=10), parameter :: h5File_Name = "SDS_row.h5"
    >       character(len=8), parameter :: DatasetName = "IntArray"
    >       integer, parameter :: nx = 8, ny = 5, ndim = 2
    >
    >       integer :: i
    >       integer :: hdf5_ierr                   ! HDF5 error code
    >       PetscErrorCode :: ierr
    >
    >       integer(HID_T) :: file_id              ! File identifier
    >       integer(HID_T) :: dset_id              ! Dataset identifier
    >       integer(HID_T) :: space_id             ! Space identifier
    >       integer(HID_T) :: plist_id             ! Property list identifier
    >       integer(HID_T) :: FileSpace            ! File identifier
    >       integer(HID_T) :: MemSpace             ! Dataset identifier
    >
    >       integer(HSIZE_T) :: dimsf(2)           ! Dataset dimensions
    >       integer(HSIZE_T) :: hcount(2)          ! hyperslab selection parameters
    >       integer(HSIZE_T) :: offset(2)          ! hyperslab selection parameters
    >
    >       integer, allocatable :: data(:)        ! Dataset to write
    >
    >       integer :: mpi_size, mpi_rank
    >
    >       !c Initialize MPI.
    >       call PetscInitialize(Petsc_Null_Character,ierr)  
    >       CHKERRQ(ierr)
    >       call MPI_Comm_rank(Petsc_Comm_World,mpi_rank,ierr)
    >       CHKERRQ(ierr)
    >       call MPI_Comm_size(Petsc_Comm_World,mpi_size,ierr)
    >       CHKERRQ(ierr)
    >
    >       !c initialize fortran interface
    >       call h5open_f(hdf5_ierr)
    >
    >       !c Set up file access property list with parallel I/O access.
    >       call h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, hdf5_ierr)
    >       call h5pset_fapl_mpio_f(plist_id, Petsc_Comm_World,              &
    >                               MPI_INFO_NULL, hdf5_ierr) 
    >
    >       !c Create a new file collectively and release property list identifier.
    >       call h5fcreate_f(h5File_Name, H5F_ACC_TRUNC_F, file_id,          &
    >                        hdf5_ierr, access_prp = plist_id)
    >       call h5pclose_f(plist_id, hdf5_ierr)
    >
    >       !c Create the dataspace for the dataset.
    >       dimsf(1) = nx
    >       dimsf(2) = ny
    >       call h5screate_simple_f(ndim, dimsf, FileSpace, hdf5_ierr) 
    >
    >       !c Create the dataset with default properties and close filespace.
    >       call h5dcreate_f(file_id, DatasetName, H5T_NATIVE_INTEGER,       &
    >                        FileSpace, dset_id, hdf5_ierr)
    >       call h5sclose_f(FileSpace, hdf5_ierr)
    >
    >       !c Each process defines dataset in memory and writes it to the hyperslab
    >       !c in the file.
    >       if (mpi_rank == 3) then
    >         hcount(1) = 0
    >         hcount(2) = dimsf(2)
    >       else
    >         hcount(1) = dimsf(1)/mpi_size
    >         hcount(2) = dimsf(2)
    >       end if
    >
    >       offset(1) = mpi_rank*hcount(1)
    >       offset(2) = 0
    >       call h5screate_simple_f(ndim, hcount, MemSpace, hdf5_ierr)  
    >
    >       if (mpi_rank == 3) then
    >         call h5sselect_none_f(MemSpace,hdf5_ierr)
    >       end if
    >
    >       !c Select hyperslab in the file.
    >       call h5dget_space_f(dset_id,FileSpace,hdf5_ierr)
    >       call h5sselect_hyperslab_f(FileSpace, H5S_SELECT_SET_F,          &
    >                                  offset, hcount, hdf5_ierr)
    >
    >       if (mpi_rank == 3) then
    >         call h5sselect_none_f(FileSpace,hdf5_ierr)
    >       end if      
    >       
    >       !c Initialize data buffer.
    >       if (mpi_rank == 3) then
    >         allocate(data(0))
    >         write(*,*) 'Null data space for processor ',mpi_rank
    >       else
    >         allocate(data(hcount(1)*hcount(2)))
    >         write(*,*) 'Data space for processor ',mpi_rank, 'size',size(data)
    >         do i = 1, hcount(1)*hcount(2)
    >           data(i) = mpi_rank+10
    >         end do
    >       end if
    >
    >       !c Create property list for collective dataset write.
    >       call h5pcreate_f(H5P_DATASET_XFER_F, plist_id, hdf5_ierr)
    >       call h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F,        &
    >                               hdf5_ierr)
    >       !c write the dataset collectively
    >       call h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, hcount,       &
    >                       hdf5_ierr, file_space_id=FileSpace,              &
    >                       mem_space_id=MemSpace, xfer_prp = plist_id)
    >
    >       !c free data.
    >       deallocate(data)
    >
    >       !c Close/release resources.
    >       call h5dclose_f(dset_id, hdf5_ierr)
    >       call h5sclose_f(FileSpace, hdf5_ierr)
    >       call h5sclose_f(MemSpace, hdf5_ierr)
    >       call h5pclose_f(plist_id, hdf5_ierr)
    >       call h5fclose_f(file_id, hdf5_ierr)
    >
    >       call PetscFinalize(ierr)
    >       CHKERRQ(ierr)
    >
    >       return
    >
    > end program hdf5_zero_data

-------------- next part --------------
A non-text attachment was scrubbed...
Name: coll_test.c
Type: application/octet-stream
Size: 3349 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200613/a0a20b35/attachment.obj>

