[petsc-users] Parallel writing in HDF5-1.12.0 when some processors have no data to write
Danyang Su
danyang.su at gmail.com
Fri Jun 12 14:29:43 CDT 2020
Hi Jed,
Attached is the example for your test.
This example uses H5Sselect_none to tell the H5Dwrite call that there will be no data. The 4th process HAS to participate because we are in collective mode.
The code is ported and modified based on the C example from https://support.hdfgroup.org/ftp/HDF5/examples/parallel/coll_test.c
The compile flags in the makefile are the same as those used in my own code.
To compile the code, please run 'make all'.
To test the code, please run 'mpiexec -n 4 ./hdf5_zero_data'. Running with 4 or more processes should reproduce the problem.
The code may crash on HDF5 1.12.0 but works fine on HDF5 1.10.x.
The following platforms have been tested:
Macos-Mojave + GNU-8.2 + HDF5-1.12.0 -> Works fine
Ubuntu-16.04 + GNU-5.4 + HDF5-1.12.0 -> Crashes
Ubuntu-16.04 + GNU-7.5 + HDF5-1.12.0 -> Crashes
Ubuntu-16.04 + GNU-5.4 + HDF5-1.10.x -> Works fine
Centos-7 + Intel2018 + HDF5-1.12.0 -> Works fine
Error message reported when the code crashes:
At line 6686 of file H5_gen.F90
Fortran runtime error: Index '1' of dimension 1 of array 'buf' above upper bound of 0
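
For reference, here is a minimal sketch of the pattern in question, together with one possible workaround. It is not the attached hdf5_zero_data.F90. The file name, dataset name, the nrows constant and the choice of which ranks own data are all made up for illustration. The workaround rests on an assumption suggested by the message above: the bounds check fires because the wrapper touches buf(1) of a zero-length array. If that is the case, giving every rank a buffer of at least one element, while still selecting nothing on the empty ranks, should avoid the check.

```fortran
program zero_data_sketch
  use mpi
  use hdf5
  implicit none

  integer, parameter :: nrows = 4        ! hypothetical rows per writing rank
  integer :: mpierr, hdf5_ierr, myrank, nprocs, nlocal, nwriters
  integer(hid_t) :: plist_id, file_id, dset_id, filespace, memspace, xlist_id
  integer(hsize_t) :: gdims(1), mdims(1), hdf5_offset(1), hdf5_count(1)
  integer(hsize_t) :: hdf5_dsize(1)
  double precision, allocatable :: dataset(:)

  call MPI_Init(mpierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, mpierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, mpierr)
  call h5open_f(hdf5_ierr)

  ! Illustration only: at most the first three ranks own data.
  nwriters = min(nprocs, 3)
  nlocal = merge(nrows, 0, myrank < nwriters)

  ! Open the file through MPI-IO.
  call h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, hdf5_ierr)
  call h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdf5_ierr)
  call h5fcreate_f("zero_data_sketch.h5", H5F_ACC_TRUNC_F, file_id, &
                   hdf5_ierr, access_prp=plist_id)
  call h5pclose_f(plist_id, hdf5_ierr)

  ! One 1-D dataset holding the rows of all writing ranks.
  gdims(1) = int(nwriters, hsize_t) * nrows
  call h5screate_simple_f(1, gdims, filespace, hdf5_ierr)
  call h5dcreate_f(file_id, "data", H5T_NATIVE_DOUBLE, filespace, &
                   dset_id, hdf5_ierr)

  ! Assumed workaround: never pass a zero-length buffer to h5dwrite_f.
  mdims(1) = max(nlocal, 1)
  allocate(dataset(max(nlocal, 1)))
  dataset = dble(myrank)
  call h5screate_simple_f(1, mdims, memspace, hdf5_ierr)

  if (nlocal > 0) then
     ! Ranks with data select their own slab of the file space.
     hdf5_offset(1) = int(myrank, hsize_t) * nrows
     hdf5_count(1) = nlocal
     call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, &
                                hdf5_offset, hdf5_count, hdf5_ierr)
  else
     ! Ranks without data select nothing in both spaces but still
     ! take part in the collective call below.
     call h5sselect_none_f(filespace, hdf5_ierr)
     call h5sselect_none_f(memspace, hdf5_ierr)
  end if

  call h5pcreate_f(H5P_DATASET_XFER_F, xlist_id, hdf5_ierr)
  call h5pset_dxpl_mpio_f(xlist_id, H5FD_MPIO_COLLECTIVE_F, hdf5_ierr)

  ! Every rank calls h5dwrite_f, including those with an empty selection.
  hdf5_dsize(1) = mdims(1)
  call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, dataset, hdf5_dsize, &
                  hdf5_ierr, file_space_id=filespace, &
                  mem_space_id=memspace, xfer_prp=xlist_id)

  deallocate(dataset)
  call h5pclose_f(xlist_id, hdf5_ierr)
  call h5sclose_f(memspace, hdf5_ierr)
  call h5sclose_f(filespace, hdf5_ierr)
  call h5dclose_f(dset_id, hdf5_ierr)
  call h5fclose_f(file_id, hdf5_ierr)
  call h5close_f(hdf5_ierr)
  call MPI_Finalize(mpierr)
end program zero_data_sketch
```

The extra element on the empty ranks is never written, because nothing is selected in either space on those ranks. Whether this actually avoids the crash on the Ubuntu builds is untested.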
Thanks,
Danyang
On 2020-06-12, 6:05 AM, "Jed Brown" <jed at jedbrown.org> wrote:
Danyang Su <danyang.su at gmail.com> writes:
> Hi Jed,
>
> Thanks for your double check.
>
> HDF5 1.10.6 also works, but versions from 1.12.x onward stop working.
I'd suggest making a reduced test case in order to submit a bug report.
This was the relevant change in PETSc for hdf5-1.12.
https://gitlab.com/petsc/petsc/commit/806daeb7de397195b5132278177f4d5553f9f612
> Attached is the code section where I have problem.
>
> !c write the dataset collectively
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!! CODE CRASHES HERE IF SOME PROCESSORS HAVE NO DATA TO WRITE!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, dataset, hdf5_dsize, &
> hdf5_ierr, file_space_id=filespace, &
> mem_space_id=memspace, xfer_prp = xlist_id)
>
> Please let me know if there is something wrong in the code that causes the problem.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hdf5_zero_data.F90
Type: application/octet-stream
Size: 5487 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200612/0d16d0cf/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makefile
Type: application/octet-stream
Size: 5065 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200612/0d16d0cf/attachment-0001.obj>