[Darshan-users] HDF5 chunking and Darshan with enable-hdf5-mod results in error
Latham, Robert J.
robl at mcs.anl.gov
Tue Apr 20 12:11:05 CDT 2021
On Mon, 2021-04-19 at 14:17 +0200, Tobias Meisel wrote:
> Hi all,
> I’ve built 3.2.1 Darshan with --enable-hdf5-mod=
> When I began writing chunks collectively with hdf5 I got errors only
> when darshan instrumentation is enabled.
> The darshan instrumentation without hdf5 chunking worked fine (also
> with --enable-hdf5-mod).
The errors are of two kinds:
- "H5Shyper.c line 12116 in H5Sget_regular_hyperslab(): not a hyperslab
-- Gerd Heber says some processes are calling H5Sselect_none which is
not a "regular hyperslab selection". We've seen this warning before in
other workloads this week.
But there's a more alarming error -- not just a warning but a divide by zero:
[archlinux:31445] *** Process received signal ***
[archlinux:31445] Signal: Floating point exception (8)
[archlinux:31445] Signal code: Integer divide-by-zero (1)
[archlinux:31445] Failing at address: 0x7f3f980614bf
[archlinux:31445] [ 0]
[archlinux:31445] [ 1]
[archlinux:31445] [ 2]
[archlinux:31445] [ 3]
Darshan doesn't divide anything by anything in its wrappers, but
Lots of questions here:
- OpenMPI-IO shouldn't ever crash based on user input, but it does.
- How is Darshan feeding OpenMPI-IO such a bogus payload?
- How is the hyperslab selection, or lack thereof, triggering all this?
> I have opened a topic at the HDF group forum:
> The resulting errors are posted to this topic as well.
> The minimal example to reproduce the error is also here:
> My Setup is:
> OpenMPI (OpenRTE) 4.0.5
> HDF5 1.12.0
> Darshan 3.2.1
> Could you take a look and check if there is a problem with the HDF5
> instrumentation in Darshan?
> Thank you