<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks again for reporting these issues, Tobias. Just wanted to close the loop on the mailing list:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<ul>
<li>The HDF5 warnings related to hyperslab selections were due to a bug in Darshan that is now fixed.<br>
</li><li>The crashes appear to be due to an OpenMPI bug that Darshan happens to trigger for some workloads. We have likewise modified Darshan to avoid triggering this bug when running with OpenMPI, so it should be safe to use while the OpenMPI developers continue to investigate the underlying
issue (see: <a href="https://github.com/open-mpi/ompi/issues/8841" id="LPlnk">https://github.com/open-mpi/ompi/issues/8841</a>).<br>
</li>
</ul>
<div>Both of these bug fixes are available starting with the darshan-3.3.0-pre2 pre-release that came out today, and they will also be included in the stable 3.3.0 release that we plan to have available next week.<br>
</div>
<div><br>
</div>
<div>Please let us know if you run into any further issues related to this; we'd be happy to investigate further.<br>
</div>
<div><br>
</div>
<div>--Shane<br>
</div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Darshan-users &lt;darshan-users-bounces@lists.mcs.anl.gov&gt; on behalf of Latham, Robert J. &lt;robl@mcs.anl.gov&gt;<br>
<b>Sent:</b> Tuesday, April 20, 2021 12:11 PM<br>
<b>To:</b> darshan-users@lists.mcs.anl.gov &lt;darshan-users@lists.mcs.anl.gov&gt;; tobias.meisel@ufz.de &lt;tobias.meisel@ufz.de&gt;<br>
<b>Subject:</b> Re: [Darshan-users] HDF5 chunking and Darshan with enable-hdf5-mod results in error</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">On Mon, 2021-04-19 at 14:17 +0200, Tobias Meisel wrote:<br>
> Hi all,<br>
> I’ve built 3.2.1 Darshan with --enable-hdf5-mod=<br>
> <br>
> <br>
> When I began writing chunks collectively with hdf5 I got errors only<br>
> when darshan instrumentation is enabled.<br>
> The darshan instrumentation without hdf5 chunking worked fine (also<br>
> with --enable-hdf5-mod).<br>
<br>
The errors are of two kinds:<br>
- "H5Shyper.c line 12116 in H5Sget_regular_hyperslab(): not a hyperslab<br>
selection"<br>
-- Gerd Heber says some processes are calling H5Sselect_none which is<br>
not a "regular hyperslab selection". We've seen this warning before in<br>
other workloads this week.<br>
<br>
But there's a more alarming error -- not just a warning but a divide by<br>
zero:<br>
<br>
[archlinux:31445] *** Process received signal ***<br>
[archlinux:31445] Signal: Floating point exception (8)<br>
[archlinux:31445] Signal code: Integer divide-by-zero (1)<br>
[archlinux:31445] Failing at address: 0x7f3f980614bf<br>
[archlinux:31445] [ 0] /usr/lib/libpthread.so.0(+0x13960)[0x7f3f99bbd960]<br>
[archlinux:31445] [ 1] /usr/lib/openmpi/openmpi/mca_io_ompio.so(mca_io_ompio_file_get_byte_offset+0x3f)[0x7f3f980614bf]<br>
[archlinux:31445] [ 2] /usr/lib/openmpi/libmpi.so.40(PMPI_File_get_byte_offset+0x70)[0x7f3f99f745b0]<br>
[archlinux:31445] [ 3] /usr/local/lib/libdarshan.so(MPI_File_write_at_all+0x197)[0x7f3f9a4b76b7]<br>
<br>
Darshan doesn't divide anything by anything in its wrappers, but<br>
OpenMPI does:<br>
<br>
<a href="https://github.com/open-mpi/ompi/blob/master/ompi/mca/io/ompio/io_ompio_file_open.c#L511">https://github.com/open-mpi/ompi/blob/master/ompi/mca/io/ompio/io_ompio_file_open.c#L511</a><br>
<br>
Lots of questions here: <br>
- OpenMPI-IO shouldn't ever crash based on user input, but it does.<br>
- How is Darshan feeding OpenMPI-IO such a bogus payload?<br>
- How is the hyperslab selection, or lack thereof, triggering all this?<br>
<br>
==rob<br>
<br>
> I have opened a topic at the HDF group forum: <br>
> <br>
<a href="https://forum.hdfgroup.org/t/parallel-hdf5-write-with-irregular-size-in-one-dimension/8284/5">https://forum.hdfgroup.org/t/parallel-hdf5-write-with-irregular-size-in-one-dimension/8284/5</a><br>
> The resulting errors are posted to this topic as well.<br>
> <br>
> The minimal example to reproduce the error is also here: <br>
> <br>
<a href="https://dbkt.hdfgroup.org/original/2X/c/c58be4df192333b6d15d6e91d58b114b85cea2f4.cc">https://dbkt.hdfgroup.org/original/2X/c/c58be4df192333b6d15d6e91d58b114b85cea2f4.cc</a><br>
> <br>
> My Setup is:<br>
> OpenMPI (OpenRTE) 4.0.5<br>
> HDF5 1.12.0<br>
> Darshan 3.2.1<br>
> <br>
> Could you take a look and check if there is a problem with the HDF5<br>
> instrumentation in Darshan?<br>
> <br>
> Thank you<br>
> <br>
> Tobias<br>
> _______________________________________________<br>
> Darshan-users mailing list<br>
> Darshan-users@lists.mcs.anl.gov<br>
> <a href="https://lists.mcs.anl.gov/mailman/listinfo/darshan-users">https://lists.mcs.anl.gov/mailman/listinfo/darshan-users</a><br>
</div>
</span></font></div>
</body>
</html>