Strange behavior in ncmpio_file_set_view
Sjaardema, Gregory D
gdsjaar at sandia.gov
Thu Oct 28 16:24:20 CDT 2021
I am getting a floating point exception core dump down below `ncmpio_file_set_view` with certain compilers…
I’ve been trying to trace it down, but am confused by the code in `get_varm` for the case when one rank has no items to get and the other rank has items to read with a non-unity stride (7 in this case).
This is from a code using netCDF to call down into PnetCDF.
Originally, the variable being read was NC_REQ_INDEP, so the rank with zero items to read would return from `get_varm` at line 464 and the other rank would continue. It would eventually end up in `ncmpio_file_set_view` and call down and finally throw the floating point exception.
Since there is a comment that `MPI_File_set_view` is collective, I figured the issue might be that only one rank was calling down that path, so I changed the variable to be NC_REQ_COLL. Both ranks now call down into `ncmpio_file_set_view`, but inside that routine, the rank with zero bytes to read falls into the first if block `if (filetype == MPI_BYTE)`, while the second rank goes down further and hits the next if block `if (rank == 0)`.
Both ranks end up calling `MPI_File_set_view`, but with different types. The end result is that I still get a floating point exception on the rank that does have bytes to read. The exception seems to be in `cost_calc`.
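To make the two-rank situation concrete, here is a minimal sketch in plain C of the per-rank requests described above. It makes no MPI or PnetCDF calls and does not reproduce the crash. The `request_t` struct and the helper names are hypothetical, not PnetCDF types. Only the stride of 7 and the zero-item rank come from the actual case. The sketch only shows why one rank's request is empty (the `MPI_BYTE` branch) while the other's is noncontiguous.

```c
#include <assert.h>

/* Hypothetical description of one rank's 1-D strided request.
 * This is an illustration only, not a PnetCDF data structure. */
typedef struct {
    long long start;   /* index of the first selected element */
    long long count;   /* number of elements requested */
    long long stride;  /* distance between selected elements, in elements */
} request_t;

/* A rank with nothing to read has an empty request. */
static int request_is_empty(const request_t *r)
{
    return r->count == 0;
}

/* Number of elements spanned from the first to the last selected
 * element, inclusive; zero for an empty request. */
static long long request_span(const request_t *r)
{
    assert(r->stride > 0);
    return request_is_empty(r) ? 0 : (r->count - 1) * r->stride + 1;
}

/* Byte offset of the i-th selected element, for elements of
 * elem_size bytes. Only valid for 0 <= i < count. */
static long long request_offset(const request_t *r, long long i,
                                long long elem_size)
{
    assert(i >= 0 && i < r->count);
    return (r->start + i * r->stride) * elem_size;
}
```

For example, with a made-up request of `{2, 3, 7}` on the reading rank and 8-byte elements, `request_span` gives 15 and the three offsets are 16, 72 and 128 bytes, so that rank needs a strided file view. The other rank's `{0, 0, 7}` request is empty and spans nothing.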
This is with pnetcdf-1.12.1, clang-12.0.0 (also with clang-10.0.0) and openmpi-4.0.5.
I’m basically looking for guidance at this point that the calling paths look correct or where to look in more depth… Any help appreciated.
..GReg