Big file support on BG with 32-bit off_t type
Robert Latham
robl at mcs.anl.gov
Tue Jan 9 15:12:40 CST 2007
On Mon, Jan 08, 2007 at 01:54:23PM -0700, John Michalakes wrote:
> Working on Blue Gene, we have a code that uses pnetcdf to
> successfully write a large file (> 2GB, file type "2"). The pnetcdf
> library and code are compiled with 32-bit addressing so that
> sizeof(off_t) is 4. The file appears to be correct based on perusal
> with ncdump.
Great. Glad to hear at least the file creation part of your code is
working better now.
> We are unable to read the file back in, however, using pnetcdf with
> another program also compiled with 32-bit addressing. The error
> return is: NC_ESMALL.
>
> I believe the section of pnetcdf code returning the error is
> ncmpii_hdr_get_NC, in header.c:
>
>     /* check version number in last byte of magic */
>     if (magic[sizeof(ncmagic)-1] == 0x1) {
>         getbuf.version = 1;
>     } else if (magic[sizeof(ncmagic)-1] == 0x2) {
>         getbuf.version = 2;
>         fSet(ncp->flags, NC_64BIT_OFFSET);
>         if (sizeof(off_t) != 8) {
>             /* take the easy way out: if we can't support all CDF-2
>              * files, return immediately */
>             free(getbuf.base);
>             return NC_ESMALL;
>
> This is around line 1142.
>
> Question: is this error condition correct and necessary? How does
> 32-bit pnetcdf manage to *write* the large file (apparently)
> correctly?
Hm, that is indeed probably a bug: we should check
sizeof(MPI_Offset), not sizeof(off_t). Do you still have the
src/lib/ncconfig.h file handy? Does it have lines like this?
/* The number of bytes in an MPI_Offset */
#define SIZEOF_MPI_OFFSET 8
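To illustrate why the width of the offset type matters here, a minimal
sketch follows. It assumes a hypothetical helper, offset_fits(), which is
not part of pnetcdf: a byte offset just past 2 GiB does not fit in a 4-byte
signed type, but fits easily in an 8-byte one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of pnetcdf): returns 1 if a non-negative
 * byte offset can be stored in a signed integer type that is `nbytes`
 * bytes wide, and 0 otherwise. */
static int offset_fits(int64_t offset, size_t nbytes)
{
    /* Precondition for this sketch: widths of 1 to 8 bytes only. */
    assert(nbytes > 0 && nbytes <= sizeof(int64_t));

    if (offset < 0)
        return 0;
    if (nbytes == sizeof(int64_t))
        return 1;   /* any non-negative int64_t fits in 8 bytes */

    /* Largest positive value of a signed type with nbytes*8 bits. */
    int64_t max = (INT64_C(1) << (nbytes * 8 - 1)) - 1;
    return offset <= max;
}
```

With a 4-byte type the limit is 2147483647, so offset_fits(2147483648, 4)
is 0 while offset_fits(2147483648, 8) is 1. That would be consistent with
the write succeeding if the library tracks offsets in an 8-byte MPI_Offset
even though off_t is only 4 bytes.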
> Will pnetcdf compiled for OBJECT_MODE 64 work properly on Blue Gene?
I think that it would, but I also think it should work ok as-is.
Here's a quick fix. If ncconfig.h already says SIZEOF_MPI_OFFSET is
8, then can you try changing this line:
if (sizeof(off_t) != 8) {
to this:
if (sizeof(MPI_Offset) != 8) {
If that works, then I'll clean up the approach a bit and add it to the
next release.
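As a rough, standalone sketch of the logic behind that change (not the
actual header.c code): check_magic() and the SKETCH_* status codes below are
hypothetical names, and the width of the offset type is passed in as a
parameter so the effect of checking one type versus another is easy to see.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; not pnetcdf's real values. */
enum { SKETCH_OK = 0, SKETCH_ENOTNC = -1, SKETCH_ESMALL = -2 };

/* Decide whether a file with the given 4-byte magic ("CDF" followed by a
 * version byte) can be read when file offsets are held in a type that is
 * `offset_size` bytes wide. On success, stores the format version
 * (1 or 2) in *version. */
static int check_magic(const unsigned char magic[4], size_t offset_size,
                       int *version)
{
    assert(magic != NULL && version != NULL);

    if (magic[0] != 'C' || magic[1] != 'D' || magic[2] != 'F')
        return SKETCH_ENOTNC;

    if (magic[3] == 0x1) {
        *version = 1;            /* CDF-1: 32-bit offsets are enough */
        return SKETCH_OK;
    }
    if (magic[3] == 0x2) {
        *version = 2;            /* CDF-2: needs 8-byte offsets */
        return (offset_size == 8) ? SKETCH_OK : SKETCH_ESMALL;
    }
    return SKETCH_ENOTNC;        /* unknown version byte */
}
```

Passing sizeof(off_t) on a 32-bit build (4) rejects a CDF-2 file with
SKETCH_ESMALL, while passing an 8-byte width such as sizeof(MPI_Offset),
assuming ncconfig.h reports 8, accepts it.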
Thanks for the report. It's great to know that the CDF-2 format is
getting used.
==rob
--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B