Big file support on BG with 32-bit off_t type

John Michalakes john at michalakes.us
Fri Jan 19 05:43:54 CST 2007


Hi Rob,

Mike tried the fix you proposed on the SDSC BG machine, with the following
result, an assertion failure:

  wrf.exe: header.c:132: ncmpii_NC_computeshapes: Assertion `ncp->begin_var > 0' failed.

John

> -----Original Message-----
> From: Robert Latham
> Sent: Thursday, January 18, 2007 12:15 PM
> To: John Michalakes
> Cc: Michael McCracken; parallel-netcdf
> Subject: Re: Big file support on BG with 32-bit off_t type
>
[...]
>
> We call MPI_File_set_view in 11 places, but only a few of those have
> non-zero displacements.  In those non-zero displacement cases, we have
> a situation where we assign an off_t type to an MPI_Offset.  Those
> places could be suspect, especially since we've determined that
> sizeof(MPI_Offset) is 8 while sizeof(off_t) is 4 on your platform.
>
> typedef struct {
> 	...
>         off_t begin;
> } NC_var;
>
> static int
> set_var1_fileview(NC* ncp, MPI_File *mpifh, NC_var* varp, const
> MPI_Offset index[]) {
>   MPI_Offset offset;
>   ...
>
>   offset = varp->begin;
>   ...
>
>
> So, could it be that we are overflowing off_t?
>
> Could you tell me how well things do when you take src/lib/nc.h and
> around line 277 where we declare the NC_var struct, replace 'off_t
> begin;'  with 'MPI_Offset begin;' ?
>
>
> Index: src/lib/nc.h
> ===================================================================
> RCS file: /homes/gropp/cvsMaster_z/parallel-netcdf/src/lib/nc.h,v
> retrieving revision 1.20
> diff -u -w -p -r1.20 nc.h
> --- src/lib/nc.h        13 Dec 2006 06:37:49 -0000      1.20
> +++ src/lib/nc.h        18 Jan 2007 19:03:23 -0000
> @@ -277,7 +277,7 @@ typedef struct {
>         NC_attrarray attrs;
>         nc_type type;           /* the discriminant */
>         size_t len;             /* the total length originally allocated */
> -       off_t begin;
> +       MPI_Offset begin;
>  } NC_var;
>
>  typedef struct NC_vararray {
>
>
> This change doesn't mess up 'nc_test' in either CDF-1 or CDF-2 mode,
> but I also don't have a file with 2GB+ offsets handy to test for
> certain.  Your feedback would be most appreciated.
>
> Thanks
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
> Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
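
For illustration only, here is a minimal sketch of the suspected problem
described in the quoted message: a file offset past 2GB does not fit in a
4-byte signed type, so only its low-order bits survive when it is stored
there. This is not code from parallel-netcdf. The names off32_t, offset64_t,
fits_in_off32 and low32 are made up for this example, with int32_t standing
in for a 4-byte off_t and int64_t for an 8-byte MPI_Offset.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not parallel-netcdf types:
 * off32_t models a 4-byte off_t, offset64_t an 8-byte MPI_Offset. */
typedef int32_t off32_t;
typedef int64_t offset64_t;

/* Return 1 if the 64-bit offset can be stored in the 32-bit type
 * without loss, 0 otherwise. */
static int fits_in_off32(offset64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Return the low 32 bits of the offset, i.e. the part that is left
 * once the upper half is discarded. Conversion to an unsigned type
 * is well-defined in C (reduction modulo 2^32). */
static uint32_t low32(offset64_t v)
{
    return (uint32_t)v;
}
```

With these helpers, an offset of 2^31 - 1 bytes passes the check, while an
offset of 2^31 bytes (exactly 2GB) does not. That is the range in which a
field declared as a 4-byte off_t could no longer hold a variable's starting
position.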



