Big file support on BG with 32-bit off_t type
Robert Latham
robl at mcs.anl.gov
Thu Jan 18 13:14:33 CST 2007
On Wed, Jan 17, 2007 at 05:30:50PM -0700, John Michalakes wrote:
> Hi Rob
>
> It was good to meet you at ANL last week. Update on the BG testing
> with WRF and pNetCDF. Appears we're still stuck.
>
> Even though we get past the error I wrote you about below by
> modifying header.c to ignore the test for sizeof(off_t) not equal to
> 8, it unfortunately does not work correctly when the running code
> tries reading data from a large file. We get a lot of messages to
> stdout that look like this:
>
> MPI_FILE_SET_VIEW(58): Invalid displacement argument
> 0: MPI_File_set_view error = Invalid argument, error stack
>
> I've pored over the pnetcdf code and managed to convince myself
> that this has to do with the fact that we're trying to use a 32-bit
> offset into one of these big files.
>
> Mike McCraken at SDSC informs me that since the PPC processor on the
> BG systems is 32-bit, there's no way to compile for 64-bit
> addressing as we would on an IBM Power system using -q64 or
> OBJECT_MODE 64 in the environment.
>
> Have you run into this with pnetcdf on the ANL Blue Gene?
Hi John, Michael
If possible, let's try to keep this discussion on the pnetcdf list. I
think it will help out a lot of people following in your footsteps.
We call MPI_File_set_view in 11 places, but only a few of those have
non-zero displacements. In those non-zero displacement cases, we have
a situation where we assign an off_t type to an MPI_Offset. Those
places could be suspect, especially since we've determined that
sizeof(MPI_Offset) is 8 while sizeof(off_t) is 4 on your platform:
    typedef struct {
        ...
        off_t begin;
    } NC_var;

    static int
    set_var1_fileview(NC* ncp, MPI_File *mpifh, NC_var* varp, const MPI_Offset index[]) {
        MPI_Offset offset;
        ...
        offset = varp->begin;
        ...
So, could it be that we are overflowing off_t?
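To make the suspected overflow concrete, here is a minimal sketch in plain C. It is not pnetcdf code: int32_t stands in for the 4-byte off_t and int64_t for the 8-byte MPI_Offset, and the names narrow_to_off32 and survives_off32 are made up for illustration. It assumes the narrowing keeps the low 32 bits, which is what typical two's-complement platforms do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: what a 4-byte signed off_t would hold after being
 * assigned a 64-bit file offset, assuming only the low 32 bits are kept. */
static int32_t narrow_to_off32(int64_t offset)
{
    assert(offset >= 0);                       /* file offsets are non-negative */

    int64_t low = offset & 0xFFFFFFFFLL;       /* keep the low 32 bits */
    if (low > INT32_MAX)
        low -= 4294967296LL;                   /* reinterpret as a signed 32-bit value */
    return (int32_t)low;
}

/* Hypothetical helper: 1 if the offset comes back unchanged after being
 * stored in the 32-bit field and widened again, 0 otherwise. */
static int survives_off32(int64_t offset)
{
    int64_t widened = (int64_t)narrow_to_off32(offset);  /* like offset = varp->begin */
    return widened == offset;
}
```

Under these assumptions a 3 GiB offset (3221225472) comes back as -1073741824, and a negative displacement is the kind of value I would expect MPI_File_set_view to reject.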
Could you tell me how things go if, in src/lib/nc.h around line 277
where we declare the NC_var struct, you replace 'off_t begin;' with
'MPI_Offset begin;' ?
Index: src/lib/nc.h
===================================================================
RCS file: /homes/gropp/cvsMaster_z/parallel-netcdf/src/lib/nc.h,v
retrieving revision 1.20
diff -u -w -p -r1.20 nc.h
--- src/lib/nc.h 13 Dec 2006 06:37:49 -0000 1.20
+++ src/lib/nc.h 18 Jan 2007 19:03:23 -0000
@@ -277,7 +277,7 @@ typedef struct {
 	NC_attrarray attrs;
 	nc_type type; /* the discriminant */
 	size_t len; /* the total length originally allocated */
-	off_t begin;
+	MPI_Offset begin;
 } NC_var;
 
 typedef struct NC_vararray {
This change doesn't mess up 'nc_test' in either CDF-1 or CDF-2 mode,
but I also don't have a file with 2GB+ offsets handy to test for
certain. Your feedback would be most appreciated.
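For anyone putting together a test case, a rough way to estimate which variable first starts beyond what a 4-byte signed off_t can hold is sketched below. It assumes only fixed-size (non-record) variables laid out back to back after the header, each padded to a 4-byte boundary as in the classic format. The function names and the header size you pass in are hypothetical, not anything from the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 4. */
static int64_t pad4(int64_t nbytes)
{
    return (nbytes + 3) / 4 * 4;
}

/* Hypothetical helper: index of the first variable whose starting offset
 * no longer fits in a signed 32-bit integer, or -1 if every variable
 * starts below that limit. var_bytes[i] is the unpadded size of variable i. */
static int first_var_past_2gib(int64_t header_bytes,
                               const int64_t *var_bytes, size_t nvars)
{
    assert(header_bytes >= 0);

    int64_t begin = pad4(header_bytes);        /* first variable follows the header */
    for (size_t i = 0; i < nvars; i++) {
        if (begin > INT32_MAX)
            return (int)i;                     /* this begin would overflow a 32-bit off_t */
        begin += pad4(var_bytes[i]);           /* next variable starts after this one */
    }
    return -1;
}
```

With a 1024-byte header and three 1 GiB variables, for instance, the third variable (index 2) is the first one whose begin lands past the 32-bit limit, so a file of that shape should be enough to exercise the change.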
Thanks
==rob
--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B