[mpich2-dev] 32-bit int overflow in ROMIO

Rajeev Thakur thakur at mcs.anl.gov
Wed Mar 5 10:14:56 CST 2008


This may be a place where a variable needs to be declared as MPI_Offset
instead of int (I haven't looked at it closely). int is usually 32 bits even
on 64-bit systems.
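
A minimal sketch of that kind of change, not taken from ROMIO itself: it
redoes the arithmetic from lines 342-344 of the quoted message but keeps the
result 64 bits wide. The names sketch_off, frd_size_wide and fits_in_int are
hypothetical, and int64_t stands in for ADIO_Offset/MPI_Offset on the
assumption that the real type is a 64-bit signed integer on the platform in
question.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for ADIO_Offset / MPI_Offset so the sketch builds
 * without MPI. Assumption: the real type is a 64-bit signed integer. */
typedef int64_t sketch_off;

/* Same sum as lines 342-344 of the quoted code, but the result is returned
 * as a 64-bit value instead of being cast to int. */
static sketch_off frd_size_wide(sketch_off disp, sketch_off index,
                                int n_filetypes, sketch_off filetype_extent,
                                int blocklen, sketch_off offset)
{
    /* Widen n_filetypes before multiplying, as the original code does. */
    return disp + index + (sketch_off) n_filetypes * filetype_extent
           + blocklen - offset;
}

/* Nonzero if v could be stored in an int without changing its value. */
static int fits_in_int(sketch_off v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```

With one filetype whose extent is 2 GB, frd_size_wide(0, 0, 1,
(sketch_off) 1 << 31, 0, 0) gives 2147483648, and fits_in_int() reports that
this value would not survive the (int) cast where int is 32 bits.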

Rajeev
  

> -----Original Message-----
> From: owner-mpich2-dev at mcs.anl.gov 
> [mailto:owner-mpich2-dev at mcs.anl.gov] On Behalf Of Jeff Parker
> Sent: Tuesday, March 04, 2008 3:32 PM
> To: mpich2-dev at mcs.anl.gov
> Subject: [mpich2-dev] 32-bit int overflow in ROMIO
> 
> 
> We recently debugged a problem that appeared while running 
> b_eff_io on Blue Gene/P.  It was a core dump caused by an out 
> of bounds array index in ROMIO module 
> src/mpi/romio/adio/common/ad_read_coll.c that grew large due 
> to a loop termination condition not being satisfied.  The 
> condition was checking two "int" variables (i < bufsize) and 
> the loop incremented i by another int, frd_size, which had an 
> out of bounds value of 0x80000000 (2GB).
> 
> Looking in the 1.0.7rc1 version of this module, it appears 
> that overflowed int variables such as the above can occur.  
> For example, on lines 342-344, several variables are being 
> added together, cast to an int, and stored into an int.  
> This will overflow when the addition goes beyond 2 GB, which 
> it will when working with large files.
> 
>     342    frd_size = (int) (disp + flat_file->indices[i] +
>     343        (ADIO_Offset) n_filetypes*filetype_extent
>     344        + flat_file->blocklens[i] - offset);
> 
> frd_size and flat_file->blocklens[i] are both ints, which are 
> 32-bit signed values on most 64-bit and 32-bit platforms.
> The rest of the variables, disp, flat_file->indices[i], 
> (ADIO_Offset) n_filetypes*filetype_extent, and offset, are all 
> ADIO_Offset types, which are 64 bits.
> 
> We were under the impression that the problems in MPICH2 and 
> ROMIO supporting large files and datatypes were scoped to 
> 32-bit platforms.
> However, code of the kind shown above will have problems on 
> 64-bit platforms too when the int data type is 32 bits.
> 
> Is this observation correct?
> 
> How common is it to have 32-bit ints on a 64-bit platform?
> 
> Jeff Parker
> Blue Gene Messaging
> 61L/030-2 A407    507-253-4208    TieLine: 553-4208
> Notes email: Jeff Parker/Rochester/IBM
> INTERNET: jjparker at us.ibm.com     AFS: jeff at rchland
> 
> 
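
The loop failure described in the quoted message (i < bufsize, with i
advanced by an int step that had become 0x80000000) can be illustrated with
a small sketch. This is not the loop in ad_read_coll.c; count_chunks is a
hypothetical helper that uses 64-bit counters and rejects a step that is not
positive, which is what an overflowed 32-bit int step looks like when read
as a signed value.

```c
#include <stdint.h>

/* Hypothetical helper: number of chunks needed to cover bufsize bytes when
 * each iteration advances by step bytes. Returns -1 if step is not
 * positive, because such a loop could never reach bufsize. */
static int64_t count_chunks(int64_t bufsize, int64_t step)
{
    int64_t i = 0;
    int64_t chunks = 0;

    if (step <= 0)
        return -1;

    while (i < bufsize) {
        chunks++;
        /* Stop on the last chunk instead of computing i + step, so the
         * counter itself cannot overflow. */
        if (bufsize - i <= step)
            break;
        i += step;
    }
    return chunks;
}
```

For example, count_chunks(10, 3) is 4, while count_chunks(10, INT32_MIN)
returns -1 rather than walking an index out of bounds.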



