[mpich2-dev] 32-bit int overflow in ROMIO
Jeff Parker
jjparker at us.ibm.com
Tue Mar 4 15:32:28 CST 2008
We recently debugged a problem that appeared while running b_eff_io on Blue
Gene/P. It was a core dump caused by an out-of-bounds array index in the
ROMIO module src/mpi/romio/adio/common/ad_read_coll.c. The index grew too
large because a loop termination condition was never satisfied. The
condition compared two "int" variables (i < bufsize), and the loop
incremented i by another int, frd_size, which held an out-of-range value of
0x80000000 (2 GB).
Looking at the 1.0.7rc1 version of this module, it appears that int
variables can overflow in this way. For example, on lines 342-344, several
variables are added together, cast to an int, and stored into an int. This
will overflow when the sum goes beyond 2 GB, which it will when working
with large files.
    342    frd_size = (int) (disp + flat_file->indices[i] +
    343                      (ADIO_Offset) n_filetypes*filetype_extent
    344                      + flat_file->blocklens[i] - offset);
frd_size and flat_file->blocklens[i] are both ints, which are 32-bit
signed values on most 64-bit and 32-bit platforms.
The rest of the variables, disp, flat_file->indices[i],
(ADIO_Offset) n_filetypes*filetype_extent, and offset, are all of type
ADIO_Offset, which is 64 bits.
We were under the impression that the problems in MPICH2 and ROMIO
supporting large files and datatypes were scoped to 32-bit platforms.
However, code of the kind shown above will have problems on 64-bit
platforms too when the int data type is 32 bits.
Is this observation correct?
How common is it to have 32-bit ints on a 64-bit platform?
Jeff Parker
Blue Gene Messaging
61L/030-2 A407 507-253-4208 TieLine: 553-4208
Notes email: Jeff Parker/Rochester/IBM
INTERNET: jjparker at us.ibm.com AFS: jeff at rchland