possible bug in prerelease
Wei-keng Liao
wkliao at eecs.northwestern.edu
Fri Dec 1 23:52:46 CST 2017
Hi, Jim
After taking another look at your assertion error from ad_gpfs_aggrs.c,
I believe you were hit by a ROMIO bug. I wrote a short test program that
can cause a similar integer overflow error in ROMIO. The program's URL:
https://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/test/largefile/large_coalesce.c
It looks like the bug was anticipated, judging by the following comment at line 463
in file ad_gpfs_aggrs.c:
/* Possibly reconsider if buf_idx's are ok as int's, or should they be aints/offsets?
They are used as memory buffer indices so it seems like the 2G limit is in effect */
After I rebuilt MPICH with the data type of buf_idx changed from int to MPI_Aint,
my test program ran fine. Would you like to create a GitHub issue in the MPICH repo?
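
To illustrate why the wider type matters, here is a minimal sketch (not the
actual ROMIO code). The helper names are hypothetical, and int64_t stands in
for MPI_Aint so that it compiles without MPI. It assumes a 32-bit int. It
accumulates request lengths into a running buffer index and applies the same
round-trip condition as the failed assertion:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not ROMIO code: advance a running buffer index by
 * each request length. int64_t is an assumed stand-in for MPI_Aint. */
static int64_t running_buf_idx(const int64_t *req_len, size_t nreqs)
{
    int64_t idx = 0;
    for (size_t i = 0; i < nreqs; i++) {
        assert(req_len[i] >= 0);   /* lengths are byte counts */
        idx += req_len[i];         /* 64-bit sum, no wraparound at 2 GiB */
    }
    return idx;
}

/* Returns 1 if idx survives a round trip through int, i.e. the condition
 * curr_idx == (int) curr_idx from the assertion holds; 0 otherwise. */
static int fits_in_int(int64_t idx)
{
    return idx >= INT_MIN && idx <= INT_MAX;
}
```

With three 1 GiB requests the index reaches 3 GiB, so fits_in_int() reports 0.
An int-typed buf_idx cannot hold that value, while the 64-bit type can.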
Wei-keng
On Dec 1, 2017, at 8:07 PM, Wei-keng Liao wrote:
> Hi, Jim,
>
> Yes, that is a bug. I have developed a fix. Please check out the
> latest commit from the PnetCDF SVN repo and let me know if it works for you.
> Thanks for reporting.
>
> Wei-keng
>
> On Dec 1, 2017, at 4:43 PM, Jim Edwards wrote:
>
>> I think that I've found a bug in the prerelease in file ncmpio_wait.c
>>
>> In coalescing blocklengths at line 2095
>>
>> if (ai - a_last_contig == blocklengths[last_contig_req])
>> /* user buffer of request j is contiguous from j-1
>> * we coalesce j to j-1 */
>> blocklengths[last_contig_req] += blocklengths[j];
>>
>> It's possible that blocklengths[last_contig_req] + blocklengths[j] overflows the integer datatype.
>> I tried to fix that by avoiding the coalescing:
>>
>> if ((ai - a_last_contig == blocklengths[last_contig_req]) &&
>> (blocklengths[last_contig_req] + blocklengths[j] > 0))
>> /* user buffer of request j is contiguous from j-1
>> * we coalesce j to j-1 */
>> blocklengths[last_contig_req] += blocklengths[j];
>>
>> but that leads to another overflow problem:
>> ad_gpfs_aggrs.c:572: ADIOI_GPFS_Calc_my_req: Assertion `curr_idx == (int) curr_idx' failed.
>>
>>
>>
>> --
>> Jim Edwards
>>
>> CESM Software Engineer
>> National Center for Atmospheric Research
>> Boulder, CO
>
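
A note on the workaround quoted above: signed integer overflow is undefined
behavior in C, so testing whether the sum is > 0 after adding is not a
reliable guard. Below is a minimal sketch of a check done before the addition.
The helper names are hypothetical and this is not the fix committed to PnetCDF:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if two non-negative block lengths can be
 * added without exceeding INT_MAX, 0 otherwise. The comparison is done by
 * subtraction, so no overflowing addition is ever evaluated. */
static int can_coalesce(int a, int b)
{
    return a >= 0 && b >= 0 && a <= INT_MAX - b;
}

/* Hypothetical helper: length of the coalesced block. The caller is
 * expected to have checked can_coalesce() first. */
static int coalesced_len(int a, int b)
{
    assert(can_coalesce(a, b));
    return a + b;
}
```

In the coalescing loop, can_coalesce(blocklengths[last_contig_req],
blocklengths[j]) would be and-ed with the contiguity test. When it returns 0,
request j starts a new block instead of being merged.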