possible bug in prerelease

Jim Edwards jedwards at ucar.edu
Sun Dec 3 07:55:29 CST 2017


I see you already put up the PR to ROMIO - thanks.

On Fri, Dec 1, 2017 at 10:52 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:

> Hi, Jim
>
> After taking another look at your assertion error from ad_gpfs_aggrs.c,
> I believe you were hit by a ROMIO bug. I wrote a short test program that
> can cause a similar integer overflow error in ROMIO. The program's URL:
> https://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/test/largefile/large_coalesce.c
>
> It looks like the bug was anticipated by the following comment at line 463
> in file ad_gpfs_aggrs.c:
>   /* Possibly reconsider if buf_idx's are ok as int's, or should they be aints/offsets?
>      They are used as memory buffer indices so it seems like the 2G limit is in effect */
>
> After I rebuilt MPICH with the data type of buf_idx changed from int to
> MPI_Aint, my test program ran fine. Would you like to create a GitHub issue
> in the MPICH repo?
>
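
For illustration, the int versus MPI_Aint issue described above can be sketched in plain C. This is a minimal sketch, not ROMIO code: `aint_t`, `fits_in_int`, and `advance_idx` are hypothetical names, and `int64_t` stands in for MPI_Aint so the example builds without MPI. It also assumes a 32-bit int.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_Aint so this sketch compiles without MPI. */
typedef int64_t aint_t;

/* Returns 1 if idx can be stored in an int without loss, 0 otherwise.
 * This expresses the same intent as an assertion of the form
 * `curr_idx == (int) curr_idx`. */
static int fits_in_int(aint_t idx)
{
    return idx >= INT_MIN && idx <= INT_MAX;
}

/* Advances a memory-buffer index by nbytes using the wide type, so the
 * running index is not limited to 2 GiB the way an int index would be. */
static aint_t advance_idx(aint_t idx, aint_t nbytes)
{
    return idx + nbytes;
}
```

With these helpers, an index just below 2 GiB that is advanced past INT_MAX stays representable as `aint_t`, while `fits_in_int` reports that it no longer fits in an int.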
>
> Wei-keng
>
> On Dec 1, 2017, at 8:07 PM, Wei-keng Liao wrote:
>
> > Hi, Jim,
> >
> > Yes, that is a bug. I have developed a fix. Please check out the
> > latest commit from the PnetCDF SVN repo and let me know if it works
> > for you. Thanks for reporting.
> >
> > Wei-keng
> >
> > On Dec 1, 2017, at 4:43 PM, Jim Edwards wrote:
> >
> >> I think that I've found a bug in the prerelease in file ncmpio_wait.c
> >>
> >> In coalescing blocklengths at line 2095
> >>
> >>            if (ai - a_last_contig == blocklengths[last_contig_req])
> >>                /* user buffer of request j is contiguous from j-1
> >>                 * we coalesce j to j-1 */
> >>                blocklengths[last_contig_req] += blocklengths[j];
> >>
> >> It's possible that blocklengths[last_contig_req] + blocklengths[j]
> >> overflows the integer datatype.
> >> I tried to fix that by avoiding the coalescing:
> >>
> >>            if ((ai - a_last_contig == blocklengths[last_contig_req]) &&
> >>              (blocklengths[last_contig_req] + blocklengths[j] > 0))
> >>                /* user buffer of request j is contiguous from j-1
> >>                 * we coalesce j to j-1 */
> >>                blocklengths[last_contig_req] += blocklengths[j];
> >>
> >> but that leads to another overflow problem:
> >> ad_gpfs_aggrs.c:572: ADIOI_GPFS_Calc_my_req: Assertion `curr_idx == (int) curr_idx' failed.
> >>
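
The guard in the attempted fix above computes the sum before testing it. Signed integer overflow is undefined behavior in C, so a check of the form `a + b > 0` may not be reliable. A minimal sketch of a check made before the addition follows. `try_coalesce` is a hypothetical helper, not part of PnetCDF, and it assumes block lengths are non-negative ints.

```c
#include <limits.h>

/* Hypothetical helper, not taken from ncmpio_wait.c.
 * Adds next_len into *last_len only if the sum fits in an int.
 * Returns 1 if the two blocks were coalesced, 0 if they were left
 * separate (negative input, or the sum would exceed INT_MAX). */
static int try_coalesce(int *last_len, int next_len)
{
    if (*last_len < 0 || next_len < 0)
        return 0;                       /* unexpected input, do not merge */
    if (next_len > INT_MAX - *last_len)
        return 0;                       /* sum would overflow, do not merge */
    *last_len += next_len;              /* safe: sum is at most INT_MAX */
    return 1;
}
```

A caller would attempt the merge only when the two buffers are contiguous, and start a new block whenever `try_coalesce` returns 0. As the assertion failure in this thread shows, avoiding the blocklength overflow does not by itself remove int limits further down in ROMIO.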
> >>
> >>
> >> --
> >> Jim Edwards
> >>
> >> CESM Software Engineer
> >> National Center for Atmospheric Research
> >> Boulder, CO
> >
>
>


-- 
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO