memory issue in nonblocking.c?

Wei-keng Liao wkliao at eecs.northwestern.edu
Fri Oct 27 10:53:11 CDT 2017


FYI, a bug was recently found that affects the nonblocking APIs.
It happens when the "access region" of one nonblocking request
falls completely inside the access region of another request (even
though none of the data accessed by the two requests overlaps). Access
region here means the range from the starting array index to the ending
array index.
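
To make the condition concrete, here is a minimal one-dimensional sketch.
It is not PnetCDF code: the req1d type and the helper functions are
hypothetical names made up for illustration, and it assumes a simple
start/count/stride request with count >= 1 and stride >= 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 1-D strided request (illustration only, not a PnetCDF type):
 * it touches indices start, start+stride, ..., 'count' elements in total. */
typedef struct {
    long start;
    long count;
    long stride;
} req1d;

/* Last array index touched by the request; the access region is
 * the closed range [start, region_end]. */
static long region_end(req1d r)
{
    assert(r.count >= 1 && r.stride >= 1);
    return r.start + (r.count - 1) * r.stride;
}

/* True if the access region of 'inner' lies completely inside
 * the access region of 'outer'. */
static bool region_inside(req1d outer, req1d inner)
{
    return inner.start >= outer.start &&
           region_end(inner) <= region_end(outer);
}

/* True if request r actually touches array index idx. */
static bool touches(req1d r, long idx)
{
    if (idx < r.start || idx > region_end(r))
        return false;
    return (idx - r.start) % r.stride == 0;
}

/* True if the two requests touch at least one common index
 * (brute force over the elements of b, fine for a small example). */
static bool data_overlap(req1d a, req1d b)
{
    for (long i = 0; i < b.count; i++)
        if (touches(a, b.start + i * b.stride))
            return true;
    return false;
}
```

For example, a request with start=0, count=5, stride=2 touches indices
0, 2, 4, 6, 8 (region 0..8), and a request with start=3, count=2, stride=2
touches indices 3, 5 (region 3..5). Here region_inside() is true while
data_overlap() is false, which is the pattern described above.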

If this is your case, please try the latest code from the PnetCDF SVN repo.

Wei-keng

On Oct 27, 2017, at 8:25 AM, Jim Edwards wrote:

> It's a CICE configuration of the full CESM model; I have not been able to reduce the problem.
> 
> On Thu, Oct 26, 2017 at 8:17 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> Hi, Jim
> 
> What is the test program?
> 
> Wei-keng
> 
> On Oct 26, 2017, at 8:29 PM, Jim Edwards wrote:
> 
> > I've run out of ideas on how to debug this and hope you can help.
> >
> > I have a job that crashes at line 1000 in nonblocking.c of PnetCDF version 1.8.1.
> >
> > I started out managing my own requests and then switched to letting PnetCDF handle them internally, but it still crashes in the same spot.
> >
> > I'm using Intel 17.0.1 and MPT (SGI) 2.16.
> >
> > Here is a partial trace:
> >
> > 73:MPT: #7  0x0000000001e77732 in ncmpii_del_mem_entry (buf=0x1ad4cc70) at malloc.c:174
> > 73:MPT: #8  0x0000000001e77a90 in NCI_Free_fn (ptr=0x1ad4cc70, lineno=1000,
> > 73:MPT:     func=0x23cda56 <__$U8> 'ncmpii_wait\000',
> > 73:MPT:     filename=0x23cd964 'nonblocking.c\000') at malloc.c:276
> > 73:MPT: #9  0x0000000001e70f6d in ncmpii_wait (ncp=0x1a97dbb0, io_method=1,
> > 73:MPT:     num_reqs=-1, req_ids=0x0, statuses=0x0) at nonblocking.c:1000
> > 73:MPT: #10 0x0000000001e6f10a in ncmpi_wait_all (ncid=7, num_reqs=-1, req_ids=0x0,
> > 73:MPT:     statuses=0x0) at nonblocking.c:494
> >
> >
> > --
> > Jim Edwards
> >
> > CESM Software Engineer
> > National Center for Atmospheric Research
> > Boulder, CO
> 
> 
> 
> 
> -- 
> Jim Edwards
> 
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO 


