PNetCDF problem

HLung at us.fujitsu.com HLung at us.fujitsu.com
Wed Nov 11 18:42:54 CST 2015


Hi,

This is Han Lung, Director of HPC Group, Fujitsu America, Inc.

I am working with WRF 3.7 and PNetCDF 1.3.0 and am getting some errors.  The test case is conus12km.

wrfhelp told me that this error is related to PNetCDF internals, so I am writing to you in the hope of getting some help.

I traced the error to ncmpix_put_size_t(void **xpp, const MPI_Offset lp, int sizeof_t) in ncx.c of parallel-netcdf-1.3.0.  Here is the part of the code that causes the error:


#ifdef WORDS_BIGENDIAN
        MPI_Offset *ptr = (MPI_Offset*) (*xpp); /* typecast to 8-byte integer */
        *ptr = lp;    /* <== error here */
#else
        ......

This is the store of the 8-byte integer lp (sizeof_t = 8).  However, *xpp is advanced by 4 or 8 bytes each time, depending on sizeof_t:  *xpp = (void *)((char *)(*xpp) + sizeof_t);

When *xpp is not on an 8-byte boundary (because of an earlier 4-byte increment), the operation "*ptr = lp;" causes an address-not-aligned error (BUS_ADRALN).

In my case, sizeof_t is 4 most of the time, and there is no problem.  The first two times sizeof_t is 8, *xpp is 591725472 and 591725720, respectively; both are on an 8-byte boundary, so there is no problem either.  The third time sizeof_t is 8, *xpp is 591725972, which is not on an 8-byte boundary (it is 4 bytes past one), and that causes the error.
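The three addresses above can be checked with a small helper. This is only a sketch: is_aligned and misalignment are hypothetical names used for illustration, not PNetCDF functions, and the alignment is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PNetCDF): returns 1 when addr is a
 * multiple of alignment. alignment is assumed to be a power of two. */
static int is_aligned(uintptr_t addr, size_t alignment)
{
    return (addr & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical helper: number of bytes addr lies past the previous
 * alignment boundary (0 when addr is already aligned). */
static size_t misalignment(uintptr_t addr, size_t alignment)
{
    return (size_t)(addr & (uintptr_t)(alignment - 1));
}
```

With an alignment of 8, the first two addresses (591725472 and 591725720) pass the check, while 591725972 is 4 bytes past a boundary, which matches a preceding 4-byte increment.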

I modified ncx.c to insert 4 bytes of padding when *xpp is not on an 8-byte boundary.  That fixed the misalignment, but the program then died in a later free() call.  I am also not sure padding is the right approach, since the padded bytes are all garbage.
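One alternative to padding that leaves the buffer layout unchanged is to store the value one byte at a time, so no 8-byte-aligned access is ever issued. The sketch below is an assumption for illustration, not the PNetCDF code or an official fix: put_int64_unaligned is a hypothetical name, and int64_t stands in for MPI_Offset so the example compiles without MPI.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte branch of ncmpix_put_size_t.
 * Writes lp in big-endian byte order at *xpp using byte stores only,
 * so *xpp may sit on any address, then advances *xpp by 8 bytes. */
static void put_int64_unaligned(void **xpp, int64_t lp)
{
    unsigned char *cp = (unsigned char *)(*xpp);
    uint64_t v = (uint64_t)lp;
    int i;

    /* Fill from the least significant byte (last position) backwards. */
    for (i = 7; i >= 0; i--) {
        cp[i] = (unsigned char)(v & 0xffu);
        v >>= 8;
    }
    *xpp = (void *)(cp + 8);
}
```

Because no extra bytes are written, the buffer size computed elsewhere stays valid. On a big-endian host, memcpy(*xpp, &lp, 8) should produce the same bytes, assuming lp is 8 bytes wide.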

Have you seen this kind of error before?  Any advice on how to resolve it?

Thanks,

Han