pnetcdf and mvapich2 2.2

Wei-keng Liao wkliao at eecs.northwestern.edu
Fri Feb 3 12:25:43 CST 2017


Hi, Mark

For running "make check" on Lustre, could you please set the environment
variable PNETCDF_HINTS to "nc_header_align_size=512;nc_var_align_size=1"
and run "make check" again? I think it should then pass; do let me know.
These errors only occur on file systems whose striping size is larger
than 1, so ext4 is not affected. I am working on a fix for that test
program. Please note that this is a bug in the test program; the PnetCDF
library itself is intact.
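
In a bash-like shell, the workaround could look like the sketch below. It
assumes you start in the top-level PnetCDF build directory; the Makefile
guard is my own addition so that the snippet does nothing elsewhere.

```shell
# Pass the suggested alignment hints to PnetCDF through the environment
export PNETCDF_HINTS="nc_header_align_size=512;nc_var_align_size=1"
echo "PNETCDF_HINTS=$PNETCDF_HINTS"

# Re-run the test suite (only if we are inside a build directory)
if [ -f Makefile ]; then
    make check
fi
```

The value is quoted because it contains a semicolon, which the shell would
otherwise treat as a command separator.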

When running "make check", I suggest not setting the environment variable
PNETCDF_VERBOSE_DEBUG_MODE, as many of the tests trigger errors on
purpose. The resulting debugging messages can easily mask the true
errors. That environment variable is intended for testing one program at
a time.
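
As a rough sketch of that workflow, assuming a bash-like shell, the top-level
build directory as the starting point, and the test/nc_test location and
"./nc_test -d ." invocation shown in your log (the guards are my own
addition so the snippet is harmless elsewhere):

```shell
# Run the full suite with the debug messages turned off
unset PNETCDF_VERBOSE_DEBUG_MODE
if [ -f Makefile ]; then
    make check
fi

# To inspect a single test program, enable the messages for that one
# command only, so the setting does not leak into later runs
if [ -x test/nc_test/nc_test ]; then
    ( cd test/nc_test && PNETCDF_VERBOSE_DEBUG_MODE=1 ./nc_test -d . )
fi
```

Setting the variable as a prefix to the command keeps it scoped to that
single run.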

As for the errors from mvapich2, I do not have access to a machine with
InfiniBand and thus could not give it a try. However, the errors look
similar to an issue recently discovered in OpenMPI: failing to return
the correct MPI error codes. I will look into the mvapich2 source code
to confirm.

Thanks for trying various compilers and reporting the problem!

Wei-keng

On Feb 3, 2017, at 3:31 AM, Mark Dixon wrote:

> Hi,
> 
> Me again - sorry!
> 
> Now trying to build mvapich2 2.2 on centos7 with pnetcdf 1.8.1 with the copy of GCC bundled with the OS. mvapich2 was built with configure options --enable-shared --enable-romio --with-file-system=lustre+ufs
> 
> As with openmpi, I am able to pass tests if the build directory is on an ext4 filesystem but not if it is on a Lustre (2.7.19.8) filesystem.
> 
> I've done a clean build with:
> 
>  MPICC=`which mpicc` MPIF77=`which mpif90` MPIF90=`which mpif90` MPICXX=`which mpicxx` ./configure --prefix="$prefix" --with-mpi=$MPI_HOME --enable-debug
>  make
>  export PNETCDF_VERBOSE_DEBUG_MODE=1
>  make check
> 
> This is the first failure:
> 
> rm -f ./scratch.nc
> rm -f ./testfile.nc
> rm -f ./tooth-fairy.nc
> ./nc_test -c    -d .
> Rank 0: NC_ERANGE error at line 987 of ncmpix_put_NC_SHORT_double in ncx.c
> Rank 0: NC_ERANGE error at line 1933 of ncmpix_put_NC_INT_double in ncx.c
> Rank 0: NC_ERANGE error at line 1933 of ncmpix_put_NC_INT_double in ncx.c
> Rank 0: NC_ERANGE error at line 3081 of ncmpix_put_NC_FLOAT_double in ncx.c
> Rank 0: NC_ERANGE error at line 3081 of ncmpix_put_NC_FLOAT_double in ncx.c
> Rank 0: NC_ERANGE error at line 1933 of ncmpix_put_NC_INT_double in ncx.c
> Rank 0: NC_ERANGE error at line 3081 of ncmpix_put_NC_FLOAT_double in ncx.c
> Rank 0: NC_ERANGE error at line 3081 of ncmpix_put_NC_FLOAT_double in ncx.c
> Rank 0: NC_ERANGE error at line 5470 of ncmpix_putn_NC_BYTE_double in ncx.c
> Rank 0: NC_ERANGE error at line 987 of ncmpix_put_NC_SHORT_double in ncx.c
> Rank 0: NC_ERANGE error at line 1933 of ncmpix_put_NC_INT_double in ncx.c
> Rank 0: NC_ERANGE error at line 3081 of ncmpix_put_NC_FLOAT_double in ncx.c
> Rank 0: NC_ERANGE error at line 5470 of ncmpix_putn_NC_BYTE_double in ncx.c
> Rank 0: NC_ERANGE error at line 5470 of ncmpix_putn_NC_BYTE_double in ncx.c
> Rank 0: NC_ERANGE error at line 987 of ncmpix_put_NC_SHORT_double in ncx.c
> Rank 0: NC_ERANGE error at line 987 of ncmpix_put_NC_SHORT_double in ncx.c
> Rank 0: NC_ERANGE error at line 1933 of ncmpix_put_NC_INT_double in ncx.c
> Rank 0: NC_ERANGE error at line 1933 of ncmpix_put_NC_INT_double in ncx.c
> Rank 0: NC_ERANGE error at line 3081 of ncmpix_put_NC_FLOAT_double in ncx.c
> Rank 0: NC_ERANGE error at line 3081 of ncmpix_put_NC_FLOAT_double in ncx.c
> ./nc_test -d       .
> *** TESTING C   nc_test for format CDF-1                           ------ Rank 0: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 0: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> 
> 	FAILURE at line 122 of test_ncmpi_open in test_read.c: expecting NC_ENOENT or NC_EFILE but got NC_ENOTNCRank 0: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 0: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 0: NC_EPERM error at line 839 of ncmpi_redef in mpinetcdf.c
> 
> 	### 1 FAILURES TESTING test_ncmpi_open! Stop ... ###
> 
> ./nc_test: expects 0 failures ... fail with 1 mismatches
> make[2]: *** [testing] Error 1
> make[2]: Leaving directory `/nobackup/parallel-netcdf-1.8.1/test/nc_test'
> make[1]: *** [check-nc_test] Error 2
> make[1]: Leaving directory `/nobackup/parallel-netcdf-1.8.1/test'
> make: *** [check] Error 2
> 
> Is this a similar problem as the one for openmpi?
> 
> 
> If I run "make ptest", which doesn't seem to halt on errors, it fails on:
> 
> *** TESTING C   dim_cdf12 for defining dim in CDF-1/2 format       ------ Rank 1: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 2: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 3: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 0: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 0: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 1: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 2: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 3: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 0: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 1: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 2: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 3: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Error at line 96: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Error at line 96: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Error at line 96: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Error at line 96: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Rank 0: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 1: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 2: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 3: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Error at line 97: err=NC_ENOTNC (NetCDF: Unknown file format)
> Rank 1: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 2: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 2: NC_EBADID error at line 233 of ncmpii_NC_check_id in nc.c
> Rank 2: NC_EBADID error at line 1177 of ncmpi_close in mpinetcdf.c
> Error at line 97: err=NC_ENOTNC (NetCDF: Unknown file format)
> Error at line 97: err=NC_ENOTNC (NetCDF: Unknown file format)
> Error at line 98: err=NC_EBADID (NetCDF: Not a valid ID)
> Error at line 98: err=NC_EBADID (NetCDF: Not a valid ID)
> Rank 0: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 0: NC_EBADID error at line 233 of ncmpii_NC_check_id in nc.c
> Rank 0: NC_EBADID error at line 1177 of ncmpi_close in mpinetcdf.c
> Rank 1: NC_EBADID error at line 233 of ncmpii_NC_check_id in nc.c
> Rank 1: NC_EBADID error at line 1177 of ncmpi_close in mpinetcdf.c
> Error at line 98: err=NC_EBADID (NetCDF: Not a valid ID)
> Rank 3: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Error at line 97: err=NC_ENOTNC (NetCDF: Unknown file format)
> Error at line 98: err=NC_EBADID (NetCDF: Not a valid ID)
> Rank 3: NC_EBADID error at line 233 of ncmpii_NC_check_id in nc.c
> Rank 3: NC_EBADID error at line 1177 of ncmpi_close in mpinetcdf.c
> Rank 0: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 1: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 2: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 3: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 0: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 1: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 2: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Rank 3: NC_EVARSIZE error at line 489 of NC_begins in nc.c
> Error at line 113: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Error at line 113: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Error at line 113: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Error at line 113: err=NC_EVARSIZE (NetCDF: One or more variable sizes violate format constraints)
> Rank 0: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 1: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 2: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 3: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 0: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 1: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 2: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 3: NC_EDIMSIZE error at line 407 of ncmpi_def_dim in dim.c
> Rank 0: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 1: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 2: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 3: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 0: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 1: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 2: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 3: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 0: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 1: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 2: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> Rank 3: NC_EVARSIZE error at line 1055 of ncmpii_NC_check_vlens in nc.c
> fail with 16 mismatches
> 
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 84444 RUNNING AT login2.arc3.leeds.ac.uk
> =   EXIT CODE: 2
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> make[2]: *** [ptest4] Error 2
> make[2]: Leaving directory `/nobackup/parallel-netcdf-1.8.1/test/cdf_format'
> 
> 
> And also:
> 
> *** TESTING C   modes for file create/open modes                   ------ Rank 0: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 1: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 2: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 3: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 0: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 1: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 2: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 3: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 1: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 2: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 3: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Rank 0: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 89: file (./testfile.nc) should not be created
> Rank 0: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 1: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 2: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 3: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 109: file (./testfile.nc) should not be created
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> Rank 0: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 2: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 3: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> Rank 1: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 0: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 1: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 2: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 3: NC_EINVAL_CMODE error at line 283 of ncmpi_create in mpinetcdf.c
> Rank 0: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 1: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 2: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 2: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 3: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 3: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 89: file (./testfile.nc) should not be created
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Rank 0: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Error at line 83: expect error code NC_ENOENT but got NC_ENOTNC
> Rank 1: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 0: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 0: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 1: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 1: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 2: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Rank 2: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Rank 3: NC_ENOTNC error at line 1805 of ncmpii_hdr_get_NC in header.c
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 109: file (./testfile.nc) should not be created
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> Rank 3: NC_ENOTNC error at line 576 of ncmpi_open in mpinetcdf.c
> Error at line 103: expect error code NC_ENOENT but got NC_ENOTNC
> fail with 20 mismatches
> 
> 
> 
> I've included my config.log below.
> 
> Any ideas, please?
> 
> Cheers,
> 
> Mark
> 
> 
> 


