MPI_Type_create_hindexed errors from PNetCDF 1.7.0

Wei-keng Liao wkliao at eecs.northwestern.edu
Sun Apr 24 00:55:11 CDT 2016


Hi, Carl

I can reproduce the error message you are seeing.
After some digging, I found that the cause lies in OpenMPI.

That PnetCDF test program internally calls MPI_Type_create_hindexed
to create a filetype of zero length and passes it to MPI_File_set_view,
which flattens the filetype without checking whether its length is zero
and hence generates the error message.

FYI, MPICH handles this case just fine.

I have added a check in PnetCDF that skips the call to MPI_Type_create_hindexed
in this case to avoid the problem. See SVN revision 2406:
http://trac.mcs.anl.gov/projects/parallel-netcdf/changeset/2406
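
To illustrate the kind of check described above, here is a minimal sketch.
It is not the code from revision 2406; the helper name filetype_is_empty
and its arguments are hypothetical and chosen only for this example. It
assumes the filetype is described by a count of blocks and an array of
block lengths, as in the arguments to MPI_Type_create_hindexed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PnetCDF or MPI): returns 1 when the
 * block list describes zero bytes in total, and 0 otherwise. */
int filetype_is_empty(int count, const int *blocklens)
{
    int i;

    /* No blocks at all, or no length array to inspect: treat as empty. */
    if (count <= 0 || blocklens == NULL)
        return 1;

    for (i = 0; i < count; i++)
        if (blocklens[i] > 0)
            return 0;   /* at least one block has a nonzero length */

    return 1;           /* every block has zero length */
}
```

A caller could test this before building the filetype and, when it
returns 1, skip MPI_Type_create_hindexed altogether. What to pass to
MPI_File_set_view in that case (for example a plain MPI_BYTE filetype)
is an assumption left open here, not a statement of what PnetCDF does.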

Thanks for reporting the error.

Wei-keng

On Apr 23, 2016, at 6:25 PM, Carl Ponder wrote:

> On 05/27/2015 05:41 PM, Carl Ponder wrote:
>> Here are some oddities from the "make check" that I ran:
>> ./nf_test       -d .
>> rank 0: MPI error (MPI_File_delete) : MPI_ERR_IO: input/output error
>> rank 0: MPI error (MPI_File_delete) : MPI_ERR_IO: input/output error
>> *** TESTING F77 ./nf_test for CDF-1                                ------ pass
> On 05/27/2015 06:41 PM, Wei-keng Liao wrote:
>> Those MPI error messages are expected if you are using OpenMPI or older versions of MPICH.
>> This error happens when trying to delete a non-existing file.
>> A correct MPI implementation should return MPI_ERR_NO_SUCH_FILE, instead of MPI_ERR_IO.
>> This has been fixed in the MPI-IO part of the latest MPICH release.
>> OpenMPI is copying the MPI-IO part of MPICH and has not updated to the latest yet.
>> You can safely ignore those error messages.
>> 
> With the 1.7.0 release of PNetCDF, I've started getting these errors from the "make check":
> ./nctst        ./testfile.nc
> [ivb106:6511] *** An error occurred in MPI_Type_create_hindexed
> [ivb106:6511] *** reported by process [46912098861057,46909632806912]
> [ivb106:6511] *** on communicator MPI_COMM_WORLD
> [ivb106:6511] *** MPI_ERR_ARG: invalid argument of some other kind
> [ivb106:6511] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [ivb106:6511] ***    and potentially your MPI job)
> *** TESTING C++ ./nctst for APIs with different netCDF formats     ------ make[2]: [testing] Error 13 (ignored)
> 
> ./tst_f90      ./testfile.nc
> [ivb106:7079] *** An error occurred in MPI_Type_create_hindexed
> [ivb106:7079] *** reported by process [46912052199425,46909632806912]
> [ivb106:7079] *** on communicator MPI_COMM_WORLD
> [ivb106:7079] *** MPI_ERR_ARG: invalid argument of some other kind
> [ivb106:7079] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [ivb106:7079] ***    and potentially your MPI job)
> make[2]: [testing] Error 13 (ignored)
> It looks like the 1.6.1 release passed the same tests:
> ./nctst        ./testfile.nc
> *** TESTING C++ ./nctst for APIs with different netCDF formats     ------ pass
> 
> ./tst_f90      ./testfile.nc
> *** TESTING F90 ./tst_f90                                          ------ pass
> Can you tell me what the issue is here? I'm using OpenMPI 1.10.2 in both cases.
> The error happens when I use the Intel 15.0, GCC 4.8.5 and PGI 16.4 compilers, so I don't believe the compiler is the problem.
> Thanks,
> 
>                     Carl Ponder