From wkliao at eecs.northwestern.edu Tue May 3 09:18:45 2016
From: wkliao at eecs.northwestern.edu (Wei-keng Liao)
Date: Tue, 3 May 2016 09:18:45 -0500
Subject: MPI_Type_create_hindexed errors from PNetCDF 1.7.0
In-Reply-To: <738D4D52-8051-4EB7-B2FE-B02853C8FD20@eecs.northwestern.edu>
References: <5566480D.2020100@nvidia.com> <571C0463.1070608@nvidia.com> <738D4D52-8051-4EB7-B2FE-B02853C8FD20@eecs.northwestern.edu>
Message-ID: <329CB6DD-9C27-4B77-8E44-ED24E519E458@eecs.northwestern.edu>

Hi, Carl

I reported this issue to OpenMPI and they have fixed the bug.
FYI, here are the links to this issue at OpenMPI:
https://github.com/open-mpi/ompi/issues/1611
https://github.com/open-mpi/ompi-release/pull/1122

Wei-keng

On Apr 24, 2016, at 12:55 AM, Wei-keng Liao wrote:

> Hi, Carl
>
> I can reproduce the error message you are seeing.
> After some digging, I found that the cause lies in OpenMPI.
>
> That PnetCDF test program internally calls
> MPI_Type_create_hindexed to create a filetype of zero length and
> passes it to MPI_File_set_view, which flattens the filetype without
> checking whether its length is zero and hence generates the error message.
>
> FYI, MPICH handles this case just fine.
>
> I have added a check in PnetCDF to skip calling MPI_Type_create_hindexed
> in this case, which avoids the problem. See SVN revision 2406:
> http://trac.mcs.anl.gov/projects/parallel-netcdf/changeset/2406
>
> Thanks for reporting the error.
>
> Wei-keng
>
> On Apr 23, 2016, at 6:25 PM, Carl Ponder wrote:
>
>> On 05/27/2015 05:41 PM, Carl Ponder wrote:
>>> Here are some oddities from the "make check" that I ran:
>>> ./nf_test -d .
>>> rank 0: MPI error (MPI_File_delete) : MPI_ERR_IO: input/output error
>>> rank 0: MPI error (MPI_File_delete) : MPI_ERR_IO: input/output error
>>> *** TESTING F77 ./nf_test for CDF-1 ------ pass
>> On 05/27/2015 06:41 PM, Wei-keng Liao wrote:
>>> Those MPI error messages are expected if you are using OpenMPI or older versions of MPICH.
>>> This error happens when trying to delete a non-existing file.
>>> A correct MPI implementation should return MPI_ERR_NO_SUCH_FILE instead of MPI_ERR_IO.
>>> This has been fixed in the MPI-IO part of the latest MPICH release.
>>> OpenMPI copies the MPI-IO part of MPICH and has not yet updated to the latest version.
>>> You can safely ignore those error messages.
>>>
>> With the 1.7.0 release of PNetCDF, I've started getting these errors from "make check":
>> ./nctst ./testfile.nc
>> [ivb106:6511] *** An error occurred in MPI_Type_create_hindexed
>> [ivb106:6511] *** reported by process [46912098861057,46909632806912]
>> [ivb106:6511] *** on communicator MPI_COMM_WORLD
>> [ivb106:6511] *** MPI_ERR_ARG: invalid argument of some other kind
>> [ivb106:6511] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> [ivb106:6511] *** and potentially your MPI job)
>> *** TESTING C++ ./nctst for APIs with different netCDF formats ------ make[2]: [testing] Error 13 (ignored)
>>
>> ./tst_f90 ./testfile.nc
>> [ivb106:7079] *** An error occurred in MPI_Type_create_hindexed
>> [ivb106:7079] *** reported by process [46912052199425,46909632806912]
>> [ivb106:7079] *** on communicator MPI_COMM_WORLD
>> [ivb106:7079] *** MPI_ERR_ARG: invalid argument of some other kind
>> [ivb106:7079] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> [ivb106:7079] *** and potentially your MPI job)
>> make[2]: [testing] Error 13 (ignored)
>>
>> It looks like the 1.6.1 release passed the same tests:
>> ./nctst ./testfile.nc
>> *** TESTING C++ ./nctst for APIs with different netCDF formats ------ pass
>>
>> ./tst_f90 ./testfile.nc
>> *** TESTING F90 ./tst_f90 ------ pass
>>
>> Can you tell me what the issue is here? I'm using OpenMPI 1.10.2 in both cases.
>> The error happens with the Intel 15.0, GCC 4.8.5 and PGI 16.4 compilers, so I don't believe the compiler is the problem.
>> Thanks,
>>
>> Carl Ponder
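The zero-length guard described in the April 24 reply can be illustrated with a minimal sketch. It assumes the filetype is built from arrays of (block length, displacement) pairs. The helper name compact_blocks is hypothetical and is not taken from PnetCDF revision 2406. It drops empty blocks and returns how many remain, so a caller calls MPI_Type_create_hindexed only when that count is nonzero. Displacements are held in long long so the sketch compiles without <mpi.h>; real MPI code would use MPI_Aint.

```c
#include <assert.h>

/* Hypothetical helper (not from PnetCDF): removes zero-length blocks from
 * the (length, displacement) arrays in place and returns the number of
 * blocks kept. A return value of 0 means the request is empty, so the
 * caller should skip MPI_Type_create_hindexed rather than build a
 * zero-length filetype.
 *
 * Displacements are long long so this compiles without <mpi.h>; with MPI,
 * use MPI_Aint to match the MPI_Type_create_hindexed signature. */
int compact_blocks(int nblocks, int blocklens[], long long disps[])
{
    assert(nblocks >= 0);

    int kept = 0;
    for (int i = 0; i < nblocks; i++) {
        if (blocklens[i] > 0) {
            /* Shift each non-empty block down over any empty ones. */
            blocklens[kept] = blocklens[i];
            disps[kept] = disps[i];
            kept++;
        }
    }
    return kept;
}

/* Intended use with MPI (kept as a comment so the sketch stays standalone):
 *
 *   int n = compact_blocks(nblocks, lens, disps);
 *   if (n > 0) {
 *       MPI_Type_create_hindexed(n, lens, disps, MPI_BYTE, &filetype);
 *       MPI_Type_commit(&filetype);
 *   } else {
 *       filetype = MPI_BYTE;   (nothing to access on this rank)
 *   }
 *   MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);
 *   (free filetype with MPI_Type_free afterwards when n > 0)
 */
```

When the count is 0, this sketch falls back to MPI_BYTE as the filetype and assumes the rank then transfers zero elements. That choice is an assumption, not necessarily what PnetCDF does. The rank still takes part in the collective MPI_File_set_view call, but no empty hindexed type is ever handed to the MPI library.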