Unable to pass all the tests with pnetcdf 1.6.1, Intel 15.0.3.048 and Mvapich2 2.1
Rob Latham
robl at mcs.anl.gov
Tue Sep 22 16:17:43 CDT 2015
On 09/22/2015 04:10 PM, Wei-keng Liao wrote:
> Hi, Craig
>
> From these outputs, I think this is most likely because MPI-IO fails
> to return the same file striping unit and factor values on all
> MPI processes. I guess only the root process gets the correct values.
> Attached is a short MPI program to test this theory.
> Could you test it using at least 2 processes on Lustre?
>
> To compile:
> mpicc -o check_mpi_striping check_mpi_striping.c
> To run:
> mpiexec -n 2 check_mpi_striping
>
>
What is supposed to happen is that rank 0 sets the striping factor
according to the hints all processes requested (it is erroneous to
specify different values for an MPI-IO hint on different processes).
Rank 0 opens the file and sets the ioctls, then everyone calls
ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum);
It sounds like perhaps I need to learn more about Lustre's rules for
when LL_IOC_LOV_SETSTRIPE is visible to all processes.
==rob
> Wei-keng
>
>
>
> On Sep 22, 2015, at 2:34 PM, Craig Tierney - NOAA Affiliate wrote:
>
>> Wei-keng,
>>
>> Here is the output from my run with PNETCDF_SAFE_MODE=1 on Lustre:
>>
>> [root at Jet:fe8 FLASH-IO]# mpiexec.hydra -env PNETCDF_SAFE_MODE 1 -np 4 ./flash_benchmark_io /lfs2/jetmgmt/Craig.Tierney/d1//flash_io_test_
>> Warning (inconsistent metadata): variable lrefine's begin (root=1048576, local=3072)
>> Warning (inconsistent metadata): variable nodetype's begin (root=2097152, local=4608)
>> Warning (inconsistent metadata): variable gid's begin (root=3145728, local=6144)
>> Warning (inconsistent metadata): variable coordinates's begin (root=4194304, local=25600)
>> Warning (inconsistent metadata): variable blocksize's begin (root=5242880, local=33792)
>> Warning (inconsistent metadata): variable bndbox's begin (root=6291456, local=41984)
>> Warning (inconsistent metadata): variable dens's begin (root=7340032, local=57856)
>> Warning (inconsistent metadata): variable velx's begin (root=18874368, local=10641920)
>> Warning (inconsistent metadata): variable lrefine's begin (root=1048576, local=3072)
>> Warning (inconsistent metadata): variable nodetype's begin (root=2097152, local=4608)
>> Warning (inconsistent metadata): variable gid's begin (root=3145728, local=6144)
>> Warning (inconsistent metadata): variable coordinates's begin (root=4194304, local=25600)
>> Warning (inconsistent metadata): variable blocksize's begin (root=5242880, local=33792)
>> Warning (inconsistent metadata): variable bndbox's begin (root=6291456, local=41984)
>> Warning (inconsistent metadata): variable dens's begin (root=7340032, local=57856)
>> Warning (inconsistent metadata): variable velx's begin (root=18874368, local=10641920)
>> Warning (inconsistent metadata): variable vely's begin (root=30408704, local=21225984)
>> Warning (inconsistent metadata): variable velz's begin (root=41943040, local=31810048)
>> Warning (inconsistent metadata): variable pres's begin (root=53477376, local=42394112)
>> Warning (inconsistent metadata): variable ener's begin (root=65011712, local=52978176)
>> Warning (inconsistent metadata): variable temp's begin (root=76546048, local=63562240)
>> Warning (inconsistent metadata): variable gamc's begin (root=88080384, local=74146304)
>> Warning (inconsistent metadata): variable game's begin (root=99614720, local=84730368)
>> Warning (inconsistent metadata): variable enuc's begin (root=111149056, local=95314432)
>> Warning (inconsistent metadata): variable gpot's begin (root=122683392, local=105898496)
>> Warning (inconsistent metadata): variable f1__'s begin (root=134217728, local=116482560)
>> Warning (inconsistent metadata): variable f2__'s begin (root=145752064, local=127066624)
>> Warning (inconsistent metadata): variable f3__'s begin (root=157286400, local=137650688)
>> Warning (inconsistent metadata): variable lrefine's begin (root=1048576, local=3072)
>> Warning (inconsistent metadata): variable nodetype's begin (root=2097152, local=4608)
>> Warning (inconsistent metadata): variable gid's begin (root=3145728, local=6144)
>> Warning (inconsistent metadata): variable coordinates's begin (root=4194304, local=25600)
>> Warning (inconsistent metadata): variable blocksize's begin (root=5242880, local=33792)
>> Warning (inconsistent metadata): variable bndbox's begin (root=6291456, local=41984)
>> Warning (inconsistent metadata): variable dens's begin (root=7340032, local=57856)
>> Warning (inconsistent metadata): variable velx's begin (root=18874368, local=10641920)
>> Warning (inconsistent metadata): variable vely's begin (root=30408704, local=21225984)
>> Warning (inconsistent metadata): variable velz's begin (root=41943040, local=31810048)
>> Warning (inconsistent metadata): variable pres's begin (root=53477376, local=42394112)
>> Warning (inconsistent metadata): variable ener's begin (root=65011712, local=52978176)
>> Warning (inconsistent metadata): variable temp's begin (root=76546048, local=63562240)
>> Warning (inconsistent metadata): variable gamc's begin (root=88080384, local=74146304)
>> Warning (inconsistent metadata): variable game's begin (root=99614720, local=84730368)
>> Warning (inconsistent metadata): variable enuc's begin (root=111149056, local=95314432)
>> Warning (inconsistent metadata): variable gpot's begin (root=122683392, local=105898496)
>> Warning (inconsistent metadata): variable f1__'s begin (root=134217728, local=116482560)
>> Warning (inconsistent metadata): variable f2__'s begin (root=145752064, local=127066624)
>> Warning (inconsistent metadata): variable f3__'s begin (root=157286400, local=137650688)
>> Warning (inconsistent metadata): variable f4__'s begin (root=168820736, local=148234752)
>> Warning (inconsistent metadata): variable f5__'s begin (root=180355072, local=158818816)
>> Warning (inconsistent metadata): variable f6__'s begin (root=191889408, local=169402880)
>> Warning (inconsistent metadata): variable vely's begin (root=30408704, local=21225984)
>> Warning (inconsistent metadata): variable velz's begin (root=41943040, local=31810048)
>> Warning (inconsistent metadata): variable pres's begin (root=53477376, local=42394112)
>> Warning (inconsistent metadata): variable ener's begin (root=65011712, local=52978176)
>> Warning (inconsistent metadata): variable temp's begin (root=76546048, local=63562240)
>> Warning (inconsistent metadata): variable gamc's begin (root=88080384, local=74146304)
>> Warning (inconsistent metadata): variable game's begin (root=99614720, local=84730368)
>> Warning (inconsistent metadata): variable enuc's begin (root=111149056, local=95314432)
>> Warning (inconsistent metadata): variable gpot's begin (root=122683392, local=105898496)
>> Warning (inconsistent metadata): variable f1__'s begin (root=134217728, local=116482560)
>> Warning (inconsistent metadata): variable f2__'s begin (root=145752064, local=127066624)
>> Warning (inconsistent metadata): variable f3__'s begin (root=157286400, local=137650688)
>> Warning (inconsistent metadata): variable f4__'s begin (root=168820736, local=148234752)
>> Warning (inconsistent metadata): variable f5__'s begin (root=180355072, local=158818816)
>> Warning (inconsistent metadata): variable f6__'s begin (root=191889408, local=169402880)
>> Warning (inconsistent metadata): variable f7__'s begin (root=203423744, local=179986944)
>> Warning (inconsistent metadata): variable f8__'s begin (root=214958080, local=190571008)
>> Warning (inconsistent metadata): variable f9__'s begin (root=226492416, local=201155072)
>> Warning (inconsistent metadata): variable f10_'s begin (root=238026752, local=211739136)
>> Warning (inconsistent metadata): variable f11_'s begin (root=249561088, local=222323200)
>> Warning (inconsistent metadata): variable f12_'s begin (root=261095424, local=232907264)
>> Warning (inconsistent metadata): variable f13_'s begin (root=272629760, local=243491328)
>> Warning (inconsistent metadata): variable f4__'s begin (root=168820736, local=148234752)
>> Warning (inconsistent metadata): variable f5__'s begin (root=180355072, local=158818816)
>> Warning (inconsistent metadata): variable f6__'s begin (root=191889408, local=169402880)
>> Warning (inconsistent metadata): variable f7__'s begin (root=203423744, local=179986944)
>> Warning (inconsistent metadata): variable f8__'s begin (root=214958080, local=190571008)
>> Warning (inconsistent metadata): variable f9__'s begin (root=226492416, local=201155072)
>> Warning (inconsistent metadata): variable f10_'s begin (root=238026752, local=211739136)
>> Warning (inconsistent metadata): variable f11_'s begin (root=249561088, local=222323200)
>> Warning (inconsistent metadata): variable f12_'s begin (root=261095424, local=232907264)
>> Warning (inconsistent metadata): variable f13_'s begin (root=272629760, local=243491328)
>> Warning (inconsistent metadata): variable f7__'s begin (root=203423744, local=179986944)
>> Warning (inconsistent metadata): variable f8__'s begin (root=214958080, local=190571008)
>> Warning (inconsistent metadata): variable f9__'s begin (root=226492416, local=201155072)
>> Warning (inconsistent metadata): variable f10_'s begin (root=238026752, local=211739136)
>> Warning (inconsistent metadata): variable f11_'s begin (root=249561088, local=222323200)
>> Warning (inconsistent metadata): variable f12_'s begin (root=261095424, local=232907264)
>> Warning (inconsistent metadata): variable f13_'s begin (root=272629760, local=243491328)
>> Here: -250
>> Here: -262
>> Here: -262
>> Here: -262
>> nfmpi_enddefFile header is inconsistent among processes
>> nfmpi_enddef
>> (Internal error) beginning file offset of this variable is inconsistent among p
>> r
>> nfmpi_enddef
>> (Internal error) beginning file offset of this variable is inconsistent among p
>> r
>> nfmpi_enddef
>> (Internal error) beginning file offset of this variable is inconsistent among p
>> r
>> [cli_1]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
>> [cli_0]: [cli_2]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
>> [cli_3]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
>>
>> Craig
>>
>> On Mon, Sep 21, 2015 at 1:21 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>>
>> It is strange that the test failed for Lustre.
>>
>> The error message says some variables defined across MPI processes are not consistent.
>> Could you run this benchmark with safe mode on, by setting the environment variable
>> PNETCDF_SAFE_MODE to 1 before the run? This will print more error messages, such as
>> which variables are inconsistent and at what offsets.
>>
>>
>> Wei-keng
>>
>> On Sep 21, 2015, at 1:31 PM, Craig Tierney - NOAA Affiliate wrote:
>>
>> > Rob and Wei-keng,
>> >
>> > Thanks for your help on this problem. Rob - The patch seems to work. I had to hand apply it but now the pnetcdf tests (mostly) complete successfully. The FLASH-IO benchmark is failing when Lustre is used. It completes successfully when Panasas is used. The error code that is returned by nfmpi_enddef is -262. The description for this error is:
>> >
>> > #define NC_EMULTIDEFINE_VAR_BEGIN (-262) /**< inconsistent variable file begin offset (internal use) */
>> >
>> > [root at Jet:fe7 FLASH-IO]# mpiexec.hydra -n 4 ./flash_benchmark_io /pan2/jetmgmt/Craig.Tierney/pan_flash_io_test_
>> > Here: 0
>> > Here: 0
>> > Here: 0
>> > Here: 0
>> > number of guards : 4
>> > number of blocks : 80
>> > number of variables : 24
>> > checkpoint time : 12.74 sec
>> > max header : 0.88 sec
>> > max unknown : 11.83 sec
>> > max close : 0.53 sec
>> > I/O amount : 242.30 MiB
>> > plot no corner : 2.38 sec
>> > max header : 0.59 sec
>> > max unknown : 1.78 sec
>> > max close : 0.22 sec
>> > I/O amount : 20.22 MiB
>> > plot corner : 2.52 sec
>> > max header : 0.81 sec
>> > max unknown : 1.51 sec
>> > max close : 0.96 sec
>> > I/O amount : 24.25 MiB
>> > -------------------------------------------------------
>> > File base name : /pan2/jetmgmt/Craig.Tierney/pan_flash_io_test_
>> > file striping count : 0
>> > file striping size : 301346992 bytes
>> > Total I/O amount : 286.78 MiB
>> > -------------------------------------------------------
>> > nproc array size exec (sec) bandwidth (MiB/s)
>> > 4 16 x 16 x 16 17.64 16.26
>> >
>> >
>> > [root at Jet:fe7 FLASH-IO]# mpiexec.hydra -n 4 ./flash_benchmark_io /lfs2/jetmgmt/Craig.Tierney/lfs_flash_io_test_
>> > Here: -262
>> > Here: -262
>> > Here: -262
>> > nfmpi_enddef
>> > (Internal error) beginning file offset of this variable is inconsistent among p
>> > r
>> > nfmpi_enddef
>> > (Internal error) beginning file offset of this variable is inconsistent among p
>> > r
>> > nfmpi_enddef
>> > (Internal error) beginning file offset of this variable is inconsistent among p
>> > r
>> > Here: 0
>> > [cli_1]: aborting job:
>> > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
>> > [cli_3]: [cli_2]: aborting job:
>> > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
>> > aborting job:
>> > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
>> >
>> > ===================================================================================
>> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > = PID 16702 RUNNING AT fe7
>> > = EXIT CODE: 255
>> > = CLEANING UP REMAINING PROCESSES
>> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > ===================================================================================
>> >
>> > Thanks,
>> > Craig
>> >
>> >
>> > On Mon, Sep 21, 2015 at 8:30 AM, Rob Latham <robl at mcs.anl.gov> wrote:
>> >
>> >
>> > On 09/20/2015 03:44 PM, Craig Tierney - NOAA Affiliate wrote:
>> > Wei-keng,
>> >
>> > I tried your test code on a different system, and I found it worked with
>> > Intel+mvapich2 (2.1rc1). That system was using Panasas and I was
>> > testing on Lustre. I then tried Panasas on the original machine
>> > (supports both Panasas and Lustre) and I got the correct behavior.
>> >
>> > So the problem is somehow related to Lustre. We are using the 2.5.37.ddn
>> > client. Unless you have an obvious answer, I will open this with DDN
>> > tomorrow.
>> >
>> >
>> > Ah, bet I know why this is!
>> >
>> > the Lustre driver and (some versions of the) Panasas driver set their fs-specific hints by opening the file, setting some ioctls, then continuing on without deleting the file.
>> >
>> > In the common case, when we expect the file to show up, no one notices or cares, but in MPI_MODE_EXCL or some other restrictive flags, the file gets created when we did not expect it to -- and that's part of the reason this bug lived on so long.
>> >
>> > I fixed this by moving file manipulations out of the hint parsing path and into the open path (after we check permissions and flags)
>> >
>> > Relevant commit: https://trac.mpich.org/projects/mpich/changeset/92f1c69f0de87f9
>> >
>> > See more details from Darshan, OpenMPI, and MPICH here:
>> > - https://trac.mpich.org/projects/mpich/ticket/2261
>> > - https://github.com/open-mpi/ompi/issues/158
>> > - http://lists.mcs.anl.gov/pipermail/darshan-users/2015-February/000256.html
>> >
>> > ==rob
>> >
>> >
>> > Thanks,
>> > Craig
>> >
>> > On Sun, Sep 20, 2015 at 2:36 PM, Craig Tierney - NOAA Affiliate
>> > <craig.tierney at noaa.gov> wrote:
>> >
>> > Wei-keng,
>> >
>> > Thanks for the test case. Here is what I get using a set of
>> > compilers and MPI stacks. I was expecting that mvapich2 1.8 and 2.1
>> > would behave differently.
>> >
>> > What versions of MPI do you test internally?
>> >
>> > Craig
>> >
>> > Testing intel+impi
>> >
>> > Currently Loaded Modules:
>> > 1) newdefaults 2) intel/15.0.3.187 3) impi/5.1.1.109
>> >
>> > Error at line 22: File does not exist, error stack:
>> > ADIOI_NFS_OPEN(69): File /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc does not exist
>> > Testing intel+mvapich2 2.1
>> >
>> > Currently Loaded Modules:
>> > 1) newdefaults 2) intel/15.0.3.187 3) mvapich2/2.1
>> >
>> > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
>> > Testing intel+mvapich2 1.8
>> >
>> > Currently Loaded Modules:
>> > 1) newdefaults 2) intel/15.0.3.187 3) mvapich2/1.8
>> >
>> > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
>> > Testing pgi+mvapich2 2.1
>> >
>> > Currently Loaded Modules:
>> > 1) newdefaults 2) pgi/15.3 3) mvapich2/2.1
>> >
>> > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
>> > Testing pgi+mvapich2 1.8
>> >
>> > Currently Loaded Modules:
>> > 1) newdefaults 2) pgi/15.3 3) mvapich2/1.8
>> >
>> > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
>> >
>> > Craig
>> >
>> > On Sun, Sep 20, 2015 at 1:43 PM, Wei-keng Liao
>> > <wkliao at eecs.northwestern.edu>
>> > wrote:
>> >
>> > In that case, it is likely mvapich does not perform correctly.
>> >
>> > In PnetCDF, when NC_NOWRITE is used in a call to ncmpi_open,
>> > PnetCDF calls MPI_File_open with the open flag set to
>> > MPI_MODE_RDONLY. See
>> > http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/tags/v1-6-1/src/lib/mpincio.c#L322
>> >
>> > Maybe test this with a simple MPI-IO program below.
>> > It prints error messages like
>> > Error at line 15: File does not exist, error stack:
>> > ADIOI_UFS_OPEN(69): File tooth-fairy.nc does not exist
>> >
>> > But, no file should be created.
>> >
>> >
>> > #include <stdio.h>
>> > #include <unistd.h> /* unlink() */
>> > #include <mpi.h>
>> >
>> > int main(int argc, char **argv) {
>> >     int err;
>> >     MPI_File fh;
>> >
>> >     MPI_Init(&argc, &argv);
>> >
>> >     /* delete "tooth-fairy.nc" and ignore the error */
>> >     unlink("tooth-fairy.nc");
>> >
>> >     err = MPI_File_open(MPI_COMM_WORLD, "tooth-fairy.nc", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
>> >     if (err != MPI_SUCCESS) {
>> >         int errorStringLen;
>> >         char errorString[MPI_MAX_ERROR_STRING];
>> >         MPI_Error_string(err, errorString, &errorStringLen);
>> >         printf("Error at line %d: %s\n", __LINE__, errorString);
>> >     }
>> >     else
>> >         MPI_File_close(&fh);
>> >
>> >     MPI_Finalize();
>> >     return 0;
>> > }
>> >
>> >
>> > Wei-keng
>> >
>> > On Sep 20, 2015, at 1:51 PM, Craig Tierney - NOAA Affiliate wrote:
>> >
>> > > Wei-keng,
>> > >
>> > > I always run distclean before I try to build the code. The first test failing is nc_test. The problem seems to be in this test:
>> > >
>> > > err = ncmpi_open(comm, "tooth-fairy.nc", NC_NOWRITE, info, &ncid);/* should fail */
>> > > IF (err == NC_NOERR)
>> > >     error("ncmpi_open of nonexistent file should have failed");
>> > > IF (err != NC_ENOENT)
>> > >     error("ncmpi_open of nonexistent file should have returned NC_ENOENT");
>> > > else {
>> > >     /* printf("Expected error message complaining: \"File tooth-fairy.nc does not exist\"\n"); */
>> > >     nok++;
>> > > }
>> > >
>> > > A zero length tooth-fairy.nc file is being created, and I don't think that is supposed to happen. That would mean that the mode NC_NOWRITE is not being honored by MPI_IO. I will look at this more tomorrow and try to craft a short example.
>> > >
>> > > Craig
>> > >
>> > > On Sun, Sep 20, 2015 at 10:23 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>> > > Hi, Craig
>> > >
>> > > Your config.log looks fine to me.
>> > > Some of your error messages are supposed to report errors of opening
>> > > a non-existing file, but report a different error code, meaning the
>> > > file does exist. I suspect it may be because of residue files.
>> > >
>> > > Could you do a clean rebuild with the following commands?
>> > > % make -s distclean
>> > > % ./configure --prefix=/apps/pnetcdf/1.6.1-intel-mvapich2
>> > > % make -s -j8
>> > > % make -s check
>> > >
>> > > If the problem persists, then it might be because of mvapich.
>> > >
>> > > Wei-keng
>> > >
>> >
>> >
>> >
>> >
>> > --
>> > Rob Latham
>> > Mathematics and Computer Science Division
>> > Argonne National Lab, IL USA
>> >
>>
>>
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA