Unable to pass all the tests with pnetcdf 1.6.1, Intel 15.0.3.048 and Mvapich2 2.1
Wei-keng Liao
wkliao at eecs.northwestern.edu
Tue Sep 22 19:54:44 CDT 2015
Hi, Craig
I have to admit I have run out of ideas.
Let me explain my suspicion about a possible fault involving inconsistent striping hints.
One of the error messages shown here:
Warning (inconsistent metadata): variable lrefine's begin (root=1048576, local=3072)
says that rank 0 calculated variable lrefine's starting file offset as 1048576,
while another process calculated 3072. If Lustre's file striping unit is known, then
PnetCDF uses a value aligned with the striping unit for a variable's starting offset.
If the file striping unit is not available, then PnetCDF aligns it to 512 bytes.
So, in your case, the file stripe unit is 1048576, meaning rank 0 did get the correct
value from the MPI-IO hint, but the other process did not.
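
To make the arithmetic concrete, here is a minimal sketch of the alignment rule
described above; the 3000-byte end-of-header offset is a hypothetical value, chosen
only so that the two results match the numbers in the warning:

#include <stdio.h>

/* round "offset" up to the next multiple of "alignment" */
static long long align_up(long long offset, long long alignment) {
    return ((offset + alignment - 1) / alignment) * alignment;
}

int main(void) {
    long long header_end = 3000;  /* hypothetical end-of-header offset */

    /* a rank that sees the Lustre hint aligns to the 1 MiB stripe size */
    printf("aligned to striping_unit: %lld\n", align_up(header_end, 1048576));

    /* a rank without the hint falls back to 512-byte alignment */
    printf("aligned to 512 bytes:     %lld\n", align_up(header_end, 512));
    return 0;
}

This prints 1048576 and 3072, i.e. the root and local values reported for lrefine.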
PnetCDF calls MPI_Info_get() to get the striping_unit value and assumes all processes
get the same value back from the same MPI call.
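
Below is a minimal sketch of that kind of consistency check; this is not the attached
check_striping.c, just the idea, and the test file path is taken from the command line:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, flag, err;
    char value[MPI_MAX_INFO_VAL+1], root_value[MPI_MAX_INFO_VAL+1];
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* argv[1] should name an existing file on the Lustre file system */
    err = MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY,
                        MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, 1);

    /* ask the MPI-IO layer what striping_unit it reports on this rank */
    MPI_File_get_info(fh, &info);
    MPI_Info_get(info, "striping_unit", MPI_MAX_INFO_VAL, value, &flag);
    if (!flag) strcpy(value, "not set");

    /* compare every rank's value against rank 0's */
    strcpy(root_value, value);
    MPI_Bcast(root_value, MPI_MAX_INFO_VAL+1, MPI_CHAR, 0, MPI_COMM_WORLD);
    if (strcmp(root_value, value) != 0)
        printf("rank %d: striping_unit=%s differs from rank 0's %s\n",
               rank, value, root_value);

    MPI_Info_free(&info);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

Running it with two or more processes against a file in the problematic Lustre
directory would show directly whether the striping_unit hint differs across ranks.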
Do you have MPICH installed on the same machine? If this also happens with MPICH,
then PnetCDF is at fault. If not, then it is mvapich.
I wonder if you would like to try another test program that is in PnetCDF (attached).
Wei-keng
-------------- next part --------------
A non-text attachment was scrubbed...
Name: check_striping.c
Type: application/octet-stream
Size: 3376 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20150922/1c2b391c/attachment.obj>
On Sep 22, 2015, at 5:30 PM, Craig Tierney - NOAA Affiliate wrote:
> Wei-keng,
>
> I wasn't able to trigger a problem. Here is the script I ran around your test case:
>
> #!/bin/bash --login
>
> module load newdefaults
> module load intel
> export PATH=/home/admin/software/apps/mvapich2/2.1-intel/bin/:${PATH}
>
> PDIR=/lfs2/jetmgmt/Craig.Tierney/test
>
> if [ ! -d $PDIR/ ]; then
> mkdir $PDIR
> fi
>
> for s in 1 4; do
> if [ ! -d $PDIR/d$s ]; then
> mkdir $PDIR/d$s
> fi
> lfs setstripe -c $s $PDIR/d$s
> lfs getstripe $PDIR/d$s
>
> rm -f $PDIR/d$s/bigfile
> dd if=/dev/zero of=$PDIR/d$s/bigfile bs=1024k count=1
> lfs getstripe $PDIR/d$s/bigfile
>
> echo "Checking d$s"
> mpiexec.hydra -np 2 ./check_mpi_striping $PDIR/d$s/bigfile
> done
>
> Here are the results:
>
> $ ./doit
> /lfs2/jetmgmt/Craig.Tierney/test/d1
> stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
> /lfs2/jetmgmt/Craig.Tierney/test/d1/bigfile
> lmm_stripe_count: 1
> lmm_stripe_size: 1048576
> lmm_pattern: 1
> lmm_layout_gen: 0
> lmm_stripe_offset: 4
> obdidx objid objid group
> 4 15487258 0xec511a 0
>
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.00228375 s, 459 MB/s
> /lfs2/jetmgmt/Craig.Tierney/test/d1/bigfile
> lmm_stripe_count: 1
> lmm_stripe_size: 1048576
> lmm_pattern: 1
> lmm_layout_gen: 0
> lmm_stripe_offset: 7
> obdidx objid objid group
> 7 15421630 0xeb50be 0
>
> Checking d1
> Success: striping_unit=1048576 striping_factor=1
> /lfs2/jetmgmt/Craig.Tierney/test/d4
> stripe_count: 4 stripe_size: 1048576 stripe_offset: -1
> /lfs2/jetmgmt/Craig.Tierney/test/d4/bigfile
> lmm_stripe_count: 4
> lmm_stripe_size: 1048576
> lmm_pattern: 1
> lmm_layout_gen: 0
> lmm_stripe_offset: 17
> obdidx objid objid group
> 17 15361627 0xea665b 0
> 42 15439375 0xeb960f 0
> 0 15384104 0xeabe28 0
> 2 15522060 0xecd90c 0
>
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.00210304 s, 499 MB/s
> /lfs2/jetmgmt/Craig.Tierney/test/d4/bigfile
> lmm_stripe_count: 4
> lmm_stripe_size: 1048576
> lmm_pattern: 1
> lmm_layout_gen: 0
> lmm_stripe_offset: 12
> obdidx objid objid group
> 12 15345301 0xea2695 0
> 37 15646009 0xeebd39 0
> 41 15695216 0xef7d70 0
> 18 15500412 0xec847c 0
>
> Checking d4
> Success: striping_unit=1048576 striping_factor=4
>
>
> Craig
>
>
>
> On Tue, Sep 22, 2015 at 3:10 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> Hi, Craig
>
> From these outputs, I think the problem is most likely that MPI-IO fails
> to return the same file striping unit and factor values on all
> MPI processes. I guess only the root process gets the correct values.
> Attached is a short MPI program to test this theory.
> Could you test it using at least 2 processes on Lustre?
>
> To compile:
> mpicc -o check_mpi_striping check_mpi_striping.c
> To run:
> mpiexec -n 2 check_mpi_striping
>
>
> Wei-keng
>
>
>
>
> On Sep 22, 2015, at 2:34 PM, Craig Tierney - NOAA Affiliate wrote:
>
> > Wei-keng,
> >
> > Here is the output from my run with PNETCDF_SAFE_MODE=1 on Lustre:
> >
> > [root at Jet:fe8 FLASH-IO]# mpiexec.hydra -env PNETCDF_SAFE_MODE 1 -np 4 ./flash_benchmark_io /lfs2/jetmgmt/Craig.Tierney/d1//flash_io_test_
> > Warning (inconsistent metadata): variable lrefine's begin (root=1048576, local=3072)
> > Warning (inconsistent metadata): variable nodetype's begin (root=2097152, local=4608)
> > Warning (inconsistent metadata): variable gid's begin (root=3145728, local=6144)
> > Warning (inconsistent metadata): variable coordinates's begin (root=4194304, local=25600)
> > Warning (inconsistent metadata): variable blocksize's begin (root=5242880, local=33792)
> > Warning (inconsistent metadata): variable bndbox's begin (root=6291456, local=41984)
> > Warning (inconsistent metadata): variable dens's begin (root=7340032, local=57856)
> > Warning (inconsistent metadata): variable velx's begin (root=18874368, local=10641920)
> > Warning (inconsistent metadata): variable lrefine's begin (root=1048576, local=3072)
> > Warning (inconsistent metadata): variable nodetype's begin (root=2097152, local=4608)
> > Warning (inconsistent metadata): variable gid's begin (root=3145728, local=6144)
> > Warning (inconsistent metadata): variable coordinates's begin (root=4194304, local=25600)
> > Warning (inconsistent metadata): variable blocksize's begin (root=5242880, local=33792)
> > Warning (inconsistent metadata): variable bndbox's begin (root=6291456, local=41984)
> > Warning (inconsistent metadata): variable dens's begin (root=7340032, local=57856)
> > Warning (inconsistent metadata): variable velx's begin (root=18874368, local=10641920)
> > Warning (inconsistent metadata): variable vely's begin (root=30408704, local=21225984)
> > Warning (inconsistent metadata): variable velz's begin (root=41943040, local=31810048)
> > Warning (inconsistent metadata): variable pres's begin (root=53477376, local=42394112)
> > Warning (inconsistent metadata): variable ener's begin (root=65011712, local=52978176)
> > Warning (inconsistent metadata): variable temp's begin (root=76546048, local=63562240)
> > Warning (inconsistent metadata): variable gamc's begin (root=88080384, local=74146304)
> > Warning (inconsistent metadata): variable game's begin (root=99614720, local=84730368)
> > Warning (inconsistent metadata): variable enuc's begin (root=111149056, local=95314432)
> > Warning (inconsistent metadata): variable gpot's begin (root=122683392, local=105898496)
> > Warning (inconsistent metadata): variable f1__'s begin (root=134217728, local=116482560)
> > Warning (inconsistent metadata): variable f2__'s begin (root=145752064, local=127066624)
> > Warning (inconsistent metadata): variable f3__'s begin (root=157286400, local=137650688)
> > Warning (inconsistent metadata): variable lrefine's begin (root=1048576, local=3072)
> > Warning (inconsistent metadata): variable nodetype's begin (root=2097152, local=4608)
> > Warning (inconsistent metadata): variable gid's begin (root=3145728, local=6144)
> > Warning (inconsistent metadata): variable coordinates's begin (root=4194304, local=25600)
> > Warning (inconsistent metadata): variable blocksize's begin (root=5242880, local=33792)
> > Warning (inconsistent metadata): variable bndbox's begin (root=6291456, local=41984)
> > Warning (inconsistent metadata): variable dens's begin (root=7340032, local=57856)
> > Warning (inconsistent metadata): variable velx's begin (root=18874368, local=10641920)
> > Warning (inconsistent metadata): variable vely's begin (root=30408704, local=21225984)
> > Warning (inconsistent metadata): variable velz's begin (root=41943040, local=31810048)
> > Warning (inconsistent metadata): variable pres's begin (root=53477376, local=42394112)
> > Warning (inconsistent metadata): variable ener's begin (root=65011712, local=52978176)
> > Warning (inconsistent metadata): variable temp's begin (root=76546048, local=63562240)
> > Warning (inconsistent metadata): variable gamc's begin (root=88080384, local=74146304)
> > Warning (inconsistent metadata): variable game's begin (root=99614720, local=84730368)
> > Warning (inconsistent metadata): variable enuc's begin (root=111149056, local=95314432)
> > Warning (inconsistent metadata): variable gpot's begin (root=122683392, local=105898496)
> > Warning (inconsistent metadata): variable f1__'s begin (root=134217728, local=116482560)
> > Warning (inconsistent metadata): variable f2__'s begin (root=145752064, local=127066624)
> > Warning (inconsistent metadata): variable f3__'s begin (root=157286400, local=137650688)
> > Warning (inconsistent metadata): variable f4__'s begin (root=168820736, local=148234752)
> > Warning (inconsistent metadata): variable f5__'s begin (root=180355072, local=158818816)
> > Warning (inconsistent metadata): variable f6__'s begin (root=191889408, local=169402880)
> > Warning (inconsistent metadata): variable vely's begin (root=30408704, local=21225984)
> > Warning (inconsistent metadata): variable velz's begin (root=41943040, local=31810048)
> > Warning (inconsistent metadata): variable pres's begin (root=53477376, local=42394112)
> > Warning (inconsistent metadata): variable ener's begin (root=65011712, local=52978176)
> > Warning (inconsistent metadata): variable temp's begin (root=76546048, local=63562240)
> > Warning (inconsistent metadata): variable gamc's begin (root=88080384, local=74146304)
> > Warning (inconsistent metadata): variable game's begin (root=99614720, local=84730368)
> > Warning (inconsistent metadata): variable enuc's begin (root=111149056, local=95314432)
> > Warning (inconsistent metadata): variable gpot's begin (root=122683392, local=105898496)
> > Warning (inconsistent metadata): variable f1__'s begin (root=134217728, local=116482560)
> > Warning (inconsistent metadata): variable f2__'s begin (root=145752064, local=127066624)
> > Warning (inconsistent metadata): variable f3__'s begin (root=157286400, local=137650688)
> > Warning (inconsistent metadata): variable f4__'s begin (root=168820736, local=148234752)
> > Warning (inconsistent metadata): variable f5__'s begin (root=180355072, local=158818816)
> > Warning (inconsistent metadata): variable f6__'s begin (root=191889408, local=169402880)
> > Warning (inconsistent metadata): variable f7__'s begin (root=203423744, local=179986944)
> > Warning (inconsistent metadata): variable f8__'s begin (root=214958080, local=190571008)
> > Warning (inconsistent metadata): variable f9__'s begin (root=226492416, local=201155072)
> > Warning (inconsistent metadata): variable f10_'s begin (root=238026752, local=211739136)
> > Warning (inconsistent metadata): variable f11_'s begin (root=249561088, local=222323200)
> > Warning (inconsistent metadata): variable f12_'s begin (root=261095424, local=232907264)
> > Warning (inconsistent metadata): variable f13_'s begin (root=272629760, local=243491328)
> > Warning (inconsistent metadata): variable f4__'s begin (root=168820736, local=148234752)
> > Warning (inconsistent metadata): variable f5__'s begin (root=180355072, local=158818816)
> > Warning (inconsistent metadata): variable f6__'s begin (root=191889408, local=169402880)
> > Warning (inconsistent metadata): variable f7__'s begin (root=203423744, local=179986944)
> > Warning (inconsistent metadata): variable f8__'s begin (root=214958080, local=190571008)
> > Warning (inconsistent metadata): variable f9__'s begin (root=226492416, local=201155072)
> > Warning (inconsistent metadata): variable f10_'s begin (root=238026752, local=211739136)
> > Warning (inconsistent metadata): variable f11_'s begin (root=249561088, local=222323200)
> > Warning (inconsistent metadata): variable f12_'s begin (root=261095424, local=232907264)
> > Warning (inconsistent metadata): variable f13_'s begin (root=272629760, local=243491328)
> > Warning (inconsistent metadata): variable f7__'s begin (root=203423744, local=179986944)
> > Warning (inconsistent metadata): variable f8__'s begin (root=214958080, local=190571008)
> > Warning (inconsistent metadata): variable f9__'s begin (root=226492416, local=201155072)
> > Warning (inconsistent metadata): variable f10_'s begin (root=238026752, local=211739136)
> > Warning (inconsistent metadata): variable f11_'s begin (root=249561088, local=222323200)
> > Warning (inconsistent metadata): variable f12_'s begin (root=261095424, local=232907264)
> > Warning (inconsistent metadata): variable f13_'s begin (root=272629760, local=243491328)
> > Here: -250
> > Here: -262
> > Here: -262
> > Here: -262
> > nfmpi_enddefFile header is inconsistent among processes
> > nfmpi_enddef
> > (Internal error) beginning file offset of this variable is inconsistent among p
> > r
> > nfmpi_enddef
> > (Internal error) beginning file offset of this variable is inconsistent among p
> > r
> > nfmpi_enddef
> > (Internal error) beginning file offset of this variable is inconsistent among p
> > r
> > [cli_1]: aborting job:
> > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
> > [cli_0]: [cli_2]: aborting job:
> > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
> > [cli_3]: aborting job:
> > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
> > aborting job:
> > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
> >
> > Craig
> >
> > On Mon, Sep 21, 2015 at 1:21 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> >
> > It is strange that the test failed for Lustre.
> >
> > The error message says some variables defined across MPI processes are not consistent.
> > Could you run this benchmark with safe mode on, by setting the environment variable
> > PNETCDF_SAFE_MODE to 1 before the run? This will print more detailed error messages, such as
> > which variables are inconsistent and at what offsets.
> >
> >
> > Wei-keng
> >
> > On Sep 21, 2015, at 1:31 PM, Craig Tierney - NOAA Affiliate wrote:
> >
> > > Rob and Wei-keng,
> > >
> > > Thanks for your help on this problem. Rob - the patch seems to work. I had to apply it by hand, but now the pnetcdf tests (mostly) complete successfully. The FLASH-IO benchmark fails when Lustre is used; it completes successfully when Panasas is used. The error code returned by nfmpi_enddef is -262. The description for this error is:
> > >
> > > #define NC_EMULTIDEFINE_VAR_BEGIN (-262) /**< inconsistent variable file begin offset (internal use) */
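> > >
> > > (For reference, a minimal C sketch for printing the message behind that return code via PnetCDF's ncmpi_strerror(); this is not part of the benchmark, just a quick way to decode -262 by hand:)
> > >
> > > #include <stdio.h>
> > > #include <pnetcdf.h>   /* pulls in mpi.h; compile with mpicc */
> > >
> > > int main(void) {
> > >     int err = -262;    /* NC_EMULTIDEFINE_VAR_BEGIN */
> > >     printf("%d means: %s\n", err, ncmpi_strerror(err));
> > >     return 0;
> > > }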
> > >
> > > [root at Jet:fe7 FLASH-IO]# mpiexec.hydra -n 4 ./flash_benchmark_io /pan2/jetmgmt/Craig.Tierney/pan_flash_io_test_
> > > Here: 0
> > > Here: 0
> > > Here: 0
> > > Here: 0
> > > number of guards : 4
> > > number of blocks : 80
> > > number of variables : 24
> > > checkpoint time : 12.74 sec
> > > max header : 0.88 sec
> > > max unknown : 11.83 sec
> > > max close : 0.53 sec
> > > I/O amount : 242.30 MiB
> > > plot no corner : 2.38 sec
> > > max header : 0.59 sec
> > > max unknown : 1.78 sec
> > > max close : 0.22 sec
> > > I/O amount : 20.22 MiB
> > > plot corner : 2.52 sec
> > > max header : 0.81 sec
> > > max unknown : 1.51 sec
> > > max close : 0.96 sec
> > > I/O amount : 24.25 MiB
> > > -------------------------------------------------------
> > > File base name : /pan2/jetmgmt/Craig.Tierney/pan_flash_io_test_
> > > file striping count : 0
> > > file striping size : 301346992 bytes
> > > Total I/O amount : 286.78 MiB
> > > -------------------------------------------------------
> > > nproc array size exec (sec) bandwidth (MiB/s)
> > > 4 16 x 16 x 16 17.64 16.26
> > >
> > >
> > > [root at Jet:fe7 FLASH-IO]# mpiexec.hydra -n 4 ./flash_benchmark_io /lfs2/jetmgmt/Craig.Tierney/lfs_flash_io_test_
> > > Here: -262
> > > Here: -262
> > > Here: -262
> > > nfmpi_enddef
> > > (Internal error) beginning file offset of this variable is inconsistent among p
> > > r
> > > nfmpi_enddef
> > > (Internal error) beginning file offset of this variable is inconsistent among p
> > > r
> > > nfmpi_enddef
> > > (Internal error) beginning file offset of this variable is inconsistent among p
> > > r
> > > Here: 0
> > > [cli_1]: aborting job:
> > > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
> > > [cli_3]: [cli_2]: aborting job:
> > > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
> > > aborting job:
> > > application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
> > >
> > > ===================================================================================
> > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > > = PID 16702 RUNNING AT fe7
> > > = EXIT CODE: 255
> > > = CLEANING UP REMAINING PROCESSES
> > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > > ===================================================================================
> > >
> > > Thanks,
> > > Craig
> > >
> > >
> > > On Mon, Sep 21, 2015 at 8:30 AM, Rob Latham <robl at mcs.anl.gov> wrote:
> > >
> > >
> > > On 09/20/2015 03:44 PM, Craig Tierney - NOAA Affiliate wrote:
> > > Wei-keng,
> > >
> > > I tried your test code on a different system, and I found it worked with
> > > Intel+mvapich2 (2.1rc1). That system uses Panasas, whereas my earlier
> > > testing was on Lustre. I then tried Panasas on the original machine
> > > (it supports both Panasas and Lustre) and got the correct behavior.
> > >
> > > So the problem is somehow related to Lustre. We are using the 2.5.37.ddn
> > > client. Unless you have an obvious answer, I will open this with DDN
> > > tomorrow.
> > >
> > >
> > > Ah, bet I know why this is!
> > >
> > > The Lustre driver and (some versions of the) Panasas driver set their fs-specific hints by opening the file, setting some ioctls, then continuing on without deleting the file.
> > >
> > > In the common case, when we expect the file to show up, no one notices or cares; but with MPI_MODE_EXCL or other restrictive flags, the file gets created when we did not expect it to -- and that's part of the reason this bug lived on for so long.
> > >
> > > I fixed this by moving the file manipulations out of the hint-parsing path and into the open path (after we check permissions and flags).
> > >
> > > Relevant commit: https://trac.mpich.org/projects/mpich/changeset/92f1c69f0de87f9
> > >
> > > See more details from Darshan, OpenMPI, and MPICH here:
> > > - https://trac.mpich.org/projects/mpich/ticket/2261
> > > - https://github.com/open-mpi/ompi/issues/158
> > > - http://lists.mcs.anl.gov/pipermail/darshan-users/2015-February/000256.html
> > >
> > > ==rob
> > >
> > >
> > > Thanks,
> > > Craig
> > >
> > > On Sun, Sep 20, 2015 at 2:36 PM, Craig Tierney - NOAA Affiliate <craig.tierney at noaa.gov> wrote:
> > >
> > > Wei-keng,
> > >
> > > Thanks for the test case. Here is what I get using a set of
> > > compilers and MPI stacks. I was expecting that mvapich2 1.8 and 2.1
> > > would behave differently.
> > >
> > > What versions of MPI do you test internally?
> > >
> > > Craig
> > >
> > > Testing intel+impi
> > >
> > > Currently Loaded Modules:
> > > 1) newdefaults 2) intel/15.0.3.187 3) impi/5.1.1.109
> > >
> > > Error at line 22: File does not exist, error stack:
> > > ADIOI_NFS_OPEN(69): File /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc does not exist
> > > Testing intel+mvapich2 2.1
> > >
> > > Currently Loaded Modules:
> > > 1) newdefaults 2) intel/15.0.3.187 3) mvapich2/2.1
> > >
> > > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> > > Testing intel+mvapich2 1.8
> > >
> > > Currently Loaded Modules:
> > > 1) newdefaults 2) intel/15.0.3.187 3) mvapich2/1.8
> > >
> > > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> > > Testing pgi+mvapich2 2.1
> > >
> > > Currently Loaded Modules:
> > > 1) newdefaults 2) pgi/15.3 3) mvapich2/2.1
> > >
> > > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> > > Testing pgi+mvapich2 1.8
> > >
> > > Currently Loaded Modules:
> > > 1) newdefaults 2) pgi/15.3 3) mvapich2/1.8
> > >
> > > file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> > >
> > > Craig
> > >
> > > On Sun, Sep 20, 2015 at 1:43 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> > >
> > > In that case, it is likely mvapich does not perform correctly.
> > >
> > > In PnetCDF, when NC_NOWRITE is used in a call to ncmpi_open,
> > > PnetCDF calls MPI_File_open with the open flag set to
> > > MPI_MODE_RDONLY. See
> > > http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/tags/v1-6-1/src/lib/mpincio.c#L322
> > >
> > > Maybe test this with a simple MPI-IO program below.
> > > It prints error messages like
> > > Error at line 15: File does not exist, error stack:
> > > ADIOI_UFS_OPEN(69): File tooth-fairy.nc does not exist
> > >
> > > But, no file should be created.
> > >
> > >
> > > #include <stdio.h>
> > > #include <unistd.h> /* unlink() */
> > > #include <mpi.h>
> > >
> > > int main(int argc, char **argv) {
> > > int err;
> > > MPI_File fh;
> > >
> > > MPI_Init(&argc, &argv);
> > >
> > > /* delete "tooth-fairy.nc <http://tooth-fairy.nc>" and
> > > ignore the error */
> > > unlink("tooth-fairy.nc <http://tooth-fairy.nc>");
> > >
> > > err = MPI_File_open(MPI_COMM_WORLD, "tooth-fairy.nc", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
> > > if (err != MPI_SUCCESS) {
> > > int errorStringLen;
> > > char errorString[MPI_MAX_ERROR_STRING];
> > > MPI_Error_string(err, errorString, &errorStringLen);
> > > printf("Error at line %d: %s\n",__LINE__, errorString);
> > > }
> > > else
> > > MPI_File_close(&fh);
> > >
> > > MPI_Finalize();
> > > return 0;
> > > }
> > >
> > >
> > > Wei-keng
> > >
> > > On Sep 20, 2015, at 1:51 PM, Craig Tierney - NOAA Affiliate wrote:
> > >
> > > > Wei-keng,
> > > >
> > > > I always run distclean before I try to build the code. The first test
> > > > that fails is nc_test. The problem seems to be in this test:
> > > >
> > > > err = ncmpi_open(comm, "tooth-fairy.nc", NC_NOWRITE, info, &ncid); /* should fail */
> > > > IF (err == NC_NOERR)
> > > >     error("ncmpi_open of nonexistent file should have failed");
> > > > IF (err != NC_ENOENT)
> > > >     error("ncmpi_open of nonexistent file should have returned NC_ENOENT");
> > > > else {
> > > > /* printf("Expected error message complaining: \"File
> > > tooth-fairy.nc <http://tooth-fairy.nc> does not exist\"\n"); */
> > > > nok++;
> > > > }
> > > >
> > > > A zero-length tooth-fairy.nc file is being created, and I don't think
> > > > that is supposed to happen. That would mean that the NC_NOWRITE mode is
> > > > not being honored by MPI-IO. I will look at this more tomorrow and try
> > > > to craft a short example.
> > > >
> > > > Craig
> > > >
> > > > On Sun, Sep 20, 2015 at 10:23 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> > > > Hi, Craig
> > > >
> > > > Your config.log looks fine to me.
> > > > Some of your error messages are supposed to report errors from opening
> > > > a non-existent file, but they report a different error code, meaning the
> > > > file does exist. I suspect it may be because of leftover files from a previous run.
> > > >
> > > > Could you do a clean rebuild with the following commands?
> > > > % make -s distclean
> > > > % ./configure --prefix=/apps/pnetcdf/1.6.1-intel-mvapich2
> > > > % make -s -j8
> > > > % make -s check
> > > >
> > > > If the problem persists, then it might be a problem in mvapich.
> > > >
> > > > Wei-keng
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Rob Latham
> > > Mathematics and Computer Science Division
> > > Argonne National Lab, IL USA
> > >
> >
> >
>
>
>