Unable to pass all the tests with pnetcdf 1.6.1, Intel 15.0.3.048 and Mvapich2 2.1

Wei-keng Liao wkliao at eecs.northwestern.edu
Sun Sep 20 17:17:05 CDT 2015


For the run with loaded module 3) impi/5.1.1.109:

> ADIOI_NFS_OPEN(69): File /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc does not exist
This error message says your MPI library thinks /lfs3/jetmgmt/Craig.Tierney is on NFS, not Lustre.
Either /lfs3 is not a Lustre file system, or your MPI library was not configured correctly.
Please check your MPI library's configure settings to see whether lustre appears in the --with-file-system option.
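
For example, on an MVAPICH2 build you can check the configure options with the command below (a sketch; mpiname ships with MVAPICH2, but Intel MPI has no equivalent, so check its documentation instead):
    % mpiname -a | grep file-system
You can also force ROMIO to select its Lustre driver by prefixing the file name with "lustre:", e.g. opening "lustre:/lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc" instead of the plain path. If that run behaves correctly, the file-system detection is the problem.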

For the runs with loaded module 3) mvapich2/*, what exactly are the error messages?
Do they start with ADIOI_LUSTRE_OPEN? (The ADIOI_ prefix names the ADIO file-system driver ROMIO selected.)

Wei-keng

On Sep 20, 2015, at 3:44 PM, Craig Tierney - NOAA Affiliate wrote:

> Wei-keng,
> 
> I tried your test code on a different system, and I found it worked with Intel+mvapich2 (2.1rc1).  That system uses Panasas, whereas I was testing on Lustre.  I then tried Panasas on the original machine (it supports both Panasas and Lustre) and got the correct behavior.
> 
> So the problem is somehow related to Lustre.  We are using the 2.5.37.ddn client.  Unless you have an obvious answer, I will open this with DDN tomorrow.
> 
> Thanks,
> Craig
> 
> On Sun, Sep 20, 2015 at 2:36 PM, Craig Tierney - NOAA Affiliate <craig.tierney at noaa.gov> wrote:
> Wei-keng,
> 
> Thanks for the test case.  Here is what I get using a set of compilers and MPI stacks.  I was expecting that mvapich2 1.8 and 2.1 would behave differently.  
> 
> What versions of MPI do you test internally? 
> 
> Craig
> 
> Testing intel+impi
> 
> Currently Loaded Modules:
>   1) newdefaults   2) intel/15.0.3.187   3) impi/5.1.1.109
> 
> Error at line 22: File does not exist, error stack:
> ADIOI_NFS_OPEN(69): File /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc does not exist
> Testing intel+mvapich2 2.1
> 
> Currently Loaded Modules:
>   1) newdefaults   2) intel/15.0.3.187   3) mvapich2/2.1
> 
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing intel+mvapich2 1.8
> 
> Currently Loaded Modules:
>   1) newdefaults   2) intel/15.0.3.187   3) mvapich2/1.8
> 
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing pgi+mvapich2 2.1
> 
> Currently Loaded Modules:
>   1) newdefaults   2) pgi/15.3   3) mvapich2/2.1
> 
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing pgi+mvapich2 1.8
> 
> Currently Loaded Modules:
>   1) newdefaults   2) pgi/15.3   3) mvapich2/1.8
> 
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> 
> Craig
> 
> On Sun, Sep 20, 2015 at 1:43 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> In that case, it is likely that mvapich2 is not behaving correctly.
> 
> In PnetCDF, when NC_NOWRITE is used in a call to ncmpi_open,
> PnetCDF calls MPI_File_open with the open flag set to MPI_MODE_RDONLY. See
> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/tags/v1-6-1/src/lib/mpincio.c#L322
> 
> Maybe test this with the simple MPI-IO program below.
> It should print error messages like
>     Error at line 15: File does not exist, error stack:
>     ADIOI_UFS_OPEN(69): File tooth-fairy.nc does not exist
> 
> But no file should be created.
> 
> 
> #include <stdio.h>
> #include <unistd.h> /* unlink() */
> #include <mpi.h>
> 
> int main(int argc, char **argv) {
>     int err;
>     MPI_File fh;
> 
>     MPI_Init(&argc, &argv);
> 
>     /* delete "tooth-fairy.nc" and ignore the error if it does not exist */
>     unlink("tooth-fairy.nc");
> 
>     /* without MPI_MODE_CREATE, opening a non-existent file must fail
>      * and must not create the file */
>     err = MPI_File_open(MPI_COMM_WORLD, "tooth-fairy.nc", MPI_MODE_RDONLY,
>                         MPI_INFO_NULL, &fh);
>     if (err != MPI_SUCCESS) {
>         int errorStringLen;
>         char errorString[MPI_MAX_ERROR_STRING];
>         MPI_Error_string(err, errorString, &errorStringLen);
>         printf("Error at line %d: %s\n",__LINE__, errorString);
>     }
>     else
>         MPI_File_close(&fh);
> 
>     MPI_Finalize();
>     return 0;
> }
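> 
> To compile and run it (assuming the source is saved as tooth-fairy.c):
>     % mpicc tooth-fairy.c -o tooth-fairy
>     % mpiexec -n 1 ./tooth-fairy
>     % ls tooth-fairy.nc
> The last ls should report "No such file or directory" if the MPI library behaves correctly.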
> 
> 
> Wei-keng
> 
> On Sep 20, 2015, at 1:51 PM, Craig Tierney - NOAA Affiliate wrote:
> 
> > Wei-keng,
> >
> > I always run distclean before trying to build the code.  The first test that fails is nc_test.  The problem seems to be in this test:
> >
> >     err = ncmpi_open(comm, "tooth-fairy.nc", NC_NOWRITE, info, &ncid); /* should fail */
> >     IF (err == NC_NOERR)
> >         error("ncmpi_open of nonexistent file should have failed");
> >     IF (err != NC_ENOENT)
> >         error("ncmpi_open of nonexistent file should have returned NC_ENOENT");
> >     else {
> >         /* printf("Expected error message complaining: \"File tooth-fairy.nc does not exist\"\n"); */
> >         nok++;
> >     }
> >
> > A zero-length tooth-fairy.nc file is being created, and I don't think that is supposed to happen.  That would mean that the NC_NOWRITE mode is not being honored by MPI-IO.  I will look at this more tomorrow and try to craft a short example.
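> >
> > A quick way to confirm from the shell (a sketch; run from the directory where nc_test leaves its files):
> >     % rm -f tooth-fairy.nc
> >     % mpiexec -n 1 ./nc_test
> >     % ls -l tooth-fairy.nc
> > If a 0-byte tooth-fairy.nc shows up after the run, the read-only open created it.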
> >
> > Craig
> >
> > On Sun, Sep 20, 2015 at 10:23 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> > Hi, Craig
> >
> > Your config.log looks fine to me.
> > Some of your error messages come from tests that open a non-existent
> > file and should fail with a "file does not exist" error, but a
> > different error code is reported, meaning the file does exist.
> > I suspect residual files from a previous build.
> >
> > Could you do a clean rebuild with the following commands?
> >     % make -s distclean
> >     % ./configure --prefix=/apps/pnetcdf/1.6.1-intel-mvapich2
> >     % make -s -j8
> >     % make -s check
> >
> > If the problem persists, then it might be caused by mvapich2.
> >
> > Wei-keng
> >