Unable to pass all the tests with pnetcdf 1.6.1, Intel 15.0.3.048 and Mvapich2 2.1

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Sun Sep 20 15:44:23 CDT 2015


Wei-keng,

I tried your test code on a different system and found that it worked with
Intel+mvapich2 (2.1rc1).  That system was using Panasas, whereas I had been
testing on Lustre.  I then tried Panasas on the original machine (which mounts
both Panasas and Lustre) and got the correct behavior.

So the problem is somehow related to Lustre.  We are using the 2.5.37.ddn
client.  Unless you have an obvious answer, I will raise this with DDN
tomorrow.

Thanks,
Craig

On Sun, Sep 20, 2015 at 2:36 PM, Craig Tierney - NOAA Affiliate <
craig.tierney at noaa.gov> wrote:

> Wei-keng,
>
> Thanks for the test case.  Here is what I get with several combinations of
> compilers and MPI stacks.  I was expecting mvapich2 1.8 and 2.1 to behave
> differently.
>
> What versions of MPI do you test internally?
>
> Craig
>
> Testing intel+impi
>
> Currently Loaded Modules:
>   1) newdefaults   2) intel/15.0.3.187   3) impi/5.1.1.109
>
> Error at line 22: File does not exist, error stack:
> ADIOI_NFS_OPEN(69): File /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc does not exist
> Testing intel+mvapich2 2.1
>
> Currently Loaded Modules:
>   1) newdefaults   2) intel/15.0.3.187   3) mvapich2/2.1
>
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing intel+mvapich2 1.8
>
> Currently Loaded Modules:
>   1) newdefaults   2) intel/15.0.3.187   3) mvapich2/1.8
>
> file was  opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing pgi+mvapich2 2.1
>
> Currently Loaded Modules:
>   1) newdefaults   2) pgi/15.3   3) mvapich2/2.1
>
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing pgi+mvapich2 1.8
>
> Currently Loaded Modules:
>   1) newdefaults   2) pgi/15.3   3) mvapich2/1.8
>
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
>
> Craig
>
> On Sun, Sep 20, 2015 at 1:43 PM, Wei-keng Liao <
> wkliao at eecs.northwestern.edu> wrote:
>
>> In that case, it is likely that mvapich is not behaving correctly.
>>
>> In PnetCDF, when NC_NOWRITE is passed to ncmpi_open, PnetCDF calls
>> MPI_File_open with the access mode set to MPI_MODE_RDONLY.
>> See
>>
>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/tags/v1-6-1/src/lib/mpincio.c#L322
>>
>> You could test this with the simple MPI-IO program below.
>> It should print an error message like
>>     Error at line 15: File does not exist, error stack:
>>     ADIOI_UFS_OPEN(69): File tooth-fairy.nc does not exist
>>
>> but no file should be created.
>>
>>
>> #include <stdio.h>
>> #include <unistd.h> /* unlink() */
>> #include <mpi.h>
>>
>> int main(int argc, char **argv) {
>>     int err;
>>     MPI_File fh;
>>
>>     MPI_Init(&argc, &argv);
>>
>>     /* delete "tooth-fairy.nc" and ignore the error */
>>     unlink("tooth-fairy.nc");
>>
>>     /* read-only open of the file just deleted: this should fail
>>        and must not create the file */
>>     err = MPI_File_open(MPI_COMM_WORLD, "tooth-fairy.nc",
>>                         MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
>>     if (err != MPI_SUCCESS) {
>>         int errorStringLen;
>>         char errorString[MPI_MAX_ERROR_STRING];
>>         MPI_Error_string(err, errorString, &errorStringLen);
>>         printf("Error at line %d: %s\n",__LINE__, errorString);
>>     }
>>     else
>>         MPI_File_close(&fh);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>>
>> Wei-keng
>>
>> On Sep 20, 2015, at 1:51 PM, Craig Tierney - NOAA Affiliate wrote:
>>
>> > Wei-keng,
>> >
>> > I always run distclean before I try to build the code.  The first failing
>> > test is nc_test.  The problem seems to be in this test:
>> >
>> >     err = ncmpi_open(comm, "tooth-fairy.nc", NC_NOWRITE, info, &ncid); /* should fail */
>> >     IF (err == NC_NOERR)
>> >         error("ncmpi_open of nonexistent file should have failed");
>> >     IF (err != NC_ENOENT)
>> >         error("ncmpi_open of nonexistent file should have returned NC_ENOENT");
>> >     else {
>> >         /* printf("Expected error message complaining: \"File tooth-fairy.nc does not exist\"\n"); */
>> >         nok++;
>> >     }
>> >
>> > A zero-length tooth-fairy.nc file is being created, and I don't think
>> > that is supposed to happen.  That would mean the NC_NOWRITE mode is not
>> > being honored by MPI-IO.  I will look at this more tomorrow and try to
>> > craft a short example.
>> >
>> > Craig
>> >
>> > On Sun, Sep 20, 2015 at 10:23 AM, Wei-keng Liao <
>> wkliao at eecs.northwestern.edu> wrote:
>> > Hi, Craig
>> >
>> > Your config.log looks fine to me.
>> > Some of the failing tests are supposed to report an error when opening
>> > a non-existent file, but they return a different error code, meaning the
>> > file does exist. I suspect leftover files from a previous run.
>> >
>> > Could you do a clean rebuild with the following commands?
>> >     % make -s distclean
>> >     % ./configure --prefix=/apps/pnetcdf/1.6.1-intel-mvapich2
>> >     % make -s -j8
>> >     % make -s check
>> >
>> > If the problem persists, then it might be caused by mvapich.
>> >
>> > Wei-keng
>> >
>>
>>
>


More information about the parallel-netcdf mailing list