Unable to pass all the tests with pnetcdf 1.6.1, Intel 15.0.3.048 and Mvapich2 2.1
Craig Tierney - NOAA Affiliate
craig.tierney at noaa.gov
Sun Sep 20 15:44:23 CDT 2015
Wei-keng,
I tried your test code on a different system, and I found it worked with
Intel+mvapich2 (2.1rc1). That system was using Panasas and I was testing
on Lustre. I then tried Panasas on the original machine (supports both
Panasas and Lustre) and I got the correct behavior.
So the problem is somehow related to Lustre. We are using the 2.5.37.ddn
client. Unless you have an obvious answer, I will open this with DDN
tomorrow.
Thanks,
Craig
On Sun, Sep 20, 2015 at 2:36 PM, Craig Tierney - NOAA Affiliate <craig.tierney at noaa.gov> wrote:
> Wei-keng,
>
> Thanks for the test case. Here is what I get using a set of compilers and
> MPI stacks. I was expecting that mvapich2 1.8 and 2.1 would behave
> differently.
>
> What versions of MPI do you test internally?
>
> Craig
>
> Testing intel+impi
>
> Currently Loaded Modules:
> 1) newdefaults 2) intel/15.0.3.187 3) impi/5.1.1.109
>
> Error at line 22: File does not exist, error stack:
> ADIOI_NFS_OPEN(69): File /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc does not exist
> Testing intel+mvapich2 2.1
>
> Currently Loaded Modules:
> 1) newdefaults 2) intel/15.0.3.187 3) mvapich2/2.1
>
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing intel+mvapich2 1.8
>
> Currently Loaded Modules:
> 1) newdefaults 2) intel/15.0.3.187 3) mvapich2/1.8
>
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing pgi+mvapich2 2.1
>
> Currently Loaded Modules:
> 1) newdefaults 2) pgi/15.3 3) mvapich2/2.1
>
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
> Testing pgi+mvapich2 1.8
>
> Currently Loaded Modules:
> 1) newdefaults 2) pgi/15.3 3) mvapich2/1.8
>
> file was opened: /lfs3/jetmgmt/Craig.Tierney/tooth-fairy.nc
>
> Craig
>
> On Sun, Sep 20, 2015 at 1:43 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>
>> In that case, it is likely that mvapich is not behaving correctly.
>>
>> In PnetCDF, when NC_NOWRITE is used in a call to ncmpi_open,
>> PnetCDF calls MPI_File_open with the access mode set to MPI_MODE_RDONLY.
>> See
>>
>> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/tags/v1-6-1/src/lib/mpincio.c#L322
>>
>> Maybe test this with the simple MPI-IO program below.
>> It should print an error message like
>> Error at line 15: File does not exist, error stack:
>> ADIOI_UFS_OPEN(69): File tooth-fairy.nc does not exist
>>
>> and no file should be created.
>>
>>
>> #include <stdio.h>
>> #include <unistd.h> /* unlink() */
>> #include <mpi.h>
>>
>> int main(int argc, char **argv) {
>>     int err;
>>     MPI_File fh;
>>
>>     MPI_Init(&argc, &argv);
>>
>>     /* delete "tooth-fairy.nc" and ignore the error */
>>     unlink("tooth-fairy.nc");
>>
>>     err = MPI_File_open(MPI_COMM_WORLD, "tooth-fairy.nc",
>>                         MPI_MODE_RDONLY, MPI_INFO_NULL, &fh); /* should fail: file was just deleted */
>>     if (err != MPI_SUCCESS) { /* file operations default to MPI_ERRORS_RETURN */
>>         int errorStringLen;
>>         char errorString[MPI_MAX_ERROR_STRING];
>>         MPI_Error_string(err, errorString, &errorStringLen);
>>         printf("Error at line %d: %s\n", __LINE__, errorString);
>>     }
>>     else
>>         MPI_File_close(&fh);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>>
>> Wei-keng
>>
>> On Sep 20, 2015, at 1:51 PM, Craig Tierney - NOAA Affiliate wrote:
>>
>> > Wei-keng,
>> >
>> > I always run distclean before I try to build the code. The first test
>> > failing is nc_test. The problem seems to be in this test:
>> >
>> >     err = ncmpi_open(comm, "tooth-fairy.nc", NC_NOWRITE, info, &ncid);/* should fail */
>> >     IF (err == NC_NOERR)
>> >         error("ncmpi_open of nonexistent file should have failed");
>> >     IF (err != NC_ENOENT)
>> >         error("ncmpi_open of nonexistent file should have returned NC_ENOENT");
>> >     else {
>> >         /* printf("Expected error message complaining: \"File tooth-fairy.nc does not exist\"\n"); */
>> >         nok++;
>> >     }
>> >
>> > A zero-length tooth-fairy.nc file is being created, and I don't think
>> > that is supposed to happen. That would mean that the NC_NOWRITE mode is
>> > not being honored by MPI-IO. I will look at this more tomorrow and try to
>> > craft a short example.
>> >
>> > Craig
>> >
>> > On Sun, Sep 20, 2015 at 10:23 AM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>> > Hi, Craig
>> >
>> > Your config.log looks fine to me.
>> > Some of the tests that are supposed to report an error when opening a
>> > non-existent file return a different error code instead, meaning the
>> > file does exist. I suspect this may be caused by files left over from a
>> > previous run.
>> >
>> > Could you do a clean rebuild with the following commands?
>> > % make -s distclean
>> > % ./configure --prefix=/apps/pnetcdf/1.6.1-intel-mvapich2
>> > % make -s -j8
>> > % make -s check
>> >
>> > If the problem persists, then it might be caused by mvapich.
>> >
>> > Wei-keng
>> >
>>
>>
>
More information about the parallel-netcdf mailing list