nfmpi_iput_vara_double + nfmpi_wait_all
Wei-keng Liao
wkliao at ece.northwestern.edu
Thu Aug 26 10:09:02 CDT 2010
Hi, Max,
In that case, please do let us know if you find it is due to the compiler.
We would like to add a note to the install/readme file. Thanks.
Wei-keng
On Aug 25, 2010, at 5:29 PM, Maxwell Kelley wrote:
>
> Hi Wei-keng,
>
> Thanks for running the additional tests on your end...
>
> I just recompiled with a newer release of Intel MPI (in the 4.0 series) and the test succeeded (although I didn't rebuild pnetcdf against the newer MPI). I will investigate whether there were any reports of I/O bugs in the Intel 3.x series and keep you posted.
>
> Thanks again,
> -Max
>
> On Wed, 25 Aug 2010, Wei-keng Liao wrote:
>
>> Hi, Max,
>>
>> I tried the following, but still cannot find any mismatch.
>> My machine is an 8-node dual-core Linux cluster, running mpich2-1.2.1p1, using pvfs2.
>>
>> Code changes:
>> 1. I changed the buffer initialization to
>> do n=1,ntm
>>   do l=1,lm
>>     do j=1,jm_local
>>       jj = j + j_strt - 1
>>       do i=1,im
>>         arr2d(i,j,n) = 1000*i + 10*jj + n
>>         arr3d(i,j,l,n) = 100000*i + 1000*jj + 10*l + n
>>       enddo
>>     enddo
>>   enddo
>> enddo
>>
>> 2. I also use OR(NF_CLOBBER,NF_64BIT_OFFSET) in nfmpi_create()
>>
>> 3. I kept the variable define/creation order the same.
>> I.e., when layer=1, a2d_* first and then a3d_*;
>> when layer=2, a2d_* and a3d_* are interleaved.
>>
>> Run cases: (1, 4, 8, and 16 processes, im=128,jm=64,lm=64)
>> 1. nonblocking I/O, a2d_* first and then a3d_*
>> 2. nonblocking I/O, a3d_* first and then a2d_*
>> 3. blocking I/O for layer=1 and nonblocking I/O for layer=2, repeating cases 1 and 2
>>
>> I compared all variable contents between the two files and they are identical.
>>
>> Can you also try using the NF_SHARE flag in nfmpi_create()? It enforces stronger data consistency.
>> If this still does not solve your problem, I am open to suggestions. Maybe temporary access to your machine?
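>>
>> For the NF_SHARE change, something like this should do (cmode, err, and ncid are placeholder names here; IOR is the Fortran bitwise-OR intrinsic):
>>
>>       cmode = IOR(NF_CLOBBER, NF_64BIT_OFFSET)
>>       ! NF_SHARE turns on the stronger consistency semantics
>>       cmode = IOR(cmode, NF_SHARE)
>>       err = nfmpi_create(MPI_COMM_WORLD, 'noncontiguous.nc', cmode,
>>      &                   MPI_INFO_NULL, ncid)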
>>
>> Wei-keng
>>
>> On Aug 25, 2010, at 10:32 AM, Maxwell Kelley wrote:
>>
>>>
>>> Hi Wei-keng,
>>>
>>> Thank you for running the test on your system.
>>>
>>> None of the return codes corresponds to any error. I've been testing on up to 16 processes, but the results do not vary with the # of processes.
>>>
>>> I've replaced the random contents of the arrays with some structure to see what might be happening:
>>>
>>> a2d_n (i,j,n) = 1000*i + 10*j + n
>>> a3d_n (i,j,l,n) = 100000*i + 1000*j + 10*l + n
>>>
>>> For the case I'm running, the output array a2d_02 is wrong in noncontiguous.nc. Its contents are equal to a3d_01 (l == 1). If I write the a3d* arrays one at a time rather than using nfmpi_iput_vara_double, but continue to write the a2d* using nfmpi_iput_vara_double, all results are correct.
>>>
>>> If I write the a3d* arrays before the a2d* arrays, but do not change the netcdf ordering in the file, the a3d* arrays seem to be corrupted during the a2d* write. The a3d* are mostly filled with zeros.
>>>
>>> System details:
>>>
>>> uname -a: Linux borgg125 2.6.16.60-0.42.5-smp #1 SMP Mon Aug 24 09:41:41 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Fortran compiler: ifort 1.1.017 via mpif90
>>> MPI flavor: Intel MPI 3.2.011
>>>
>>> -Max
>>>
>>> On Wed, 25 Aug 2010, Wei-keng Liao wrote:
>>>
>>>> Hi, Max,
>>>>
>>>> I tested your testit.f using one and 4 MPI processes, but cannot reproduce a mismatch.
>>>> The contents of all arrays are the same between contiguous.nc and noncontiguous.nc.
>>>>
>>>> Can you tell me more about how you ran the code and your system info (e.g., # of processes, file system, machine, mpich version)? Can you provide more info about this "sometimes" incorrect result? (e.g., which variables are mismatched?)
>>>>
>>>> Are any errors (other than NF_NOERR) returned from any nfmpi_iput_vara_double or nfmpi_wait_all call? You might want to check the "statuses" array, in addition to "status", from nfmpi_wait_all to see whether any individual nonblocking request failed.
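>>>>
>>>> For example, something like this (req, st, and nreqs are placeholder names for the request-ID array, per-request status array, and request count; nfmpi_strerror is declared in pnetcdf.inc):
>>>>
>>>>       err = nfmpi_wait_all(ncid, nreqs, req, st)
>>>>       if (err .ne. NF_NOERR) write(*,*) nfmpi_strerror(err)
>>>>       do n = 1, nreqs
>>>>         ! a non-NF_NOERR entry identifies which individual request failed
>>>>         if (st(n) .ne. NF_NOERR) write(*,*) n, nfmpi_strerror(st(n))
>>>>       enddo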
>>>>
>>>> Wei-keng
>>>>
>>>> On Aug 24, 2010, at 4:40 PM, Maxwell Kelley wrote:
>>>>
>>>>>
>>>>> Hi Wei-keng,
>>>>>
>>>>> Attached is my crude Fortran test code exercising the cases described below. For smaller problem sizes, the noncontiguous-array case works OK.
>>>>>
>>>>> Here is the structure of an example file for the "contiguous" case, which seems to work properly. I combined the arrays (a2d_01 - a2d_08) into a first write operation, and then combined the arrays (a3d_01 - a3d_08) into a second write operation:
>>>>>
>>>>> netcdf contiguous {
>>>>> dimensions:
>>>>> im = 128 ;
>>>>> jm = 64 ;
>>>>> lm = 64 ;
>>>>> variables:
>>>>> double a2d_01(jm, im) ;
>>>>> double a2d_02(jm, im) ;
>>>>> double a2d_03(jm, im) ;
>>>>> double a2d_04(jm, im) ;
>>>>> double a2d_05(jm, im) ;
>>>>> double a2d_06(jm, im) ;
>>>>> double a2d_07(jm, im) ;
>>>>> double a2d_08(jm, im) ;
>>>>> double a3d_01(lm, jm, im) ;
>>>>> double a3d_02(lm, jm, im) ;
>>>>> double a3d_03(lm, jm, im) ;
>>>>> double a3d_04(lm, jm, im) ;
>>>>> double a3d_05(lm, jm, im) ;
>>>>> double a3d_06(lm, jm, im) ;
>>>>> double a3d_07(lm, jm, im) ;
>>>>> double a3d_08(lm, jm, im) ;
>>>>> }
>>>>>
>>>>> Here is the file structure for the "noncontiguous" case that exhibits the problem. As in the contiguous case, I wrote the a2d* in one operation and the a3d* in a second operation.
>>>>>
>>>>>
>>>>> netcdf noncontiguous {
>>>>> dimensions:
>>>>> im = 128 ;
>>>>> jm = 64 ;
>>>>> lm = 64 ;
>>>>> variables:
>>>>> double a2d_01(jm, im) ;
>>>>> double a3d_01(lm, jm, im) ;
>>>>> double a2d_02(jm, im) ;
>>>>> double a3d_02(lm, jm, im) ;
>>>>> double a2d_03(jm, im) ;
>>>>> double a3d_03(lm, jm, im) ;
>>>>> double a2d_04(jm, im) ;
>>>>> double a3d_04(lm, jm, im) ;
>>>>> double a2d_05(jm, im) ;
>>>>> double a3d_05(lm, jm, im) ;
>>>>> double a2d_06(jm, im) ;
>>>>> double a3d_06(lm, jm, im) ;
>>>>> double a2d_07(jm, im) ;
>>>>> double a3d_07(lm, jm, im) ;
>>>>> double a2d_08(jm, im) ;
>>>>> double a3d_08(lm, jm, im) ;
>>>>> }
>>>>>
>>>>>
>>>>> -Max
>>>>>
>>>>>
>>>>> On Tue, 24 Aug 2010, Wei-keng Liao wrote:
>>>>>
>>>>>> Hi, Max,
>>>>>>
>>>>>> Combining multiple noncontiguous requests should work for the nonblocking I/O.
>>>>>> I hope the "noncontiguous" you are referring to is the same as what I understand.
>>>>>> To make sure, could you please send us a copy of your test code?
>>>>>>
>>>>>> Wei-keng
>>>>>>
>>>>>> On Aug 24, 2010, at 1:05 PM, Maxwell Kelley wrote:
>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> (As this is my first post: many thanks to the developers!)
>>>>>>>
>>>>>>> I've been trying the nonblocking I/O interface described at
>>>>>>>
>>>>>>> http://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/CombiningOperations
>>>>>>>
>>>>>>> to combine writes of domain-decomposed arrays via calls to nfmpi_iput_vara_double() followed by an nfmpi_wait_all().
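>>>>>>>
>>>>>>> In outline, each process does something like the following (vid2d, req, and st here stand in for my actual variable-ID, request-ID, and status arrays; start and count are integer(kind=MPI_OFFSET_KIND)):
>>>>>>>
>>>>>>>       start(1) = 1
>>>>>>>       start(2) = j_strt
>>>>>>>       count(1) = im
>>>>>>>       count(2) = jm_local
>>>>>>>       do n = 1, ntm
>>>>>>>         ! post one nonblocking write per 2-D variable
>>>>>>>         err = nfmpi_iput_vara_double(ncid, vid2d(n), start,
>>>>>>>      &                               count, arr2d(1,1,n), req(n))
>>>>>>>       enddo
>>>>>>>       ! flush all pending requests with a single collective call
>>>>>>>       err = nfmpi_wait_all(ncid, ntm, req, st)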
>>>>>>>
>>>>>>> When a combination of write requests corresponds to a contiguous set of arrays in the output file (sequential netCDF variable IDs), the results are always correct. However, combining write requests for a noncontiguous set of arrays sometimes produces incorrect results. Is this latter access pattern permitted, and has it been extensively tested? I may be doing something wrong that happens to be non-fatal only for contiguous collections of arrays.
>>>>>>>
>>>>>>> I realize that batched-up writes for noncontiguous subsets of a file may not make sense from a performance standpoint.
>>>>>>>
>>>>>>> Thanks for any help,
>>>>>>>
>>>>>>> -Max
>>>>>>>
>>>>>>> -------------------------------------------------------------------
>>>>>>> Maxwell Kelley
>>>>>>> Center for Climate Systems Research, Columbia Univ. Earth Institute
>>>>>>> @
>>>>>>> NASA Goddard Institute for Space Studies
>>>>>>> 2880 Broadway kelley at giss.nasa.gov
>>>>>>> New York NY 10025 1-212-678-5669
>>>>>>> -------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>>
>>>>> <testit.f>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>