Collective I/O on zero-dimensional data

Maxwell Kelley kelley at giss.nasa.gov
Tue Oct 5 10:13:22 CDT 2010


Hi Wei-keng,

Thanks for your guidance.

To answer your question, the number of 0D quantities in my checkpoint file 
is small compared to the number of distributed and nondistributed arrays, 
but the fsync at each mode switch is so expensive that I either have to 
(1) group the 0D independent-mode writes or (2) do the 0D writes in 
collective mode.  Since the 0D writes are made by a number of different 
components of the code, option 1 would be too disruptive to the program 
design.
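
For concreteness, option 1 would look something like the sketch below, 
using the PnetCDF C API.  This is only an illustration; the helper name 
and its argument lists are placeholders rather than our actual code.  The 
point is that the collective/independent switch happens once per group of 
scalars instead of once per scalar:

#include <mpi.h>
#include <pnetcdf.h>

/* Gather all the 0D independent-mode writes into a single
 * independent-data section, so that the collective<->independent
 * switch (and the file sync it implies) happens once instead of
 * once per scalar. */
void write_scalars_grouped(int ncid, int rank,
                           const int *varids, const double *values, int n)
{
    int i;
    ncmpi_begin_indep_data(ncid);    /* collective call: one switch in */
    if (rank == 0) {                 /* a single writer is sufficient */
        for (i = 0; i < n; i++)
            ncmpi_put_var_double(ncid, varids[i], &values[i]);
    }
    ncmpi_end_indep_data(ncid);      /* collective call: one switch out */
}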

Do you think that declaring the 0D variables as 1-D arrays of length 1 and 
setting the count vector accordingly would be more efficient than leaving 
them as 0D?  Is there a "competition" among processes in the 0D 
collective-write case?
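
In case a concrete example helps, here is a minimal standalone sketch of 
the 1D-of-length-1 approach I have in mind (the file name, variable name, 
and the choice of rank 0 as the writer are arbitrary):

#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv)
{
    int rank, ncid, dimid, varid;
    MPI_Offset start[1] = {0}, count[1];
    double value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    ncmpi_create(MPI_COMM_WORLD, "scalar_test.nc", NC_CLOBBER,
                 MPI_INFO_NULL, &ncid);

    /* Declare the "0D" quantity as a 1-D variable of length 1. */
    ncmpi_def_dim(ncid, "one", 1, &dimid);
    ncmpi_def_var(ncid, "my_scalar", NC_DOUBLE, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    /* Every rank joins the collective write, but count[0] is zero on
     * all ranks except rank 0, so only rank 0's value lands in the
     * file even though the buffers differ across processes. */
    value = 42.0 + rank;                 /* deliberately process-varying */
    count[0] = (rank == 0) ? 1 : 0;
    ncmpi_put_vara_double_all(ncid, varid, start, count, &value);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}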

-Max

On Mon, 4 Oct 2010, Wei-keng Liao wrote:

> Hi, Max,
>
> Switching between collective and independent data modes is expensive, 
> because fsync will be called each time the mode switches. So, grouping 
> writes together to reduce the number of switches is a good strategy.
>
> As for writing 0D variables, pnetcdf will ignore both arguments start[] 
> and count[] and always let the calling process write one value (1 
> element) to the variable.
>
> So, if the call is collective and the writing processes have different 
> values to write, then the outcome in the file is undefined (usually the 
> last process wins, but there is no way to know which process is last). 
> One solution is to define the variable as a 1-D array of length 1 and 
> set argument count[0] to zero on all processes except the one whose 
> data you want written to the file.
>
> As for recommending collective or independent I/O for 0D variables, it 
> depends on your I/O pattern. Do you have a lot of 0D variables? Are they 
> being overwritten frequently and by different processes? Please note 
> that good I/O performance usually comes from requests that are large 
> and contiguous.
>
> Using independent mode for all data can hurt performance for the 
> "distributed" arrays, as the independent APIs may produce many small, 
> noncontiguous requests to the file system.
>
> Wei-keng
>
> On Oct 4, 2010, at 6:42 PM, Maxwell Kelley wrote:
>
>>
>> Hello,
>>
>> Some code I ported from a GPFS to a Lustre machine was hit by the performance effects of switching back and forth between collective mode for distributed data and independent mode for non-distributed data. Converting the writes of non-distributed data like zero-dimensional (0D) variables to collective mode was straightforward, but with a small wrinkle. Since the start/count vectors passed to put_vara_double_all cannot be used to indicate which process possesses the definitive value of a 0D variable, I could only get correct results by ensuring that this datum is identical on all processes. Can I count on put_vara_double_all always behaving this way, or could future library versions refuse to write 0D data in collective mode? BTW the return code did not indicate an error when process-varying 0D data was passed to put_vara_double_all.
>>
>> Grouping independent-mode writes could reduce the number of switches between collective and independent mode, but that would require significant code reorganization, so I tried the all-collective option first. I could also declare the 0D variables as 1D arrays of length 1.
>>
>> Before going any further, I should also ask about the recommended method for writing a 0D variable.  Collective I/O?  Or independent I/O with system-specific MPI hints (I haven't explored the MPI hints)?  Or should I use independent mode for all data, including the distributed arrays?
>>
>> -Max
>>
>>
>
>
>


