<font size=2 face="sans-serif">In light of this discussion, could someone
perhaps update the wiki page on the striping_unit hint? </font><a href="http://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/StripingUnitHint"><font size=2 face="sans-serif">http://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/StripingUnitHint</font></a>
<br>
<br><font size=2 face="sans-serif">The section titled "Example
scenario" is the part I find confusing. It sets striping_unit to 1/32
of the block size (128 KiB of 4 MiB) to reserve space for an "<i>enormous
header while still making it possible to avoid a few unaligned file system
accesses</i>". </font>
<br>
<br><font size=2 face="sans-serif">Cheers,</font>
<br>
<br><font size=2 face="sans-serif">/Nils</font>
<br><font size=2 face="sans-serif">______________________________________________<br>
Nils Smeds, IBM Deep Computing / World Wide Coordinated Tuning Team<br>
IT Specialist, Mobile phone: +46-70-793 2639<br>
Fax. +46-8-793 9523<br>
Mail address: IBM Sweden; Loc. 5-03; 164 92 Stockholm; SWEDEN</font>
<br>
<br>
<br>
<br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">Wei-keng Liao <wkliao@ece.northwestern.edu></font>
<br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">parallel-netcdf@lists.mcs.anl.gov</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">03/10/2011 01:26 AM</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: pnetCDF
performance issues</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Sent by:
</font><font size=1 face="sans-serif">parallel-netcdf-bounces@lists.mcs.anl.gov</font>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2>>> 1) What should the defaults be so that users
get good performance "out of the box"?<br>
<br>
Setting the header alignment size to the file striping
size<br>
gives the best performance. But we also need to consider the file size.
Say a user<br>
creates a file with a few small arrays, each only a few KB in size, and the file
system<br>
striping size is 4 MB. Do we want to enforce this default alignment, given
that it<br>
leaves a lot of unused space?<br>
<br>
Proposed solution below.<br>
<br>
<br>
>> 2) Can/should pnetCDF diagnose poor choices and inform the user?<br>
<br>
The only diagnosis I can think of is to check whether a user's choice matches<br>
the file system striping size. ("Matches" means that the "nc_header_align_size" hint<br>
chosen by the user is a multiple of the striping size.)<br>
<br>
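A minimal sketch of that check, for illustration only. The function name<br>
header_align_matches is hypothetical and not part of PnetCDF, and both sizes<br>
are assumed to be byte counts already read from the hints:<br>
<br>

```python
def header_align_matches(header_align_size: int, striping_size: int) -> bool:
    """Return True if header_align_size is a positive multiple of striping_size.

    Both arguments are byte counts. A non-positive value means the size is
    unknown, and is treated as "no match".
    """
    if not (header_align_size > 0 and striping_size > 0):
        return False
    return header_align_size % striping_size == 0


MIB = 1024 * 1024
print(header_align_matches(8 * MIB, 4 * MIB))  # 8 MiB is 2 x 4 MiB
print(header_align_matches(512, 4 * MIB))      # 512 B is not a multiple of 4 MiB
```

<br>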
As for informing users of a poor choice, Rob's suggestion is fine. I personally<br>
think feedback from the I/O library (or another layer of the stack) is very useful.<br>
<br>
<br>
>> 3) Can/should MPI-IO "fix" this by exploiting the MPI-IO
semantics<br>
>> to permit converting writes to be aligned (e.g., by caching)?<br>
<br>
In MPI collective I/O, writes from the aggregators are aligned with the<br>
striping size, provided the striping size can be obtained from the system. Currently,
the<br>
ROMIO drivers for PVFS and Lustre record the striping info in<br>
the hints. PnetCDF can use that info to choose a suitable header alignment
size.<br>
<br>
For independent I/O, no alignment is done.<br>
<br>
If the striping info cannot be obtained, PnetCDF currently uses 512
bytes<br>
as the file header alignment size.<br>
<br>
>> <br>
>> Of these, (1) is the most important for pnetCDF, particularly
as<br>
>> users compare approaches.<br>
> <br>
<br>
<br>
I propose the following way to pick a default value:<br>
<br>
if ROMIO can obtain the file striping size<br>
then<br>
&nbsp;&nbsp;&nbsp;&nbsp;if the total aggregate array size is at least N times
the striping size (say N=4)<br>
&nbsp;&nbsp;&nbsp;&nbsp;then pnetcdf uses the file striping size as the header
alignment size<br>
&nbsp;&nbsp;&nbsp;&nbsp;else 512 bytes is used<br>
else<br>
&nbsp;&nbsp;&nbsp;&nbsp;use 512 bytes<br>
<br>
(Note that the header size is calculated at the call to ncmpi_enddef. At
that point,<br>
the number of arrays and their sizes are also known.)<br>
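<br>
The proposal above can be sketched as follows. This is only an illustration<br>
in Python, not PnetCDF code: pick_header_align, FALLBACK_ALIGN, and the use of<br>
None for an unknown striping size are assumptions made for the example, and N<br>
defaults to 4 as suggested above.<br>
<br>

```python
from typing import Optional

FALLBACK_ALIGN = 512  # bytes; the fallback value named in the proposal


def pick_header_align(striping_size: Optional[int],
                      total_array_bytes: int,
                      n: int = 4) -> int:
    """Pick a default header alignment size in bytes.

    striping_size is None (or 0) when ROMIO could not obtain it.
    total_array_bytes is the aggregate size of all arrays, which is
    known by the time ncmpi_enddef is called.
    """
    # Striping size unknown: fall back to 512 bytes.
    if striping_size is None or not striping_size > 0:
        return FALLBACK_ALIGN
    # Data is large relative to one stripe: align the header to the stripe.
    if total_array_bytes >= n * striping_size:
        return striping_size
    # Small file: avoid padding the header out to a full stripe.
    return FALLBACK_ALIGN
```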
<br>
Wei-keng<br>
<br>
> <br>
<br>
<br>
<br>
> One:<br>
> <br>
> pnetcdf could stat the file system, but take a peek at ROMIO's file<br>
> system detection code for the state of portable statfs. Today,<br>
> perhaps it is less of a problem than when that code was written a<br>
> decade ago. What I mean to say is: "does there exist a
portable way<br>
> to determine alignment"? st_blksize is probably our best
bet, but on<br>
> Lustre it's actually more important not to align blocks but to hit
the<br>
> same OST.<br>
> <br>
> Just because it's hard doesn't mean we shouldn't do it, of course...<br>
> <br>
> HDF5 has this problem too: both libraries would benefit from an MPI-IO<br>
> interface to "file system features": alignment and "optimum
transfer<br>
> size" come to mind. Others, no doubt.<br>
> <br>
> two:<br>
> <br>
> pnetcdf has two ways to get information back to the caller: the return<br>
> code and the info object. A read-only "pnetcdf_how_we_doin"
hint<br>
> might do the trick.<br>
> <br>
> three:<br>
> <br>
> some MPI-IO implementations do fix this, as long as collective I/O
is<br>
> used. The MPI-IO on BlueGene, for example, always forces collective<br>
> I/O (even if operations are not overlapping), then aligns file domains<br>
> to block size boundaries. I know, I just complained about how<br>
> un-portable st_blksize can be but 'ad_bgl' gets to make some<br>
> simplifying assumptions.<br>
> <br>
> ROMIO, at least recent versions, can also do some file domain magic<br>
> - "romio_min_fdomain_size" will enforce a lower bound on
the amount of<br>
> I/O an aggregator will do.<br>
> <br>
> - set the "striping_unit" hint and ROMIO will ensure file
domain<br>
> boundaries are aligned to a multiple of that value.<br>
> <br>
> ==rob<br>
> <br>
> -- <br>
> Rob Latham<br>
> Mathematics and Computer Science Division<br>
> Argonne National Lab, IL USA<br>
> <br>
<br>
</font></tt>