[mpich-discuss] patch for ad_lustre_wrcoll.c

Tue Nov 29 16:44:30 CST 2011

On Wed, Aug 10, 2011 at 01:56:33PM -0600, Martin Pokorny wrote:
> These changes improve performance by reducing the number of system
> 'write' calls in the ADIO Lustre collective write code, and perhaps
> also by keeping the writes ordered. This is especially effective in
> my application, in which the data are highly interleaved among the
> processes in the group calling the MPI-IO collective write
> functions.

Hi Martin.  Sorry it has taken me so long to respond to your
contribution.  I think I understand what you're doing and why, and I'm
going to commit it.  

Let me make sure I really do understand by explaining back what you
are doing:

- In collective I/O, if there are any gaps in the file domain, the
  collective code does a read-modify-write (data sieving).  This works
  great on most file systems, but on Lustre the (implicit, here)
  locking needed for this is extremely costly.  So the Lustre driver
  has a hint to turn off data sieving in collective cases and service
  each request piece by piece.

- Because the requests are being serviced piecewise, certain workloads
  could result in out-of-order blocks that, once placed back in order,
  end up being adjacent to each other and can be merged into fewer,
  larger writes.
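
To make that second point concrete, here is a rough sketch of the
sort-then-merge idea.  This is not the code from the patch: piece_t,
cmp_offset and sort_and_merge are hypothetical names made up for
illustration, and the sketch assumes the pieces never overlap.

```c
#include <stdlib.h>

/* Hypothetical type (not from ROMIO): one contiguous piece of a
 * write request. */
typedef struct {
    long long offset;   /* starting byte offset in the file */
    long long length;   /* number of bytes in this piece */
} piece_t;

/* qsort comparator: order pieces by ascending file offset. */
static int cmp_offset(const void *a, const void *b)
{
    const piece_t *pa = a;
    const piece_t *pb = b;
    return (pa->offset > pb->offset) - (pa->offset < pb->offset);
}

/* Sort the pieces by offset, then merge in place every piece that
 * starts exactly where the previous one ends.  Returns the number of
 * pieces left, i.e. the number of write calls that would be needed.
 * Assumption: pieces do not overlap. */
static size_t sort_and_merge(piece_t *p, size_t n)
{
    size_t i, out = 0;

    if (n == 0)
        return 0;

    qsort(p, n, sizeof(*p), cmp_offset);

    for (i = 1; i < n; i++) {
        if (p[out].offset + p[out].length == p[i].offset)
            p[out].length += p[i].length;   /* adjacent: extend */
        else
            p[++out] = p[i];                /* gap: start a new piece */
    }
    return out + 1;
}
```

With pieces arriving as (8,4), (0,4), (4,4), (20,4), written here as
(offset,length), this leaves two pieces, (0,12) and (20,4), so two
writes instead of four.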

I would very much like to have a tiny test case that shows this
out-of-order workload, but for now I am just committing the patch.
SVN revision 9240 has the fix.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA