[mpich-discuss] patch for ad_lustre_wrcoll.c
Martin Pokorny
mpokorny at nrao.edu
Wed Dec 7 11:38:05 CST 2011
Rob Latham wrote:
> On Wed, Aug 10, 2011 at 01:56:33PM -0600, Martin Pokorny wrote:
>> These changes improve performance by reducing the number of system
>> 'write' calls in the ADIO Lustre collective write code, and
>> perhaps also by keeping the writes ordered. This is especially
>> effective in my application, in which the data are highly
>> interleaved among the processes in the group calling the MPI-IO
>> collective write functions.
>
> Hi Martin. Sorry it has taken me so long to respond to your
> contribution. I think I understand what you're doing and why, and
> I'm going to commit it.
>
> Let me make sure I really do understand by explaining back what you
> are doing:
>
> - in collective I/O, if there are any gaps in the file domain,
> collective i/o does a read-modify-write. This works great on most
> file systems but on Lustre the (implicit, here) locking needed for
> this is extremely costly. So, the Lustre driver has a hint to turn
> off data sieving in collective cases and service each request piece
> by piece.
I'm not convinced that the locking is necessarily extremely costly in
the current implementation. The reason that my application doesn't use
data sieving is more related to the fact that the files being written
have lots of holes, and using data sieving leaves random values in the
files where the holes are. (I realize that this usage isn't ideal, but
the holes will eventually go away, and it's not worth the effort to do
something better at the moment.)
> - because the requests are being serviced piecewise, certain
> workloads could result in out-of-order blocks that once placed back
> in order end up being adjacent to each other and can be merged.
Yes, that's the effect the patch is intended to have. This reduces the
number of system calls to the Lustre client code.
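
As a rough illustration of that effect (a minimal sketch, not the code in the patch): sort the pending pieces by file offset, then merge any piece that starts exactly where the previous one ends. The `io_req` type and the `coalesce_requests` name are made up for this example. The sketch tracks only offsets and lengths; it assumes the data for merged pieces has been, or can be, gathered into one contiguous write buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request descriptor: one contiguous byte range in the file. */
typedef struct {
    long long offset;  /* starting file offset, in bytes */
    long long length;  /* number of bytes to write */
} io_req;

/* qsort comparator: ascending by file offset. */
static int cmp_offset(const void *a, const void *b)
{
    const io_req *ra = a;
    const io_req *rb = b;
    return (ra->offset > rb->offset) - (ra->offset < rb->offset);
}

/*
 * Sort the requests by offset, then merge in place every request that
 * begins exactly where the preceding (possibly already merged) request
 * ends. Returns the number of requests left after merging; each of
 * these would correspond to one 'write' call.
 */
size_t coalesce_requests(io_req *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(reqs, n, sizeof *reqs, cmp_offset);

    size_t out = 0;  /* index of the last merged request */
    for (size_t i = 1; i < n; i++) {
        if (reqs[out].offset + reqs[out].length == reqs[i].offset) {
            /* Adjacent in the file: extend the current request. */
            reqs[out].length += reqs[i].length;
        } else {
            /* Gap (a hole in the file): start a new request. */
            reqs[++out] = reqs[i];
        }
    }
    return out + 1;
}
```

For example, pieces arriving as (offset 8, length 4), (0, 4), (4, 4), (20, 4) collapse to two writes: (0, 12) and (20, 4). The hole between 12 and 20 is left untouched, which is the point of avoiding data sieving for files with holes.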
> I would very much like to have a tiny test case that shows this out
> of order workload but for now I am just committing the patch. SVN
> revision 9240 has the fix.
I would like one, too! Unfortunately, my application is a part of a
real-time, event-driven system, and we don't have the ability to
simulate the application inputs. I'll see if I can generate a test case
for you outside of my application, but it may be a while before I can
get to that.
(Sorry about the duplicate message, Rob; I neglected to cc mpich-discuss
in my first message.)
--
Martin Pokorny
Software Engineer - Expanded Very Large Array
National Radio Astronomy Observatory
Socorro, NM USA