[mpich2-dev] More ROMIO performance questions
Bob Cernohous
bobc at us.ibm.com
Mon Sep 14 17:04:28 CDT 2009
I'm not sure my explanation was very good. Here's a note I just received
on the topic.
----------------
What we have shown is that obtaining high-performance I/O requires scheduling
the writers. This is perhaps more important on Blue Gene than on many other
platforms, but I'd expect there are other setups where it matters to limit the
number of concurrent data streams to/from the I/O system. The MPI-IO collective
mode is the "right" interface for this, since the collective call is the one
with enough information to do the scheduling.
For BG (but really for any GPFS-based cluster) we'd like to point out that a
collective write could also provide the scheduling we coded explicitly in our
example code. The bgl_nodes_pset hint should be usable in this context too, to
set the number of simultaneous writers per pset (when romio_cb_{read,write} is
set to automatic). By limiting the number of concurrent I/O streams this way,
I'd guess you could make more efficient use of the cache hierarchy (GPFS client
buffers, NSD server buffers, and backend storage caches).
But this is more of a suggestion for future improvements and may be better
addressed to the ROMIO community.
----------------
I'm wondering if the ROMIO community has already considered this in some
way.
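If I'm reading the note right, the suggestion amounts to something like the
sketch below (just my reading, not their code; bgl_nodes_pset is the BG-specific
hint mentioned above, and the values and file name are only examples):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Info info;

        MPI_Init(&argc, &argv);
        MPI_Info_create(&info);

        /* let ROMIO decide per access pattern whether to use collective buffering */
        MPI_Info_set(info, "romio_cb_write", "automatic");
        MPI_Info_set(info, "romio_cb_read", "automatic");
        /* BG-specific: limit the number of aggregators (simultaneous writers) per pset */
        MPI_Info_set(info, "bgl_nodes_pset", "8");

        MPI_File_open(MPI_COMM_WORLD, "ckpt.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        /* ... each rank does a collective write of its contiguous block ... */
        MPI_File_close(&fh);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }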
Bob Cernohous: (T/L 553) 507-253-6093
BobC at us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester, MN 55901-7829
> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.
Bob Cernohous/Rochester/IBM at IBMUS wrote on 09/14/2009 04:57 PM
To: mpich2-dev at mcs.anl.gov (sent by: mpich2-dev-bounces at mcs.anl.gov)
Subject: [mpich2-dev] More ROMIO performance questions
We have another I/O scenario with interesting performance issues.
Once again, it's large non-interleaved contiguous blocks being written/read
(checkpointing software). We ran into the same problems with data sieving
and romio_cb_write/read = enable that we discussed a couple of weeks ago.
We tried to tune it with hints for cb_block_size and we get OK performance
when we can avoid read/write data sieving.
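Roughly, the hint setup looks like this (a sketch only; the buffer size is a
placeholder and I'm showing the standard ROMIO hint spellings here, e.g.
cb_buffer_size rather than cb_block_size):

    #include <mpi.h>

    /* Open the checkpoint file with collective buffering enabled and data
     * sieving disabled; hint values are placeholders, not our real settings. */
    MPI_File open_checkpoint(const char *path)
    {
        MPI_File fh;
        MPI_Info info;

        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "enable");    /* force collective buffering */
        MPI_Info_set(info, "romio_cb_read", "enable");
        MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MiB, placeholder */
        MPI_Info_set(info, "romio_ds_write", "disable");   /* avoid data-sieving read-modify-write */
        MPI_Info_set(info, "romio_ds_read", "disable");

        MPI_File_open(MPI_COMM_WORLD, (char *) path,
                      MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
        MPI_Info_free(&info);
        return fh;
    }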
Trying romio_cb_write/read = automatic gives very poor performance.
Similarly, pure non-collective writes get very poor performance. It seems
like having too many simultaneous writers/readers performs poorly on their
configuration ... so
They customized the testcase to coordinate/flow-control the non-collective
I/O and they get great performance. They only have N simultaneous
writers/readers active: they pass a token around and take turns. It's
almost like having N aggregators, but without the collective I/O overhead
of passing the data around. Instead they pass a small token and take turns
writing the large, non-interleaved contiguous data blocks.
I'm not aware of anything in MPI-IO or ROMIO that would do this. Has this
been explored by the experts (meaning you guys)?
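To make the scheme concrete, here is my rough reconstruction of their flow
control (a sketch, not their actual code; MAX_ACTIVE, the tag, and the function
name are made up):

    #include <mpi.h>

    #define MAX_ACTIVE 8   /* hypothetical limit on simultaneous writers */
    #define TOKEN_TAG  42

    /* Each rank writes its own contiguous, non-interleaved block with an
     * independent (non-collective) call, but only about MAX_ACTIVE ranks are
     * writing at any one time: rank r waits for a token from rank r - MAX_ACTIVE,
     * writes, then forwards the token to rank r + MAX_ACTIVE. */
    void flow_controlled_write(MPI_File fh, void *buf, int count,
                               MPI_Offset my_offset)
    {
        int rank, nprocs, token = 0;

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        if (rank >= MAX_ACTIVE)          /* wait for a predecessor to finish */
            MPI_Recv(&token, 1, MPI_INT, rank - MAX_ACTIVE, TOKEN_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_File_write_at(fh, my_offset, buf, count, MPI_BYTE,
                          MPI_STATUS_IGNORE);

        if (rank + MAX_ACTIVE < nprocs)  /* hand the token to the next writer */
            MPI_Send(&token, 1, MPI_INT, rank + MAX_ACTIVE, TOKEN_TAG,
                     MPI_COMM_WORLD);
    }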