Yes, I meant cb_buffer_size.
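
For the record, here is roughly how we pass that hint (plus cb_nodes) at open time. This is only a minimal sketch; the 16 MB buffer size and 4 aggregators are made-up illustration values, not the customer's actual settings.

    #include <mpi.h>

    /* Sketch: pass collective-buffering hints to MPI_File_open.
     * The values below are illustrative only. */
    MPI_File open_with_cb_hints(MPI_Comm comm, const char *path)
    {
        MPI_Info info;
        MPI_File fh;

        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_buffer_size", "16777216"); /* collective buffer size */
        MPI_Info_set(info, "cb_nodes", "4");               /* number of aggregators */
        MPI_Info_set(info, "romio_cb_write", "enable");    /* force collective buffering on writes */

        MPI_File_open(comm, (char *)path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      info, &fh);
        MPI_Info_free(&info);
        return fh;
    }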

I believe it was tested lockless, but I'll try to verify that.

I'm not completely assuming a high performance file system (HPFS). What about NFS? Or even with an HPFS, BG has "i/o poor" racks with 512 cores (writers) per 1 i/o node (HPFS client). Would an equivalent non-BG setup, a server (HPFS client) with 512 processes (writers), have the same problems?

Do we just accept less performance in these scenarios? Is it all up to the (single) HPFS client to figure it out? Is it all HPFS configuration, and they got it wrong? They only saw the problem when, for example, all 512 cores wrote to the same file. All 512 cores writing to different files worked well... so I guess it could be client configuration or limited client resources.

Or does an option to throttle back individual writers that funnel through a single point (client/server/i/o node) make any sense at all? Collective i/o through N aggregators per i/o node works pretty well if you get the hints right, but with more overhead than their customized flow control of N writers per i/o node.
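
Their scheme could even be layered above MPI-IO. Here is a rough sketch of that kind of token-passing flow control, under my own assumptions (the throttled_write name, the rank-modulo grouping into rings, and the explicit offsets are mine, not the customer's code): split the job into N groups and, within each group, pass a token so only one writer per group is active at a time.

    #include <mpi.h>

    /* Sketch: token-passing flow control so that only `nrings` writers
     * (one per group) hit the file system at any given time. */
    void throttled_write(MPI_File fh, const void *buf, int count,
                         MPI_Offset my_offset, int nrings)
    {
        int rank, rrank, rsize, token = 0;
        MPI_Comm ring;
        MPI_Status status;

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Group ranks into nrings "rings" (e.g. one per i/o node).
         * The rank % nrings mapping is only illustrative. */
        MPI_Comm_split(MPI_COMM_WORLD, rank % nrings, rank, &ring);
        MPI_Comm_rank(ring, &rrank);
        MPI_Comm_size(ring, &rsize);

        /* Wait for the token from the previous writer in this ring. */
        if (rrank > 0)
            MPI_Recv(&token, 1, MPI_INT, rrank - 1, 0, ring, &status);

        /* Write the large, non-interleaved contiguous block. */
        MPI_File_write_at(fh, my_offset, (void *)buf, count, MPI_BYTE, &status);

        /* Hand the token to the next writer in this ring. */
        if (rrank < rsize - 1)
            MPI_Send(&token, 1, MPI_INT, rrank + 1, 0, ring);

        MPI_Comm_free(&ring);
    }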

I thought about shared files. If they opened N shared files per pset they would basically be getting N concurrent writers/readers. This would just be another way to coordinate, like their customized testcase.
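
For example (just a sketch, and the per-pset communicator and file naming are my assumptions), each pset could open its own file and append through the shared file pointer with the ordered-mode collective, so writes within a pset are serialized while the psets proceed concurrently:

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch: one shared file per pset, written through the shared file
     * pointer.  pset_comm/pset_id stand in for however the per-pset
     * communicator is actually obtained. */
    void write_per_pset_shared(MPI_Comm pset_comm, int pset_id,
                               const void *buf, int count)
    {
        char path[256];
        MPI_File fh;
        MPI_Status status;

        snprintf(path, sizeof(path), "output.pset%d", pset_id);
        MPI_File_open(pset_comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* Collective ordered-mode write: ranks in pset_comm write their
         * blocks in rank order through the shared file pointer. */
        MPI_File_write_ordered(fh, (void *)buf, count, MPI_BYTE, &status);

        MPI_File_close(&fh);
    }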

Bob Cernohous: (T/L 553) 507-253-6093

BobC@us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester, MN 55901-7829

> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.

mpich2-dev-bounces@mcs.anl.gov wrote on 09/14/2009 05:48:00 PM:

> From: Rob Latham <robl@mcs.anl.gov>
> Sent by: mpich2-dev-bounces@mcs.anl.gov
> Date: 09/14/2009 05:48 PM
> Please respond to: mpich2-dev@mcs.anl.gov
> To: mpich2-dev@mcs.anl.gov
> Subject: Re: [mpich2-dev] More ROMIO performance questions
> On Mon, Sep 14, 2009 at 04:57:47PM -0500, Bob Cernohous wrote:<br>
> > <br>
> > We tried to tune it with hints for cb_block_size and get ok performance
<br>
> > when we can avoid read/write data sieving.<br>
> <br>
> I'm sure you must have meant "cb_buffer_size" ? <br>
> <br>
> > They customized the testcase to coordinate/flow-control the non-collective
<br>
> > i/o and they get great performance. They only have N simultaneous
<br>
> > writers/readers active. They pass a token around and take
turns. It's <br>
> > almost like having N aggregators but without the collective i/o
overhead <br>
> > to pass the data around. Instead they pass a small token
and take turns <br>
> > writing the large, non-interleaved contiguous data blocks.<br>
> > <br>
> > I'm not aware of anything in MPIIO or ROMIO that would do tihs?
Has this <br>
> > been explored by the experts (meaning you guys)? <br>
> <br>
> In the ordered mode routines, we pass a token around to ensure that<br>
> process write/read in rank-order. (this is actually a pretty
naive<br>
> way to implement ordered mode, but until very recently nobody seemed<br>
> too concerned about shared file pointer performance).<br>
> <br>
> We don't do anything like this in ROMIO because frankly if a high<br>
> performance file system can't handle simultaneous non-interleaved<br>
> contiguous data blocks (what we would consider the best case scenario<br>
> performance-wise), then a lot of ROMIO assumptions about how to<br>
> achieve peak performance kind of go out the window.<br>
> <br>
> However, Kevin's suggestion that this is instead due to lock<br>
> contention makes a lot of sense and I'm curious to hear what if any<br>
> impact that has on your customer's performance.<br>
> <br>
> ==rob<br>
> <br>
> -- <br>
> Rob Latham<br>
> Mathematics and Computer Science Division<br>
> Argonne National Lab, IL USA<br>
</font></tt>