<br><font size=2 face="sans-serif">Yes, I meant cb_buffer_size.</font>

<br>

<br><font size=2 face="sans-serif">I believe it was tested lockless, but

I'll try to verify that.</font>

<br>

<br><font size=2 face="sans-serif">I'm not necessarily assuming a high performance
file system (HPFS). &nbsp;What about NFS? &nbsp;Or, even with an HPFS, BG
has &quot;i/o poor&quot; racks with 512 cores (writers) per i/o node (HPFS
client). &nbsp;Would an equivalent non-BG setup, a single server (HPFS client)
running 512 processes (writers), have the same problems?</font>

<br>

<br><font size=2 face="sans-serif">Do we just accept lower performance in
these scenarios? Is it entirely up to the (single) HPFS client to sort it
out? &nbsp;Or is it purely an HPFS configuration problem that they got wrong? &nbsp;They only
saw the problem when, for example, all 512 cores wrote to the same file.
&nbsp;All 512 cores writing to different files worked well, so I guess
it could be client configuration or limited client resources.</font>

<br>

<br><font size=2 face="sans-serif">Or would an option to throttle individual
writers that funnel through a single point (client, server, or whatever) make any sense at
all? &nbsp; &nbsp;Collective i/o through N aggregators per i/o node works
pretty well if you get the hints right, but it has more overhead than their
customized flow control of N writers per i/o node.</font>
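<br>
<br><font size=2 face="sans-serif">For discussion, here is a minimal sketch of the kind of flow control their customized testcase uses (N writers active at once, passing a token and taking turns), written as a plain Python threading simulation rather than MPI. It assumes a chain layout the thread does not specify: rank r waits for the token from rank r - N, does its large contiguous write, then hands the token to rank r + N, so each of the N chains has at most one active writer. The names throttled_writes and fake_write are hypothetical; in an MPI code the token would presumably be a small MPI_Send/MPI_Recv message between ranks.</font>
<pre>
```python
import threading
import time


def throttled_writes(num_writers, max_active, write_block):
    """Run num_writers simulated writers, letting at most max_active write at once.

    Writers form max_active token chains: rank r may start only after rank
    r - max_active has finished and handed it the token.  max_active must be
    positive.  Returns the peak number of writers active simultaneously.
    """
    # One event per rank; the extra max_active trailing events absorb the
    # final hand-offs so no bounds check is needed.
    tokens = [threading.Event() for _ in range(num_writers + max_active)]
    counts = {"active": 0, "peak": 0}
    lock = threading.Lock()

    def writer(rank):
        tokens[rank].wait()                  # wait for the token from rank - max_active
        with lock:
            counts["active"] += 1
            counts["peak"] = max(counts["peak"], counts["active"])
        write_block(rank)                    # the large, contiguous, non-interleaved write
        with lock:
            counts["active"] -= 1
        tokens[rank + max_active].set()      # pass the token down this chain

    threads = [threading.Thread(target=writer, args=(r,)) for r in range(num_writers)]
    for t in threads:
        t.start()
    # The first rank of each chain starts out holding the token.
    for r in range(max_active):
        tokens[r].set()
    for t in threads:
        t.join()
    return counts["peak"]


if __name__ == "__main__":
    done = []

    def fake_write(rank):
        time.sleep(0.001)                    # stand-in for writing one large block
        done.append(rank)

    peak = throttled_writes(num_writers=512, max_active=8, write_block=fake_write)
    print("ranks written:", len(done), "peak concurrent writers:", peak)
```
</pre>
<br><font size=2 face="sans-serif">Only the token moves between writers; no file data is shipped to an aggregator, which is where this approach saves overhead compared with collective buffering.</font>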

<br>

<br><font size=2 face="sans-serif">I also thought about shared files. &nbsp;If
they opened N shared files per pset, they would effectively get N concurrent
writers/readers. &nbsp;That would just be another way to coordinate, like
their customized testcase does.</font>
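<br>
<br><font size=2 face="sans-serif">The &quot;N shared files per pset&quot; idea can be sketched the same way. Assuming, as Rob describes for the ordered-mode routines, that ranks sharing a file pass a token and write in rank order, each rank's offset in its file is the exclusive prefix sum of the block sizes of the lower-ranked members of its group, and the N groups proceed independently, giving N concurrent writers. The round-robin grouping (rank % N) and the helper plan_shared_files are hypothetical; an MPI version might split the pset communicator with MPI_Comm_split and write with MPI_File_write_ordered, but that is an assumption, not something tested in this thread.</font>
<pre>
```python
def plan_shared_files(block_sizes, num_files):
    """Split ranks round-robin across num_files shared files and compute
    each rank's byte offset under rank-ordered (token-passing) writes.

    block_sizes[r] is the number of bytes rank r writes.
    Returns a dict mapping rank to (file_index, offset).
    """
    next_offset = [0] * num_files                # current end of each shared file
    plan = {}
    for rank, size in enumerate(block_sizes):    # visit ranks in rank order
        f = rank % num_files                     # assumed grouping: round-robin by rank
        plan[rank] = (f, next_offset[f])         # exclusive prefix sum within the group
        next_offset[f] += size
    return plan


if __name__ == "__main__":
    # 8 ranks writing 1 MiB each into 2 shared files: 2 concurrent writers.
    plan = plan_shared_files([2**20] * 8, num_files=2)
    for rank in sorted(plan):
        print(rank, plan[rank])
    # plan[5] == (1, 2097152): ranks 1 and 3 precede rank 5 in file 1
```
</pre>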

<br><font size=2 face="sans-serif"><br>

Bob Cernohous: &nbsp;(T/L 553) 507-253-6093<br>

<br>

BobC@us.ibm.com<br>

IBM Rochester, Building 030-2(C335), Department 61L<br>

3605 Hwy 52 North, Rochester, &nbsp;MN 55901-7829<br>

<br>

&gt; Chaos reigns within.<br>

&gt; Reflect, repent, and reboot.<br>

&gt; Order shall return.<br>

</font>

<br>

<br><tt><font size=2>mpich2-dev-bounces@mcs.anl.gov wrote on 09/14/2009

05:48:00 PM:<br>

<br>

&gt; From: Rob Latham &lt;robl@mcs.anl.gov&gt;<br>
&gt; Sent by: mpich2-dev-bounces@mcs.anl.gov<br>
&gt; Date: 09/14/2009 05:48 PM<br>
&gt; Please respond to: mpich2-dev@mcs.anl.gov<br>
&gt; To: mpich2-dev@mcs.anl.gov<br>
&gt; Subject: Re: [mpich2-dev] More ROMIO performance questions</font></tt>

<br><tt><font size=2>&gt; <br>

&gt; On Mon, Sep 14, 2009 at 04:57:47PM -0500, Bob Cernohous wrote:<br>

&gt; &gt; <br>

&gt; &gt; We tried to tune it with hints for cb_block_size and get ok performance

<br>

&gt; &gt; when we can avoid read/write data sieving.<br>

&gt; <br>

&gt; I'm sure you must have meant &quot;cb_buffer_size&quot; ? &nbsp;<br>

&gt; <br>

&gt; &gt; They customized the testcase to coordinate/flow-control the non-collective

<br>

&gt; &gt; i/o and they get great performance. &nbsp; They only have N simultaneous

<br>

&gt; &gt; writers/readers active. &nbsp;They pass a token around and take

turns. &nbsp;It's <br>

&gt; &gt; almost like having N aggregators but without the collective i/o

overhead <br>

&gt; &gt; to pass the data around. &nbsp;Instead they pass a small token

and take turns <br>

&gt; &gt; writing the large, non-interleaved contiguous data blocks.<br>

&gt; &gt; <br>

&gt; &gt; I'm not aware of anything in MPI-IO or ROMIO that would do this?

&nbsp; Has this <br>

&gt; &gt; been explored by the experts (meaning you guys)? <br>

&gt; <br>

&gt; In the ordered mode routines, we pass a token around to ensure that<br>

&gt; processes write/read in rank order. &nbsp;(This is actually a pretty

naive<br>

&gt; way to implement ordered mode, but until very recently nobody seemed<br>

&gt; too concerned about shared file pointer performance).<br>

&gt; <br>

&gt; We don't do anything like this in ROMIO because frankly if a high<br>

&gt; performance file system can't handle simultaneous non-interleaved<br>

&gt; contiguous data blocks (what we would consider the best case scenario<br>

&gt; performance-wise), then a lot of ROMIO assumptions about how to<br>

&gt; achieve peak performance kind of go out the window.<br>

&gt; <br>

&gt; However, Kevin's suggestion that this is instead due to lock<br>

&gt; contention makes a lot of sense and I'm curious to hear what, if any,<br>

&gt; impact that has on your customer's performance.<br>

&gt; <br>

&gt; ==rob<br>

&gt; <br>

&gt; -- <br>

&gt; Rob Latham<br>

&gt; Mathematics and Computer Science Division<br>

&gt; Argonne National Lab, IL USA<br>

</font></tt>