We have another I/O scenario with interesting performance issues.

Once again, it's large, non-interleaved contiguous blocks being
written/read (checkpointing software). We ran into the same problems
with data sieving and romio_cb_write/read = enable that we discussed a
couple of weeks ago.

We tried to tune it with hints for cb_block_size and get acceptable
performance when we can avoid read/write data sieving.

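For reference, this is roughly how the hints get set (a sketch, not our
exact settings: cb_buffer_size is the standard ROMIO hint name, the
cb_block_size above may be a platform variant, and the 16 MiB value is
just an illustration):

    #include <mpi.h>

    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    /* Force collective buffering, disable data sieving. */
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "romio_cb_read",  "enable");
    MPI_Info_set(info, "romio_ds_write", "disable");
    MPI_Info_set(info, "romio_ds_read",  "disable");
    /* Collective buffering buffer size: 16 MiB (example value). */
    MPI_Info_set(info, "cb_buffer_size", "16777216");

    MPI_File_open(MPI_COMM_WORLD, "ckpt.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);
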
Trying romio_cb_write/read = automatic gets very poor performance.
Similarly, pure non-collective writes get very poor performance. It
seems like having too many simultaneous writers/readers performs poorly
on their configuration ... so:

They customized the test case to coordinate/flow-control the
non-collective I/O, and they get great performance. They only have N
simultaneous writers/readers active. They pass a token around and take
turns. It's almost like having N aggregators, but without the
collective I/O overhead to pass the data around. Instead they pass a
small token and take turns writing the large, non-interleaved
contiguous data blocks.

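If it helps to make that concrete, here's a minimal sketch of the kind
of token passing I mean (the names NWRITERS, TOKEN_TAG, and
flow_controlled_write are mine, not theirs; it assumes every rank
writes one contiguous block at a known offset):

    #include <mpi.h>

    #define NWRITERS  8    /* at most N simultaneous writers */
    #define TOKEN_TAG 42

    /* Ranks form NWRITERS chains (rank r's predecessor is
       r - NWRITERS, its successor r + NWRITERS), so at most one
       rank per chain is writing at any time. */
    static void flow_controlled_write(MPI_File fh, MPI_Offset offset,
                                      const void *buf, int count)
    {
        int rank, size, token = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Wait for the token unless we head a chain. */
        if (rank - NWRITERS >= 0)
            MPI_Recv(&token, 1, MPI_INT, rank - NWRITERS, TOKEN_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Large contiguous independent write; no collective
           overhead. */
        MPI_File_write_at(fh, offset, (void *)buf, count, MPI_BYTE,
                          MPI_STATUS_IGNORE);

        /* Hand the token to the next rank in this chain. */
        if (rank + NWRITERS < size)
            MPI_Send(&token, 1, MPI_INT, rank + NWRITERS, TOKEN_TAG,
                     MPI_COMM_WORLD);
    }

The only coordination traffic is the tiny token message; the data
itself never moves between ranks the way it does with aggregators.
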
I'm not aware of anything in MPI-IO or ROMIO that would do this. Has
this been explored by the experts (meaning you guys)?

Bob Cernohous: (T/L 553) 507-253-6093

BobC@us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester, MN 55901-7829

> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.