Rob Latham wrote on 09/15/2009 02:39:30 PM:

> Isn't that what 'bgl_nodes_pset' is supposed to address? If you are
> i/o rich or i/o poor, aggregate down to 'bgl_nodes_pset' aggregators
> per io node. There are tons of things ROMIO can do in collective I/O.

Yes, exactly. When we finally got bgl_nodes_pset and cb_buffer_size
hinted right, they got reasonable performance, but not as good as their
customized test case. I'm still looking into this a bit.
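
For reference, this is roughly how those hints get passed in. A minimal
sketch only: the values here are placeholders, not the ones we actually
tuned to.

#include <mpi.h>

int open_with_hints(MPI_Comm comm, const char *path, MPI_File *fh)
{
    MPI_Info info;
    MPI_Info_create(&info);

    /* aggregators per I/O node on BG/L (placeholder value) */
    MPI_Info_set(info, "bgl_nodes_pset", "8");
    /* collective buffer size per aggregator (placeholder: 4 MB) */
    MPI_Info_set(info, "cb_buffer_size", "4194304");

    int rc = MPI_File_open(comm, (char *)path,
                           MPI_MODE_CREATE | MPI_MODE_WRONLY, info, fh);
    MPI_Info_free(&info);
    return rc;
}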

> If you pass around a token in the MPI layer, you can easily starve
> processes. Independent I/O means there's no guarantee when any
> process will be in that call. so, do you have rank 1 give up the
> token after exiting MPI_FILE_WRITE? Who does he pass to? Will they
> be in an MPI call and able to make progress on the receive? Do you
> have rank 4 take the token from someone when he's ready to do
> I/O?

Our thought was to do this within collective I/O. At some point, instead
of collecting/moving large contiguous buffers and writing at the
aggregator, pass the token around and write at each node in the set.
Either way, data is written cb_block_size at a time; it saves passing
cb_buffer_size around. This is different from romio_cb_write=automatic
because I don't want large contiguous buffers to switch back completely
to independent writes. Maybe romio_cb_write=coordinated :)
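
To make that concrete, here is a rough sketch of the shape I have in
mind. It is not ROMIO code; set_comm (the ranks in one aggregation set),
fh, offset, buf, and count are assumed to come from the usual collective
bookkeeping.

#include <mpi.h>

void coordinated_write(MPI_Comm set_comm, MPI_File fh,
                       MPI_Offset offset, void *buf, int count)
{
    int rank, size, token = 0;
    MPI_Comm_rank(set_comm, &rank);
    MPI_Comm_size(set_comm, &size);

    /* wait for the token from the previous rank in the set */
    if (rank > 0)
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, set_comm,
                 MPI_STATUS_IGNORE);

    /* write this node's own chunk in place, instead of shipping
     * it to an aggregator first */
    MPI_File_write_at(fh, offset, buf, count, MPI_BYTE,
                      MPI_STATUS_IGNORE);

    /* hand the token to the next rank in the set */
    if (rank < size - 1)
        MPI_Send(&token, 1, MPI_INT, rank + 1, 0, set_comm);
}

Since every rank is already inside the collective call, nobody starves
waiting for the token; it just serializes the writes within the set
instead of funneling all the data through one aggregator.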

Anyway, I think my question's been answered. It isn't possible now in
MPI-IO. Obviously customized apps can do whatever they like. Meanwhile I
need to pursue the config and look for the underlying problem or
limitation.