[mpich2-dev] More ROMIO performance questions

Tue Sep 15 15:17:49 CDT 2009

On Sep 15, 2009, at 2:58 PM, Bob Cernohous wrote:

>
> Rob Latham wrote on 09/15/2009 02:39:30 PM:
>
> > Isn't that what 'bgl_nodes_pset' is supposed to address?  If you are
> > i/o rich or i/o poor, aggregate down to 'bgl_nodes_pset' aggregators
> > per io node.  There are tons of things ROMIO can do in collective
> > I/O.
>
> Yes, exactly.  When we finally got bgl_nodes_pset and cb_buffer_size  
> hinted right, they got reasonable performance.  But not as good as  
> their customized testcase.  I'm still looking into this a bit.
>
> > If you pass around a token in the MPI layer, you can easily starve
> > processes.  Independent I/O means there's no guarantee when any
> > process will be in that call.  So, do you have rank 1 give up the
> > token after exiting MPI_FILE_WRITE?  Who does he pass to?  Will they
> > be in an MPI call and able to make progress on the receive? Do you
> > have rank 4 take the token from someone when he's ready to do
> > I/O?
>
> Our thought was doing this within collective i/o.  At some point,  
> instead of collecting/moving large contiguous buffers and writing at  
> the aggregator -- pass around the token and write at each node in  
> the set.  Either way, data is written cb_block_size at a time.  It  
> saves passing cb_buffer_size around.   This is different than  
> romio_cb_write=automatic because I don't want large contiguous  
> buffers to switch back completely to independent writes.  Maybe  
> romio_cb_write=coordinated :)

This (coordinated) seems like a nice way to get the advantages of  
aggregation without communication overheads...
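
To make the trade-off concrete, here is a toy Python model of the two
schemes. It does no MPI or file I/O; it only counts bytes moved between
ranks and write calls issued. The function names, the single aggregator
at rank 0, the 8-byte token, and the `chunk` parameter (standing in for
the buffer-size hint) are all assumptions for illustration, and
romio_cb_write=coordinated is only a proposal in this thread.

```python
# Toy model: counts inter-rank traffic and write calls. No real MPI or I/O.

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def aggregated_write(rank_bytes, chunk):
    """Two-phase style: every other rank ships its data to one
    aggregator (assumed to be rank 0), which writes `chunk` bytes at a time."""
    shipped = sum(rank_bytes[1:])          # rank 0's own data stays local
    writes = ceil_div(sum(rank_bytes), chunk)
    return shipped, writes


def coordinated_write(rank_bytes, chunk, token_bytes=8):
    """Hypothetical 'coordinated' mode: a small token visits the ranks in
    order, and the token holder writes its own data in `chunk`-sized pieces."""
    shipped = 0
    writes = 0
    order = []
    last = len(rank_bytes) - 1
    for rank, nbytes in enumerate(rank_bytes):
        order.append(rank)                 # this rank holds the token now
        writes += ceil_div(nbytes, chunk)  # it writes its own buffer locally
        if rank < last:
            shipped += token_bytes         # hand the token to the next rank
    return shipped, writes, order
```

With four ranks holding 16 MiB each and a 4 MiB chunk, both schemes in
this model issue 16 writes, but the aggregated one moves 48 MiB between
ranks while the coordinated one moves only 24 bytes of token traffic.
The model ignores that the coordinated writers are serialized by the
token, so it says nothing about which scheme is faster in practice.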

Rob