[mpich2-dev] More ROMIO performance questions

Bob Cernohous bobc at us.ibm.com
Tue Sep 15 14:58:07 CDT 2009


Rob Latham wrote on 09/15/2009 02:39:30 PM:
 
> Isn't that what 'bgl_nodes_pset' is supposed to address?  If you are
> i/o rich or i/o poor, aggregate down to 'bgl_nodes_pset' aggregators
> per io node.  There are tons of things ROMIO can do in collective I/O.

Yes, exactly.  When we finally got the bgl_nodes_pset and cb_buffer_size 
hints set right, they got reasonable performance, but still not as good as 
their customized test case.  I'm still looking into this a bit.

> If you pass around a token in the MPI layer, you can easily starve
> processes.  Independent I/O means there's no guarantee when any
> process will be in that call.  So, do you have rank 1 give up the
> token after exiting MPI_FILE_WRITE?  Who does he pass to?  Will they
> be in an MPI call and able to make progress on the receive? Do you
> have rank 4 take the token from someone when he's ready to do
> I/O?
> 

Our thought was to do this within collective I/O.  At some point, instead 
of collecting/moving large contiguous buffers and writing at the 
aggregator, pass a token around and write at each node in the set. 
Either way, data is written cb_block_size at a time; it just saves passing 
cb_buffer_size around.  This is different from romio_cb_write=automatic 
because I don't want large contiguous buffers to switch back completely to 
independent writes.  Maybe romio_cb_write=coordinated :)

Anyway, I think my question's been answered.  It isn't possible now in 
MPI-IO.  Obviously customized apps can do whatever they like.  Meanwhile I 
need to pursue the config and look for the underlying problem or 
limitation.