[mpich2-dev] More ROMIO performance questions
Rob Ross
rross at mcs.anl.gov
Tue Sep 15 15:17:49 CDT 2009
On Sep 15, 2009, at 2:58 PM, Bob Cernohous wrote:
>
> Rob Latham wrote on 09/15/2009 02:39:30 PM:
>
> > Isn't that what 'bgl_nodes_pset' is supposed to address? If you are
> > I/O rich or I/O poor, aggregate down to 'bgl_nodes_pset' aggregators
> > per I/O node. There are tons of things ROMIO can do in collective
> > I/O.
>
> Yes, exactly. When we finally got the bgl_nodes_pset and cb_buffer_size
> hints right, they got reasonable performance, but not as good as
> their customized test case. I'm still looking into this a bit.
>
> > If you pass around a token in the MPI layer, you can easily starve
> > processes. Independent I/O means there's no guarantee of when any
> > process will be in that call. So, do you have rank 1 give up the
> > token after exiting MPI_FILE_WRITE? Who does he pass it to? Will they
> > be in an MPI call and able to make progress on the receive? Do you
> > have rank 4 take the token from someone when he's ready to do
> > I/O?
>
> Our thought was doing this within collective I/O. At some point,
> instead of collecting/moving large contiguous buffers and writing at
> the aggregator, pass around the token and write at each node in
> the set. Either way, data is written cb_block_size at a time. It
> saves passing cb_buffer_size worth of data around. This is different
> from romio_cb_write=automatic because I don't want large contiguous
> buffers to switch back completely to independent writes. Maybe
> romio_cb_write=coordinated :)
This (coordinated) seems like a nice way to get the advantages of
aggregation without communication overheads...
Rob