Independent write

Rob Ross rross at mcs.anl.gov
Fri Mar 12 11:42:21 CST 2004


On Fri, 12 Mar 2004, Joachim Worringen wrote:

> Rob Ross:
> > > One might define an "atomic mode" for pNetCDF, like in MPI-IO.
> >
> > Ack no!!!!  I *hate* the MPI-IO atomic mode!!!  We'd have to augment the
> > interface to have an append mode as well, because of the race condition
> > mentioned.
> 
> I didn't say that I'm an "atomic fan" ;-). But there are reasonably efficient 
> ways to implement it under certain conditions.

Glad to hear you aren't an "atomic fan"; that's a bad crowd :).  I agree
that there are decent ways to implement it in some cases; I think I spend too
much time working with clusters with limited networks -- as a result I tend to
think of RMA as being useful only in a BSP-like mode.  And I'm
pretty opinionated about how PFSs should be put together too.
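
For reference, the atomic mode we're arguing about is just a per-file-handle
flag in MPI-IO.  A minimal sketch (untested; the file name and access mode
below are made up):

    #include <mpi.h>

    /* Toggle MPI-IO atomic mode on an open file.  With it enabled,
     * conflicting writes from different processes to the same file have
     * well-defined results -- usually at a real performance cost. */
    void atomic_mode_example(void)
    {
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "data.out",
                      MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
        MPI_File_set_atomicity(fh, 1);   /* 1 = atomic, 0 = default */
        /* ... overlapping writes from different ranks are now defined ... */
        MPI_File_set_atomicity(fh, 0);
        MPI_File_close(&fh);
    }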

> Will PVFS2 provide atomic mode? ;-)

Not right now; PVFS2 servers don't communicate, making atomic mode
intractable for the moment.  There is a student working on some versioning
infrastructure that would enable it though...

> > > MPI-2 one-sided communication could be a way to implement the required
> > > synchronization (double-locking two windows to access some data in one
> > > of the windows atomically, like fetch&increment - not nice, but it works).
> > > Its efficiency would depend on the progress characteristics of the
> > > underlying MPI implementation and/or the interconnect and/or the
> > > communication behaviour of the application, but it would work anyway.
> >
> > Yeah, this is a clever way to implement this sort of functionality.  
> > There is at least one group looking at this sort of thing for consistent
> > caching in the MPI-IO layer.
> 
> Yes, you can do a lot of things with this (see above).
> 
> [...]
> > Maybe I should add a section to the document on "implementing append
> > operations in PnetCDF", describing a couple of approaches and giving some
> > code?  I would much rather spend a little time doing this and helping
> > people do things the right way.
> 
> I share your concerns (people tend to do it the easy way!), and think this is 
> the better way to handle this issue, too. It's necessary to change one's 
> thinking when going parallel.

This is definitely a big issue in the parallel I/O domain; even getting 
groups to look at parallel I/O has been a challenge, much less doing a 
little extra work to make it fast.  On the other hand, it's probably 
easier than merging N files together after the run :).
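
To make that concrete, here's roughly what one of those approaches would look
like.  This is an untested sketch -- the function and variable names are mine
-- and it assumes each process knows how many records it wants to append and
that everyone calls it collectively:

    #include <mpi.h>
    #include <pnetcdf.h>

    /* Append records to a record variable without any atomic mode: agree
     * on offsets up front with a prefix sum, then do a single collective
     * write.  varid is assumed to be a record variable (unlimited dim
     * first, a fixed dimension of length reclen second). */
    void append_records(int ncid, int varid, long long nmine,
                        const double *buf, long long reclen)
    {
        int rank, unlimdim;
        long long myoff = 0;
        MPI_Offset numrecs, start[2], count[2];

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Current length of the record dimension (same on all processes). */
        ncmpi_inq_unlimdim(ncid, &unlimdim);
        ncmpi_inq_dimlen(ncid, unlimdim, &numrecs);

        /* Exclusive prefix sum of record counts gives each process its own
         * slot past the current end of the variable. */
        MPI_Exscan(&nmine, &myoff, 1, MPI_LONG_LONG_INT, MPI_SUM,
                   MPI_COMM_WORLD);
        if (rank == 0) myoff = 0;   /* MPI_Exscan leaves rank 0 undefined */

        start[0] = numrecs + myoff;  start[1] = 0;
        count[0] = nmine;            count[1] = reclen;
        ncmpi_put_vara_double_all(ncid, varid, start, count, buf);
    }

The point is that the processes agree on where everyone's data goes before
anybody touches the file, so there's no read-modify-write race to worry about.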

> > > As many interconnects today allow RMA, the performance of passive
> > > synchronization should become better with the MPI implementations
> > > exploiting such capabilities.
> >
> > Agreed.  There are definitely lots of opportunities there!  RMA
> > implementations need to be more widespread though...
> 
> Doesn't everything other than Ethernet support some sort of RMA today? And even 
> for Ethernet, there are some approaches to do such stuff (I forget the name of 
> it). And we know that "Ethernet always wins"... The time may come when it 
> becomes advisable to update or extend the MPI one-sided specs.

Can you do these interesting things in an efficient way with simple RMA 
capabilities, or do you need some basic accumulate functionality?  We 
should discuss this over beers with Rajeev and Bill and Jesper at 
EuroPVM/MPI :).
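
To be clearer about what I mean by "basic accumulate functionality": with get
plus accumulate you can at least build a fetch-and-increment without an
ordered read-modify-write.  A rough sketch (untested; it assumes win was
created over an nprocs-int array on rank 0, zeroed, with a displacement unit
of sizeof(int)):

    #include <mpi.h>
    #include <stdlib.h>

    /* The counter's value is the sum of the slots in the window, one slot
     * per process.  Each process only ever accumulates into its own slot,
     * so it can track that slot's value locally (mycontrib) and never has
     * to read and then write the same location in one epoch. */
    int fetch_and_increment(MPI_Win win, int rank, int nprocs, int *mycontrib)
    {
        int *others = malloc(nprocs * sizeof(int));
        int value = *mycontrib, one = 1, i;

        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        for (i = 0; i < nprocs; i++) {
            if (i == rank) continue;
            MPI_Get(&others[i], 1, MPI_INT, 0, i, 1, MPI_INT, win);
        }
        MPI_Accumulate(&one, 1, MPI_INT, 0, rank, 1, MPI_INT, MPI_SUM, win);
        MPI_Win_unlock(0, win);

        for (i = 0; i < nprocs; i++)
            if (i != rank) value += others[i];
        *mycontrib += 1;
        free(others);
        return value;   /* counter value before our increment */
    }

With nothing but put and get you'd have to roll your own lock on a second
window first, which is basically the double-locking Joachim described.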

I agree; once people have some more experience with MPI RMA, there will 
probably be some tuning...

Good discussion!  I should know more about this area.

Rob



