[Swift-devel] Re: swift-falkon problem

Ben Clifford benc at hawaga.org.uk
Thu Mar 20 18:29:13 CDT 2008


If there is no status file and we rely on Falkon reporting success, then 
we go to retrieve the last data file that the job wrote out, and 'oh! 
filesystem race condition, it isn't there...', for the same reasons that 
the status file isn't there now.
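
A minimal, purely illustrative sketch of the client-side check being 
debated here. It is not Swift's code; `wait_for_outputs`, `job.success` 
and `out.dat` are hypothetical names. Instead of stopping at the status 
file, it polls with a bounded retry until every expected output exists. 
On NFS an existence check may be answered from the client's attribute 
cache, so a file showing up does not by itself prove its contents are 
complete.

```python
import os
import time


def wait_for_outputs(paths, timeout=30.0, interval=0.5):
    """Poll until every path in `paths` exists, or give up after `timeout` seconds.

    Hypothetical helper for illustration only. Returns the list of paths
    that are still missing (an empty list means everything appeared).
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check the status file AND the data files, not just the status file.
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        # Back off briefly before looking again (the "retry after a sleep" idea).
        time.sleep(interval)
```

For example, `wait_for_outputs(["job.success", "out.dat"])` returns an 
empty list once both files are visible, or the names still missing when 
the timeout expires, which the caller could then treat as a retryable 
failure.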

On Thu, 20 Mar 2008, Ioan Raicu wrote:

> But the status file is written last, all from the same node, so in theory
> (would have to be tested, or at least verified by someone who knows NFS better
> than I do), if the status file appears, then the other files would also be
> there.  A year ago, there was no status file... this was added later.  What
> was the main motivator for adding the status file?  Was it that you couldn't
> rely on the provider's exit codes?  Or something else?
> 
> Ioan
> 
> Ben Clifford wrote:
> > On Thu, 20 Mar 2008, Ioan Raicu wrote:
> > 
> >   
> > > Why couldn't Swift have a retry mechanism? Given that it received a
> > > successful exit code, it could be more persistent in looking for the
> > > success or failure file and, if it doesn't exist, try again after a
> > > short sleep...  With a persistent enough retry mechanism, this would
> > > certainly hide (and potentially solve) the race condition, wouldn't it?
> > >     
> > 
> > The goal is not just to find a status file; there is other stuff being
> > written to the shared filesystem, and it's not clear that the status files
> > appearing would guarantee that the other files had appeared too.
> > 
> >   
> 
> 
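
The job-side ordering described in the quoted message (outputs first, 
status file last, all from the same node) might look like the following 
sketch. It is hypothetical and not Swift's actual wrapper; 
`write_outputs_then_status` and the status-file content are assumptions. 
Each output is flushed with `os.fsync` before the status file is created, 
and the status file is published with `os.replace` so that readers on the 
same filesystem should not see it half-written. This only fixes the order 
of writes on the writing node; whether another NFS client observes the 
files in that same order is exactly the open question in this thread.

```python
import os


def write_outputs_then_status(outputs, status_path):
    """Write each (path -> bytes) output, flush it to disk, then create the status file last.

    Hypothetical helper for illustration only.
    """
    for path, data in outputs.items():
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data file out before signalling success
    tmp = status_path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(b"0\n")  # assumed content: the job's exit code
        f.flush()
        os.fsync(f.fileno())
    # Rename into place so the status file appears in one step.
    os.replace(tmp, status_path)
```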



More information about the Swift-devel mailing list