[Swift-devel] Re: swift-falkon problem

Mihael Hategan hategan at mcs.anl.gov
Thu Mar 20 17:07:23 CDT 2008


On Wed, 2008-03-19 at 21:22 +0000, Ben Clifford wrote:
> On Wed, 19 Mar 2008, Michael Wilde wrote:
> 
> > My (likely outdated) understanding of the NFS protocol was that it's supposed to
> > guarantee close-to-open coherence.  Meaning that if two clients want to access
> > a file sequentially, and the writing client closes the file before the reading
> > client opens the file, then NFS was supposed to ensure that the reader
> > correctly saw the existence and content of the file.
> 
> Right.
> 
> Linux NFS (but this is going back half a decade) had some problem there (I 
> think that caused problems for GRAM2 somewhere, for example) though I do 
> not remember the details; and it was also half a decade ago so has a good 
> chance of being different now.

I seem to remember what looked like an oddity at the time, that the GRAM
PBS script was writing a file on the worker node and insisted that the
script (and the job) be "done" only when the file was visible on the
head node.
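
For illustration, that kind of wait can be sketched as a polling loop. This
is a minimal sketch and not the actual GRAM PBS script: the function name
wait_until_visible, the timeout, and the polling interval are all made up
here, and it assumes that checking the path repeatedly is enough for the
NFS client to eventually see the file.

```python
import os
import time


def wait_until_visible(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the file became visible, False on timeout.
    Hypothetical helper; the name and defaults are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first, so an already-visible file returns immediately.
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The caller would treat the job as "done" only if this returns True, and as
failed (or still unknown) if it times out.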

> 
> A quick google did not find anything that immediately applied.
> 
> I've also still not entirely ruled out a race somewhere in the 
> falkon->provider-deef->swift stack reporting this.
> 
> > If others agree that this should still be the case, then it's worth 
> > looking at our code to make sure that this is the case.  If it wasn't, 
> > you'd think that more things would break, but perhaps Falkon exacerbates 
> > any problems in that area due to its low latency.
> 
> Indeed, the combination of falkon and local filesystem access is probably 
> getting the time between touching the status file on one node and reading 
> it on another down pretty low compared to other submission and file access 
> protocols.
> 
> > The race as far as I know is between the worker writing and moving result,
> > info, and success status files, and the swift host seeing these, correct?
> 
> That's what your logs look like today. But yesterday had different timings 
> that suggested a different problem.
> 
> More runs of the kind that failed would be useful, along with the 
> corresponding falkon logs that Ioan listed in a mail in this thread.
> 



