[Swift-devel] Re: swift-falkon problem
Mihael Hategan
hategan at mcs.anl.gov
Thu Mar 20 17:07:23 CDT 2008
On Wed, 2008-03-19 at 21:22 +0000, Ben Clifford wrote:
> On Wed, 19 Mar 2008, Michael Wilde wrote:
>
> > My (likely outdated) understanding of the NFS protocol was that it's supposed to
> > guarantee close-to-open coherence. Meaning that if two clients want to access
> > a file sequentially, and the writing client closes the file before the reading
> > client opens the file, then NFS was supposed to ensure that the reader
> > correctly saw the existence and content of the file.
>
> Right.
>
> Linux NFS (but this is going back half a decade) had some problem there (I
> think that caused problems for GRAM2 somewhere, for example) though I do
> not remember the details; and it was also half a decade ago so has a good
> chance of being different now.
I seem to remember something that looked like an oddity at the time: the
GRAM PBS script wrote a file on the worker node and insisted that the
script (and the job) be "done" only when that file was visible on the
head node.
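
A minimal sketch of that kind of wait, in Python rather than the actual
GRAM script. The function name, timeout, and polling interval are made up
for illustration and are not taken from GRAM, Falkon, or Swift; it only
shows the idea of polling until a file written elsewhere becomes visible
locally:

```python
import os
import time


def wait_until_visible(path, timeout=30.0, poll_interval=0.5):
    """Poll until `path` exists from this host's point of view.

    Returns True if the file became visible before `timeout` seconds
    elapsed, and False otherwise. All names and defaults here are
    hypothetical, chosen only for this illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        # On a shared filesystem the answer may lag behind the writer,
        # which is why we keep asking instead of checking once.
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A caller would treat the job as finished only if this returns True, and
as a failure (or a retry) if it times out.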
>
> A quick google did not find anything that immediately applied.
>
> I've also still not entirely ruled out a race somewhere in the
> falkon->provider-deef->swift stack reporting this.
>
> > If others agree that this should still be the case, then it's worth
> > looking at our code to make sure that this is the case. If it wasn't,
> > you'd think that more things would break, but perhaps Falkon exacerbates
> > any problems in that area due to its low latency.
>
> Indeed, the combination of falkon and local filesystem access is probably
> getting the time between touching the status file on one node and reading
> it on another down pretty low compared to other submission and file access
> protocols.
>
> > The race as far as I know is between the worker writing and moving result,
> > info, and success status files, and the swift host seeing these, correct?
>
> That's what your logs look like today. But yesterday had different timings
> that suggested a different problem.
>
> More runs of the kind that failed would be useful, along with the
> corresponding falkon logs that Ioan listed in a mail in this thread.
>