[Swift-devel] Re: swift-falkon problem

Michael Wilde wilde at mcs.anl.gov
Wed Mar 19 13:46:02 CDT 2008


I was not considering scp and gridftp as a way to introduce artificial delays.
The purpose was two-fold:

1) eliminate the need to run swift on a host that mounts the sicortex 
filesystem, as there is no good host that mounts it on which we can run 
long-term (we are temporary guests on bblogin). This was the initial 
reason, before we knew of any problems.

2) for dealing with this race, I thought we could avoid any possible NFS 
race conditions by writing directly to the filesystem.  But I now 
realize that this won't necessarily help: the scp and gridftp *servers* 
would not be running on a host that locally mounts the filesystem, and 
the sicortex worker nodes do NFS mounts themselves.

My (likely outdated) understanding of the NFS protocol was that it's 
supposed to guarantee close-to-open coherence.  That is, if two clients 
want to access a file sequentially, and the writing client closes the file 
before the reading client opens the file, then NFS is supposed to 
ensure that the reader correctly sees the existence and content of the file.
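
To make the pattern concrete, here is a minimal sketch of a writer that 
closes the file before it becomes visible under its final name, and a 
reader that opens it only afterwards.  The function names (publish, 
read_after_close) and the temp-file-plus-rename step are my assumptions for 
illustration, not what the Swift wrapper or Falkon actually do.

```python
import os
import tempfile


def publish(path, data):
    """Writer side: write, flush, close, then rename into place.

    Hypothetical helper; the temp-file naming is an assumption.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data out before close
        # The file is closed here; only now does it get its final name.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_after_close(path):
    """Reader side: a fresh open that happens after the writer's close."""
    with open(path, "rb") as f:
        return f.read()
```

If the reader's open really does happen after the writer's close, 
close-to-open coherence says the reader should see the complete content.  
The sketch does not test whether a given NFS client honors that.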

If others agree that this should still be the case, then it's worth 
looking at our code to make sure that this is the case.  If it wasn't, 
you'd think that more things would break, but perhaps Falkon exacerbates 
any problems in that area due to its low latency.

The race, as far as I know, is between the worker writing and moving the 
result, info, and success status files, and the swift host seeing these, 
correct?
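
As a rough sketch of the ordering I have in mind: the worker writes and 
closes the data files first and creates the success marker last, and the 
host opens the data files only after it sees the marker.  The file names 
("result", "info", "success"), the function names, and the polling loop are 
all hypothetical, chosen for illustration rather than taken from the actual 
wrapper.

```python
import os
import time


def finish_job(job_dir, result, info):
    """Worker side: data files first, success marker last (names assumed)."""
    for name, data in (("result", result), ("info", info)):
        with open(os.path.join(job_dir, name), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    # Created only after the data files are closed, so seeing the marker
    # should imply that the data files are complete.
    with open(os.path.join(job_dir, "success"), "wb") as f:
        f.flush()
        os.fsync(f.fileno())


def collect_job(job_dir, timeout=5.0, interval=0.1):
    """Host side: poll for the marker, then open the data files fresh.

    Returns a dict of file contents, or None if the marker never appears.
    """
    marker = os.path.join(job_dir, "success")
    deadline = time.monotonic() + timeout
    while not os.path.exists(marker):
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
    out = {}
    for name in ("result", "info"):
        with open(os.path.join(job_dir, name), "rb") as f:
            out[name] = f.read()
    return out
```

Polling only tolerates a delay in the marker becoming visible.  If the host 
can see the marker and still read stale or missing data files, then the 
problem is in the coherence guarantee itself (or in how we close and move 
the files), not in timing.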

- Mike



On 3/19/08 12:49 PM, Ben Clifford wrote:
> On Wed, 19 Mar 2008, Michael Wilde wrote:
> 
>> I'm cautiously leaning a bit more to the NFS-race theory. I would like to test
>> with scp data transfer.  Am also trying to get gridftp compiled there with
>> help from Raj.  Build is failing with gpt problems, I think I need Ben or
>> Charles on this.
> 
> If an underlying NFS race is the problem, using scp or gridftp won't cure 
> that - it may, by virtue of adding latency, make the problem disappear 
> most/all of the time, but that would be by virtue of slowing down access, 
> not any actual fixing of the problem.
> 
> If you're deliberately introducing artificial delays, e.g. by doing the 
> above, there are probably simpler ways (such as hacking a delay into the 
> wrapper script after doing the touch but before exiting).
> 


