[Swift-devel] Re: swift-falkon problem
Ben Clifford
benc at hawaga.org.uk
Tue Mar 18 16:45:56 CDT 2008
On Tue, 18 Mar 2008, Ioan Raicu wrote:
> Could a latency of NFS in which one node creates a
> file/dir and another node requires xxx time (in this case, 5 sec) before it
> actually sees the file, explain what Mike is seeing? If this is a likely
> explanation, then the race condition is that the exit code goes from worker to
> Falkon service to Swift faster than NFS can update its file/dir list, and when
> Swift checks for the file or dir (probably within 10s of milliseconds of the
> job completion), it can't find the file/dir. Are there any counterarguments
> that would make this hypothesis not possible? Just another hypothesis which
> might be worth investigating.
>
According to the timing in the log file, Swift is getting a notification
from provider-deef that the job completed before the job has even run to
completion on the worker, well before the wrapper even attempts to write
out a status file.
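A minimal sketch of that timing check, assuming log4j-style timestamps
(e.g. "2008-03-18 16:45:56,123") and that the submit host and the worker
have reasonably synchronized clocks. The function name and the two events
it compares are hypothetical, not anything defined by Swift or Falkon:

```python
from datetime import datetime

# Assumed log4j-style timestamp layout; real Swift/Falkon log lines may differ.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def early_notification_seconds(notified_at: str, wrapper_done_at: str) -> float:
    """Seconds by which the completion notification preceded the wrapper finishing.

    notified_at:     when Swift logged the "job completed" notification (hypothetical event)
    wrapper_done_at: when the wrapper on the worker finished (hypothetical event)

    A positive result means Swift heard "completed" before the wrapper was done,
    i.e. before any status file could exist, so NFS visibility latency alone
    cannot explain the failure.
    """
    notified = datetime.strptime(notified_at, TS_FORMAT)
    done = datetime.strptime(wrapper_done_at, TS_FORMAT)
    return (done - notified).total_seconds()
```

If the clocks on the two hosts are skewed, a small positive gap proves
little; a gap of several seconds, as in the log described above, is much
harder to blame on skew.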
I'm not accusing this of being a problem inside Falkon - I'm saying I
think it's happening somewhere below the Swift layer, so it could well be
provider-deef, which is probably the most neglected part of this whole
stack.
Mike, are you running with those extra debug lines in the log4j
configuration? If not, please run again with them turned on. Also, Ioan can
probably recommend which Falkon logs to keep, so we can see what's
happening for a job there and approach the problem from the other end of
the stack too.