[Swift-devel] Re: swift-falkon problem

Michael Wilde wilde at mcs.anl.gov
Fri Mar 21 07:12:03 CDT 2008


My latest tests, on runs of 25, 100, and 1000 jobs, seem to indicate that
with a sync command at the end of the application script, all job status
and data are returned OK every time.

(This is somewhat curious, as the info and success files for the current
job would not yet be complete at that time, but the sync command affects
all other activity on the host, and ensures that at least the currently
existing dirs, files and data are synced, or that their sync has started.)
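
As a rough illustration of "sync at the end of the application script",
here is a minimal Python sketch. It is not the actual Swift wrapper or
application script; the function name run_then_sync is hypothetical, and
it assumes a Unix host where os.sync() is available.

```python
import os
import subprocess
import sys


def run_then_sync(argv):
    """Run the application command, then ask the kernel to flush buffers.

    Hypothetical helper for illustration; not part of Swift or Falkon.
    """
    result = subprocess.run(argv)
    # os.sync() is the Python counterpart of the sync command: it requests
    # that buffered filesystem writes on this host be written out.
    os.sync()
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python run_then_sync.py app arg1 arg2 ...
    sys.exit(run_then_sync(sys.argv[1:]))
```

The exit code of the application is passed through unchanged, so a
failing job is still reported as failing after the sync.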

Without the sync, at the moment, virtually all jobs fail, and almost
*no* data is being returned.  Out of 3 runs of 1000 jobs, one run
returned 2 data files, the other two returned no data files. One 100-job
run without sync returned 11 of 100 files.

The most fruitful way to test whether this sync fully fixes the problem
seems to be to do many more runs.

I noted that the bblog host (from which I run Swift) has no special NFS
mount flags, just rw. (I was wondering if they had something on that
would affect coherence; seems not).
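
For anyone repeating this check on another host, a small sketch of
listing NFS mount options follows. It assumes text in the /proc/mounts
layout (device, mount point, type, options, ...); the sample entries and
the function name nfs_mount_options are made up for illustration and are
not the real bblog mounts.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to its set of mount options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4"):
            result[mount_point] = set(options.split(","))
    return result


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE_MOUNTS = """\
server:/export/home /home nfs rw 0 0
/dev/sda1 / ext3 rw 0 0
server:/export/scratch /scratch nfs rw,noac 0 0
"""

for mount_point, opts in sorted(nfs_mount_options(SAMPLE_MOUNTS).items()):
    flag = "noac set" if "noac" in opts else "attribute caching on"
    print(mount_point, sorted(opts), flag)
```

On a live Linux host the same function could be fed the contents of
/proc/mounts instead of the sample string.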

I did not have a chance to capture the falkon logs in these tests; I
will look for the ones Ioan mentioned, and try some runs with those logs
captured.

The swift logs I did capture are in the CI log dir, wilde/run{317-328}

run317/comment:amps1 100 sico with sync - ran ok
run318/comment:amps1 100 with no sync - died on first error
run319/comment:amps1 without sync - 11 of 100 returned OK
run320/comment:amps1 100 without sync - no data returned ok
run321/comment:amps1 100 without sync - no data returned ok
run322/comment:amps1 100 with sync - all data returned ok
run323/comment:amps1 100 with sync - all data returned ok
run324/comment:amps1 1000 with sync - all data returned ok
run325/comment:amps1 1000 without sync - no data returned ok
run326/comment:amps1 25 without sync - no data returned ok
run327/comment:amps1 100 without sync - 2 data files returned ok
run328/comment:amps1 1000 with sync - all data returned ok
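
To keep track of how many runs fall on each side as more are added, a
small tally sketch follows. The comment lines are copied from the list
above; the helper names sync_mode and tally are hypothetical, and the
code only counts runs per mode without interpreting their outcomes.

```python
from collections import Counter

RUN_COMMENTS = """\
run317/comment:amps1 100 sico with sync - ran ok
run318/comment:amps1 100 with no sync - died on first error
run319/comment:amps1 without sync - 11 of 100 returned OK
run320/comment:amps1 100 without sync - no data returned ok
run321/comment:amps1 100 without sync - no data returned ok
run322/comment:amps1 100 with sync - all data returned ok
run323/comment:amps1 100 with sync - all data returned ok
run324/comment:amps1 1000 with sync - all data returned ok
run325/comment:amps1 1000 without sync - no data returned ok
run326/comment:amps1 25 without sync - no data returned ok
run327/comment:amps1 100 without sync - 2 data files returned ok
run328/comment:amps1 1000 with sync - all data returned ok
"""


def sync_mode(comment):
    """Classify one comment line by whether the run used sync."""
    if "without sync" in comment or "no sync" in comment:
        return "without sync"
    if "with sync" in comment:
        return "with sync"
    return "unknown"


def tally(comments_text):
    """Count runs per sync mode, ignoring blank lines."""
    lines = [line for line in comments_text.splitlines() if line.strip()]
    return Counter(sync_mode(line) for line in lines)


print(tally(RUN_COMMENTS))
```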

- Mike

On 3/20/08 6:23 PM, Ben Clifford wrote:
> There is a flag for NFS mounts, 'noac', which disables attribute caching on 
> clients, which I think may make the filesystem behave in the desired 
> fashion; however it sounds like it also massively reduces filesystem 
> performance and increases fileserver load.
> 
> Mike, you might be able to persuade MCS systems to make such a filesystem 
> available.
> 
> I suspect some multi-second delay after touching the status file and 
> before exiting in the wrapper script is probably the best workaround for 
> now, though.
> 
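
The delay workaround suggested in the quoted message could be sketched
roughly as below. This is not the actual wrapper script; the function
name, the status file path and the 5-second default are assumptions
chosen only to illustrate "touch the status file, then wait before
exiting".

```python
import time
from pathlib import Path


def mark_done_and_linger(status_path, delay_seconds=5.0):
    """Touch the status file, then pause before the caller exits.

    Hypothetical helper; the delay value is a guess, not a tested setting.
    """
    Path(status_path).touch()
    # Give the NFS client some time to push the update out before the
    # job is reported as finished.
    time.sleep(delay_seconds)
```

Unlike the sync approach, this only waits and does not force anything
to be written, so a suitable delay would have to be found by testing.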



