[Swift-devel] Re: swift-falkon problem
Michael Wilde
wilde at mcs.anl.gov
Fri Mar 21 08:34:43 CDT 2008
Runs 329 and 330 (both in the CI log dir) were run with (I hope) the
requested Falkon logs turned on. Note that I turned the deef provider
logs on too, but have not yet verified that they are logging correctly.
run329 was 9 jobs, no sync. All 9 succeeded.
run330 was 25 jobs, no sync. 19 of 25 succeeded; the other 6 failed.
This is starting to confirm a curious pattern: without the sync,
workflows with more jobs achieve *fewer* total successful jobs.
Here's what I recall from the last few days of testing:
1 job wf: all succeed
9 job wf: all succeed
25 job wf: 15-20 succeed
100 job wf: 1-2 succeed
1000 job wf: 0 succeed
I don't have enough data to confirm this, but the pattern seems to be
present.
I am going to set the problem aside for now, until you, Ben and Ioan,
have had a chance to look at the logs from this morning's test.
I'll assume for the moment that the sync "fixes" the problem, and go on
to the application tests I need to run, keeping an eye out for anomalies.
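A minimal sketch of the workaround being assumed here, written in Python rather than the shell of wrapper.sh: write the job's status file, fsync it, call os.sync() (the library equivalent of the sync command), and optionally pause before exiting, which is the delay Ben suggests below. The function name, the status-file path, and the settle_seconds parameter are hypothetical; this is not what Swift's wrapper.sh actually does, and whether it is enough depends on the NFS client's attribute caching.

```python
import os
import tempfile
import time


def write_status_and_sync(status_path: str, text: str = "", settle_seconds: float = 0.0) -> None:
    """Hypothetical wrapper epilogue: write a job status file, push it to disk, optionally wait.

    Sketch only; the name, arguments and status-file layout are assumptions, not Swift's wrapper.sh.
    """
    # Write the status file and flush it from Python's buffer into the kernel.
    with open(status_path, "w") as f:
        f.write(text)
        f.flush()
        # Ask the kernel to commit this file's data (on NFS, to the server) before continuing.
        os.fsync(f.fileno())
    # Library equivalent of the `sync` command: flush all dirty buffers on the host (Unix only).
    if hasattr(os, "sync"):
        os.sync()
    # Optional pause that delays the job's exit (and so its completion notice), giving the
    # reading host's NFS attribute cache time to expire. The right value is an assumption.
    if settle_seconds > 0:
        time.sleep(settle_seconds)


if __name__ == "__main__":
    # Usage: write a hypothetical success marker into a scratch directory.
    job_dir = tempfile.mkdtemp()
    marker = os.path.join(job_dir, "job-success")
    write_status_and_sync(marker, "ok\n", settle_seconds=0.0)
    with open(marker) as fh:
        print(fh.read().strip())
```

In a real run settle_seconds would be a few seconds, trading per-job latency for visibility on the submit host.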
My goal is to do large-scale tests of AMIGA and DOCK under Swift,
reducing wrapper.sh and throttling delays, and doing as much work on
local RAM filesystems as possible.
Mike
On 3/21/08 7:12 AM, Michael Wilde wrote:
> My latest tests, on runs of 25, 100, and 1000 jobs, seem to indicate that
> with a sync command at the end of the application script, all job status
> and data is returned OK every time.
>
> (This is somewhat curious, as the info and success files for the current
> job would not yet be complete at that point, but the sync command affects
> all other activity on the host, and ensures that at least the currently
> existing dirs, files and data are synced, or that their sync has started.)
>
> Without the sync, at the moment, virtually all jobs fail, and almost
> *no* data is being returned. Out of 3 runs of 1000 jobs, one run
> returned 2 data files, the other two returned no data files. One 100-job
> run without sync returned 11 of 100 files.
>
> The most fruitful way to test whether this sync fully fixes the
> problem seems to be to do many more runs.
>
> I noted that the bblog host (from which I run Swift) has no special NFS
> mount flags, just rw. (I was wondering if they had something on that
> would affect coherence; seems not).
>
> I did not have a chance to capture the falkon logs in these tests; I
> will look for the ones Ioan mentioned, and try some runs with those logs
> captured.
>
> The swift logs I did capture are in the CI log dir, wilde/run{317-328}
>
> run317/comment:amps1 100 sico with sync - ran ok
> run318/comment:amps1 100 with no sync - died on first error
> run319/comment:amps1 without sync - 11 of 100 returned OK
> run320/comment:amps1 100 without sync - no data returned ok
> run321/comment:amps1 100 without sync - no data returned ok
> run322/comment:amps1 100 with sync - all data returned ok
> run323/comment:amps1 100 with sync - all data returned ok
> run324/comment:amps1 1000 with sync - all data returned ok
> run325/comment:amps1 1000 without sync - no data returned ok
> run326/comment:amps1 25 without sync - no data returned ok
> run327/comment:amps1 100 without sync - 2 data files returned ok
> run328/comment:amps1 1000 with sync - all data returned ok
>
> - Mike
>
> On 3/20/08 6:23 PM, Ben Clifford wrote:
>> There is a flag for NFS mounts, 'noac', which disables attribute caching
>> on clients, which I think may make the filesystem behave in the
>> desired fashion; however, it sounds like it also massively reduces
>> filesystem performance and increases fileserver load.
>>
>> Mike, you might be able to persuade MCS systems to make such a
>> filesystem available.
>>
>> I suspect some multi-second delay after touching the status file and
>> before exiting in the wrapper script is probably the best workaround
>> for now, though.
>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>