[Swift-devel] Re: swift-falkon problem

Michael Wilde wilde at mcs.anl.gov
Fri Mar 21 08:34:43 CDT 2008


Runs 329 and 330 (both in the CI log dir) were run with (I hope) the 
requested Falkon logs turned on. Note that I turned the deef provider 
logs on too, but have not yet verified that they are logging correctly.

run329 was 9 jobs, no sync. All 9 succeeded.
run330 was 25 jobs, no sync. 19 of 25 succeeded, the rest failed.

This is starting to confirm a curious pattern: without the sync, 
workflows with more jobs achieve *fewer* total successful jobs.
Here's what I recall from the last few days of testing:

     1 job wf: all succeed
     9 job wf: all succeed
    25 job wf: 15-20 succeed
   100 job wf: 1-2 succeed
  1000 job wf: 0 succeed

I don't have enough data to confirm this, but the pattern seems to be 
present.

I am going to set the problem aside for now, until, Ben and Ioan, you 
have a chance to look at the logs from this morning's test.

I'll assume for the moment that the sync "fixes" the problem, and go on 
to the application tests I need to run, keeping an eye out for anomalies.
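
For anyone following along, the workaround under test amounts to roughly 
the sketch below. This is not the actual wrapper.sh or application script: 
the function name run_with_sync, the success/failure file names, and the 
FLUSH_DELAY variable are all made up for illustration. It runs the 
application command, calls sync, writes a status file, and then pauses 
briefly before exiting, along the lines Ben suggested.

```shell
#!/bin/sh
# Hypothetical sketch of the sync workaround; not the real wrapper.sh.
# FLUSH_DELAY (seconds) is an assumed knob for the short delay before exit.
FLUSH_DELAY="${FLUSH_DELAY:-2}"

run_with_sync() {
    # $1: directory that receives the status file (assumed layout)
    # remaining args: the application command to run
    statusdir="$1"
    shift

    # Run the application and remember its exit code.
    rc=0
    "$@" || rc=$?

    # Ask the host to flush buffered writes before status is reported.
    sync

    # Record the outcome in a status file.
    if [ "$rc" -eq 0 ]; then
        touch "$statusdir/success"
    else
        echo "$rc" > "$statusdir/failure"
    fi

    # Pause briefly after touching the status file, before the job exits.
    sleep "$FLUSH_DELAY"
    return "$rc"
}
```

Whether the sync, the delay, or both are actually needed is exactly what 
the runs above are trying to establish, so treat the ordering here as a 
guess rather than a known fix.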

My goal is to do large-scale tests of AMIGA and DOCK under Swift, 
reducing wrapper.sh and throttling delays, and doing as much work on 
local RAM filesystems as possible.

Mike



On 3/21/08 7:12 AM, Michael Wilde wrote:
> My latest tests on runs of 25, 100, and 1000 jobs seem to indicate that
> with a sync command at the end of the application script, all job status
> and data are returned ok every time.
> 
> (This is somewhat curious, as the info and success files for the current
> job would not yet be complete at the time, but the sync command affects all
> other activity on the host, and ensures that at least the currently
> existing dirs, files and data are synced, or that their sync has started).
> 
> Without the sync, at the moment, virtually all jobs fail, and almost
> *no* data is being returned.  Out of 3 runs of 1000 jobs, one run
> returned 2 data files, the other two returned no data files. One 100-job
> run without sync returned 11 of 100 files.
> 
> It seems like the most fruitful testing to see if this sync is totally
> fixing the problem is to do lots more runs.
> 
> I noted that the bblog host (from which I run Swift) has no special NFS
> mount flags, just rw. (I was wondering if they had something on that
> would affect coherence; seems not).
> 
> I did not have a chance to capture the falkon logs in these tests; I
> will look for the ones Ioan mentioned, and try some runs with those logs
> captured.
> 
> The swift logs I did capture are in the CI log dir, wilde/run{317-328}
> 
> run317/comment:amps1 100 sico with sync - ran ok
> run318/comment:amps1 100 with no sync - died on first error
> run319/comment:amps1 without sync - 11 of 100 returned OK
> run320/comment:amps1 100 without sync - no data returned ok
> run321/comment:amps1 100 without sync - no data returned ok
> run322/comment:amps1 100 with sync - all data returned ok
> run323/comment:amps1 100 with sync - all data returned ok
> run324/comment:amps1 1000 with sync - all data returned ok
> run325/comment:amps1 1000 without sync - no data returned ok
> run326/comment:amps1 25 without sync - no data returned ok
> run327/comment:amps1 100 without sync - 2 data files returned ok
> run328/comment:amps1 1000 with sync - all data returned ok
> 
> - Mike
> 
> On 3/20/08 6:23 PM, Ben Clifford wrote:
>> There is a flag for NFS mounts, 'noac', which disables attribute caching 
>> on clients, which I think may make the filesystem behave in the 
>> desired fashion; however it sounds like it also massively reduces 
>> filesystem performance and increases fileserver load.
>>
>> Mike, you might be able to persuade MCS systems to make such a 
>> filesystem available.
>>
>> I suspect some multi-second delay after touching the status file and 
>> before exiting in the wrapper script is probably the best workaround 
>> for now, though.
>>
> 