[Swift-devel] Throttling issues for coasters on OSG

Michael Wilde wilde at mcs.anl.gov
Thu Oct 21 12:24:38 CDT 2010


Mihael, All, can you provide some guidance on this issue:

Allan is starting coasters "out of band" on OSG, using passive-persistent servers.

He's also setting the throttles for each site manually in sites.xml, roughly proportional to each site's CPU count.

Not surprisingly (in hindsight), what he's seeing suggests that this approach works poorly: it causes Swift to commit jobs to sites where all the running coasters are already busy, while leaving many running coasters on other sites idle.

Allan will now test with the throttles removed to see if the default scheduling algorithm will do a better job. We expect that it will.

But we'd like to explore 2 changes to make the scheduler perform optimally, and see (a) what you think of them and (b) if desirable, how to implement them.

Change 1: bias scheduling based on job start rather than job completion. We have discussed this many times in the past, and it seems more optimal in all cases I can imagine. Do you agree? Can you point us to code modules where we can try this and evaluate it?
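
To make Change 1 concrete, here is a minimal Python sketch of the idea. It is not Swift or Karajan code, and every name in it (SiteScorer, reward_on, on_event, best_site) is hypothetical. It only assumes that the scheduler keeps a per-site score and raises it on some job event; the sketch lets that event be either job start or job completion.

```python
class SiteScorer:
    """Toy per-site score table; a higher score means the site is preferred.

    Hypothetical illustration only, not the actual Swift scheduler.
    """

    def __init__(self, reward_on="start", delta=1.0):
        if reward_on not in ("start", "complete"):
            raise ValueError("reward_on must be 'start' or 'complete'")
        self.reward_on = reward_on  # which job event raises a site's score
        self.delta = delta          # score increment per rewarded event
        self.scores = {}            # site name -> current score

    def on_event(self, site, event):
        # Make sure the site is known, then reward only the chosen event type.
        self.scores.setdefault(site, 0.0)
        if event == self.reward_on:
            self.scores[site] += self.delta

    def best_site(self):
        # Return the highest-scoring site, or None before any site is seen.
        if not self.scores:
            return None
        return max(self.scores, key=self.scores.get)


# Three jobs have started but none has completed yet.
events = [("siteA", "start"), ("siteB", "start"), ("siteA", "start")]
by_start = SiteScorer(reward_on="start")
by_completion = SiteScorer(reward_on="complete")
for site, event in events:
    by_start.on_event(site, event)
    by_completion.on_event(site, event)
```

After these events the start-biased scorer already prefers siteA, while the completion-biased scorer has learned nothing and will not until the first task finishes, which for 5-minute tasks is several minutes later.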

Change 2: for coasters, use knowledge of how many worker slots are registered on each site to always keep the throttle set to exactly this value, and hence ensure that, given sufficient ready tasks, all coaster slots are always filled.
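
A rough Python sketch of what Change 2 could look like follows. It is an illustration under assumptions, not the coaster implementation: the class SlotThrottle and its methods are hypothetical, and it assumes the scheduler can learn when worker slots register and when jobs finish.

```python
from collections import deque


class SlotThrottle:
    """Per-site throttle pinned to the number of registered worker slots.

    Hypothetical illustration only, not the actual coaster service.
    """

    def __init__(self):
        self.slots = {}    # site name -> registered worker slots
        self.running = {}  # site name -> jobs currently committed to the site

    def register(self, site, n=1):
        # Called when n worker slots register on a site.
        self.slots[site] = self.slots.get(site, 0) + n
        self.running.setdefault(site, 0)

    def free(self, site):
        # Slots on this site that have no job committed to them.
        return max(self.slots.get(site, 0) - self.running.get(site, 0), 0)

    def dispatch(self, ready):
        # Commit ready tasks only while a site has a free slot.
        assigned = []
        for site in self.slots:
            while ready and self.free(site) > 0:
                task = ready.popleft()
                self.running[site] += 1
                assigned.append((task, site))
        return assigned

    def complete(self, site):
        # Called when a job on the site finishes, freeing one slot.
        self.running[site] -= 1


throttle = SlotThrottle()
throttle.register("siteA", 2)
throttle.register("siteB", 3)
ready = deque(range(10))            # ten ready tasks, numbered 0..9
first = throttle.dispatch(ready)    # fills all five registered slots
throttle.complete("siteA")          # one siteA job finishes
second = throttle.dispatch(ready)   # the freed slot is refilled at once
```

With the throttle tied to the registered slot count, no site is handed more jobs than it has workers, and no registered worker sits idle while tasks are ready.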

I think that for Allan's workload (many hundreds of thousands of 5-minute tasks), change 2 will be most effective and in almost all cases desirable.

A third issue relates to staging, and is perhaps a non-issue if change 2 works. The issue is: if we use provider staging for this workload, which is likely desirable, will data staging be done at the time a worker pulls a job, or will it be done in advance of that, and asynchronously with job start, just as it would be with ordinary staging?

Thanks,

- Mike



