[Swift-devel] Re: Needs for site selection and job scheduling enhancements
Mihael Hategan
hategan at mcs.anl.gov
Tue Feb 1 17:13:16 CST 2011
I think we need to slow down a bit :)
On Tue, 2011-02-01 at 08:30 -0600, wilde at mcs.anl.gov wrote:
> Mihael,
>
> Below is a proposal for Swift scheduling features that will need a
> fair amount of deliberation. This email is intended to start that
> process. I can move this to a Bugzilla enhancement request, so long as
> you agree that the discussion makes sense to have.
>
> Allan, Dan and I have been re-examining the SCEC workflow that Allan
> is working on.
>
> Running it efficiently on OSG raises scheduling issues that Swift still
> doesn't handle well. We propose to address these issues in two phases:
I think it may be useful to spell those out.
>
> I. Use simple workflows that group more work into single scripts to
> achieve the job affinities needed for reasonable performance. Provide
> scheduling hints to Swift.
>
> II. Determine how Swift could automatically achieve the same
> scheduling decisions.
>
> Phase II is pretty complex as far as we can tell, so let's defer its
> discussion.
>
> To do phase I, we want to ask whether any of the following capabilities
> could be added, and which of them are reasonable, of "affordable"
> cost, and worth trying.
>
> Most of these involve enabling a Swift script to specify scheduling
> "hints" on individual app() invocations.
Does the place where the hints are specified have any relevance? If we
put the hints in the swift source we lose the "site independence"
aspect.
>
[...]
> And we might need a feature (perhaps a swift.properties setting) to
> tell Swift to defer initial scheduling decisions for N seconds or
> until J jobs have been queued by the script, so that a sufficiently
> large number of jobs are in the queue before scheduling decisions are
> made (probably a delay of about a minute on a multi-hour script run).
How would that help? Since the scheduling is probabilistic, the
resulting distribution of jobs across sites is essentially the same
whether you have N or N/2 jobs.
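The argument above can be illustrated with a minimal Python sketch. It assumes (this is a modeling assumption, not Swift's actual scheduler code) that each job is sent to a site chosen at random with probability proportional to a per-site score; the site names and scores are hypothetical. Under that model, the expected share of jobs each site receives does not depend on how many jobs are queued when scheduling starts.

```python
import random
from collections import Counter
from typing import Dict

# Hypothetical site scores; this is an assumed model, not Swift's real scoring.
SITE_SCORES: Dict[str, float] = {"siteA": 5.0, "siteB": 3.0, "siteC": 2.0}

def schedule(num_jobs: int, scores: Dict[str, float], seed: int = 0) -> Dict[str, float]:
    """Assign each job to a site with probability proportional to its score.

    Returns the fraction of jobs that each site received.
    """
    rng = random.Random(seed)
    sites = list(scores)
    weights = [scores[s] for s in sites]
    picks = rng.choices(sites, weights=weights, k=num_jobs)
    counts = Counter(picks)
    return {s: counts[s] / num_jobs for s in sites}

if __name__ == "__main__":
    total = sum(SITE_SCORES.values())
    print("expected", {s: w / total for s, w in SITE_SCORES.items()})
    for n in (20000, 10000):  # N and N/2 queued jobs
        print(n, schedule(n, SITE_SCORES))
```

In this model, waiting until more jobs are queued changes only the sample size, not the expected split between sites.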
>
> In addition, we're wondering how easy (and how desirable) it would be
> to implement any or all of the following language extensions:
>
> - a select statement that works on string values and/or ranges
What would be the semantics of this statement? Can you give examples?
>
> - elseif clause to achieve the above in a multi-branch if statement
It's quite silly that we don't support that already.
>
> - function pointers to select a function dynamically, e.g. from an array
Well, I do like the idea of higher-order functions, but that's not quite
the direction we took at the start. Though I'm sure it could be
added. However, I would be curious to see the kind of problem solved
with Swift that would require this.
> - ability to set the app program name from a variable
Could you clarify that?
>
> These enhancements would enable us to manually code in the scheduling
> hints by providing multiple pool groups with different throttle
> settings and to manually force jobs to different pools.
That feels like a contrived way to avoid writing Java code. Things are
separated into components in order to isolate the solutions to
subproblems in loosely connected parts of the code. The idea that we'd
implement scheduling features in the Swift language seems to be the
antithesis of that design principle.
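To make the proposal concrete, the routing that the requested language features (select on strings or ranges, elseif, function pointers) would allow can be sketched in Python rather than SwiftScript, since those constructs are what the language currently lacks. Pool names, throttle values, job kinds, and size thresholds are all hypothetical. The sketch also makes the concern raised in the reply visible: the scheduling policy ends up hard-coded in workflow code instead of living in a separate scheduler component.

```python
from typing import Callable, Dict, Union

# Hypothetical pool groups, each with its own throttle (max concurrent jobs).
POOLS: Dict[str, int] = {"fast_pool": 100, "bulk_pool": 20, "local_pool": 4}

def pool_for_kind(kind: str) -> str:
    """A 'select' on string values, written as an if/elif chain."""
    if kind == "short":
        return "fast_pool"
    elif kind == "long":
        return "bulk_pool"
    else:
        return "local_pool"

def pool_for_size(size_mb: float) -> str:
    """A 'select' on numeric ranges: route a job by its input size in MB."""
    if size_mb < 10:
        return "fast_pool"
    elif size_mb < 1000:
        return "bulk_pool"
    else:
        return "local_pool"

# "Function pointers": the routing function is picked at run time from a table.
ROUTERS: Dict[str, Callable[..., str]] = {
    "by_kind": pool_for_kind,
    "by_size": pool_for_size,
}

def route(strategy: str, value: Union[str, float]) -> str:
    """Return the pool a job would be forced to, using the chosen strategy."""
    pool = ROUTERS[strategy](value)
    if pool not in POOLS:
        raise ValueError(f"unknown pool {pool!r}")
    return pool
```

For example, `route("by_size", 250.0)` returns `"bulk_pool"` and `route("by_kind", "short")` returns `"fast_pool"`.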
>
> If the easiest way to set the hints requested above on an individual
> job is to pass an env var on the command line, then that capability
> might be a useful alternative to the one-value-for-all method of
> setting env vars that we currently employ with the ENV profile. This
> could be considered a useful enhancement, separate from the question
> of how scheduling hints are set.
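A per-job setting of this kind would presumably layer on top of the site-wide ENV profile rather than replace it. A minimal Python sketch of that precedence follows; the variable names and the override order (inherited environment, then site-wide profile, then per-job values) are assumptions, not an existing Swift feature.

```python
from typing import Dict, Mapping, Optional

# Site-wide values, as the ENV profile applies them today: one value for all jobs.
# Variable names and values are hypothetical.
SITE_ENV_PROFILE: Dict[str, str] = {"SCHED_HINT": "default", "OMP_NUM_THREADS": "1"}

def job_environment(base: Mapping[str, str],
                    site_profile: Mapping[str, str],
                    per_job: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Layer environments: inherited env, then site-wide profile, then per-job overrides."""
    env = dict(base)          # copy, so none of the inputs are mutated
    env.update(site_profile)
    if per_job:
        env.update(per_job)   # a per-job value wins over the site-wide one
    return env
```

The resulting dict could be passed as the `env=` argument of `subprocess.run` when launching the application for that one job.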
>
> Lastly, in phase I we will be testing the performance of having the
> jobs "pull" files via wget in a pre-staged manner, within the
> application script. For Phase II we'd like to consider having Swift do
> that in the worker: Have the coaster worker "pull" files in via wget
> or similar command/function, asynchronously pre-staging files for jobs
> that have been queued/assigned to a site. But that can be deferred for
> a later discussion.
How is that different from the current worker staging mechanism (aside
from changing protocols and tools)? I.e., what is the theoretical
difference?
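As described in the quoted paragraph, the proposal is for a worker to start pulling a job's inputs as soon as the job is assigned to its site, so that transfers overlap with queue wait and with other jobs' execution. A minimal Python sketch of that idea follows; the class and method names are hypothetical, the default fetcher simply shells out to `wget` (`-q` quiet, `-O` output file), and nothing here describes how the existing coaster worker staging actually behaves.

```python
import subprocess
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

def wget_fetch(url: str, dest: str) -> None:
    """Pull one file with wget; raises CalledProcessError on failure."""
    subprocess.run(["wget", "-q", "-O", dest, url], check=True)

class PrefetchingWorker:
    """Hypothetical worker that pre-stages inputs for jobs queued to it."""

    def __init__(self, fetch: Callable[[str, str], None] = wget_fetch,
                 max_parallel: int = 4) -> None:
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=max_parallel)
        self._pending: Dict[str, List[Future]] = {}

    def enqueue(self, job_id: str, inputs: Dict[str, str]) -> None:
        """Start pulling a job's inputs (dest path -> URL) as soon as it is assigned."""
        self._pending[job_id] = [
            self._pool.submit(self._fetch, url, dest) for dest, url in inputs.items()
        ]

    def run(self, job_id: str, task: Callable[[], object]) -> object:
        """Wait only for this job's inputs, then run the job."""
        for fut in self._pending.pop(job_id, []):
            fut.result()  # re-raises any error from the fetch
        return task()

    def shutdown(self) -> None:
        """Wait for outstanding transfers and stop the fetch threads."""
        self._pool.shutdown(wait=True)
```

The fetcher is injectable so that a different protocol or tool can be swapped in without touching the queueing logic.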
>