[Swift-devel] Re: Needs for site selection and job scheduling enhancements

Wed Feb 2 18:31:22 CST 2011

2011/2/1 Mihael Hategan <hategan at mcs.anl.gov>:
> I think we need to slow down a bit :)
>
> On Tue, 2011-02-01 at 08:30 -0600, wilde at mcs.anl.gov wrote:
>> Mihael,
>>
>> Below is a proposal for Swift scheduling features that will need a
>> fair amount of deliberation. This email is intended to start the
>> process. I can move this to a Bugzilla enhancement request, so long
>> as you agree that the discussion makes sense to have.
>>
>> Allan, Dan and I have been re-examining the SCEC workflow that Allan
>> is working on.
>>
>> Doing it efficiently on OSG raises scheduling issues that Swift still
>> doesn't handle well. We propose to address these issues in two phases:
>
> I think it may be useful to spell those out.
>>
>> I. Use simple workflows that group more work into single scripts to
>> achieve the job affinities needed for reasonable performance. Provide
>> scheduling hints to Swift.
>>
>> II. Determine how Swift could automatically achieve the same
>> scheduling decisions.
>>
>> Phase II is pretty complex as far as we can tell, so let's defer its
>> discussion.
>>
>> To do phase I, we want to ask whether any of the following
>> capabilities could be added, and which ones are reasonable, of
>> "affordable" cost, and worth trying.
>>
>> Most of these involve enabling a Swift script to specify scheduling
>> "hints" on individual app() invocations.
>
> Does the place where the hints are specified have any relevance? If we
> put the hints in the swift source we lose the "site independence"
> aspect.

I wonder if site independence only works when your workflow is
compute-intensive. What about a mechanism where you can checkpoint
the site scores from previous runs of a workflow? But those scores
would then apply to all the jobs at a site.

I guess we could create one site catalog per app entry in the
transformation catalog and do the 'hinting' at that level.
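To make the per-app idea concrete, here is a minimal Python sketch, not Swift's actual scheduler. It assumes that site scores checkpointed from an earlier run are stored as JSON of the form {site: score}, and that each app is restricted to its own list of sites (the "one site catalog per app" idea). The names `APP_SITES`, `load_scores` and `pick_site`, the app and site names, and the JSON layout are all hypothetical.

```python
import json
import random

# Hypothetical per-app site lists, standing in for "one site catalog
# per app entry in the transformation catalog".
APP_SITES = {
    "app_a": ["siteA", "siteB"],
    "app_b": ["siteC"],
}

def load_scores(text):
    """Parse site scores checkpointed from a previous run (assumed JSON: {site: score})."""
    return json.loads(text)

def pick_site(app, scores, rng=random):
    """Choose a site for one invocation of `app`, weighted by checkpointed scores.

    The scores are still per site, shared by every job sent there; only the
    set of candidate sites differs per app.
    """
    sites = APP_SITES[app]
    # Sites missing from the checkpoint get a neutral default weight of 1.0;
    # negative scores are clamped to 0 so they can be used as weights.
    weights = [max(scores.get(s, 1.0), 0.0) for s in sites]
    if sum(weights) == 0:
        weights = [1.0] * len(sites)  # fall back to a uniform choice
    return rng.choices(sites, weights=weights, k=1)[0]
```

For example, with `scores = load_scores('{"siteA": 10.0, "siteB": 1.0, "siteC": 3.0}')`, every `app_b` job goes to `siteC`, while `app_a` jobs favor `siteA` roughly ten to one over `siteB`.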

>>
> [...]
>> And we might need a feature (perhaps a swift.properties setting) to
>> tell Swift to defer initial scheduling decisions for N seconds or
>> until J jobs have been queued by the script, so that a sufficiently
>> large number of jobs are in the queue before scheduling decisions are
>> made (probably delay for say a minute on a multi-hour script run).
>
> How would that help? Given that the scheduling is probabilistic, that
> makes the distribution essentially the same whether you have N or N/2
> jobs.

Here is what I think the motivation for this feature is: consider a
workflow whose jobs are divided into m groups, with {n_1, n_2, n_3,
..., n_m} jobs respectively. The jobs in group i share common data
d_i, so the shared data sets are {d_1, d_2, ..., d_m}.

Now suppose that n_1 > n_2 > n_3 > ... > n_m. Then scheduling group m
across multiple sites does not make sense, since only a few jobs share
its data; it would be better to bundle all the jobs in group m onto a
single site. I wonder how you can factor that into the probabilistic
scores.
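The argument can be illustrated with a small Python sketch. It illustrates the idea only and says nothing about how Swift's scheduler actually works. Once all jobs are known, which is what deferring the initial scheduling decision would buy, groups with fewer than a hypothetical threshold `min_spread` jobs are pinned to a single site, while larger groups are spread across sites by independent weighted random choice. The function names, the threshold and the site weights are assumptions.

```python
import random
from collections import Counter, defaultdict

def schedule(jobs, site_weights, min_spread=10, rng=None):
    """Assign jobs to sites.

    jobs: list of (job_id, group_id); all jobs in a group share data d_i.
    site_weights: {site: weight}, e.g. probabilistic site scores.
    Groups with fewer than `min_spread` jobs are bundled onto one site so
    their shared data is staged only once; larger groups are spread out.
    """
    rng = rng or random.Random()
    sites = list(site_weights)
    weights = [site_weights[s] for s in sites]
    # n_i for every group: this needs the full job list up front,
    # which is why the initial scheduling decision has to be deferred.
    sizes = Counter(group for _, group in jobs)
    group_site = {}  # chosen site for each small group
    placement = {}
    for job_id, group in jobs:
        if sizes[group] < min_spread:
            # Small group: choose one site the first time the group is seen.
            if group not in group_site:
                group_site[group] = rng.choices(sites, weights=weights, k=1)[0]
            placement[job_id] = group_site[group]
        else:
            # Large group: an independent weighted choice for each job.
            placement[job_id] = rng.choices(sites, weights=weights, k=1)[0]
    return placement

def sites_per_group(jobs, placement):
    """Count how many distinct sites each group ended up on."""
    used = defaultdict(set)
    for job_id, group in jobs:
        used[group].add(placement[job_id])
    return {g: len(s) for g, s in used.items()}
```

With 100 jobs in group `g1` and 3 jobs in group `gm`, `gm` always lands on exactly one site, while `g1` is spread across the available sites in proportion to their weights.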

>>
>> In addition, we're wondering how easy (and desirable) any/all of the
>> following language extensions could be done:
>>
>> - select statement to work on string values and/or ranges
>
> What would be the semantics of this statement? Can you give examples?
>>
>> - elseif clause to achieve the above in a multi-branch if statement
>
> Quite silly we don't support that already.

At least officially, the documentation says we don't support it.

>>
>> - function pointers to select a function dynamically, eg from an array
>
> Well, I do like the idea of higher order functions, but that's not quite
> the way we went with this in the start. Though I'm sure it could be
> added. However, I would be curious to see the kind of problem that one
> would solve with swift that would require this.
>
>> - ability to set the app program name from a variable
>
> Could you clarify that?
>>
>> These enhancements would enable us to manually code in the scheduling
>> hints by providing multiple pool groups with different throttle
>> settings and to manually force jobs to different pools.
>
> I feel that is a contrived way to avoid writing Java code. Things are
> separated into components in order to isolate solutions to subproblems
> into loosely connected parts of the code. The idea that we'd implement
> scheduling features in the swift language seems to be the antithesis of
> that design principle.
>>
>> If the easiest way to set the hints requested above on an individual
>> job is to pass an env var on the command line, then that capability
>> might be a useful alternative to setting env vars with
>> the one-value-for-all method that we currently employ with the ENV
>> profile.  This could be considered as a useful enhancement separate
>> from the question of how scheduling hints are set.
>>
>> Lastly, in phase I we will be testing the performance of having the
>> jobs "pull" files via wget in a pre-staged manner, within the
>> application script.  For Phase II we'd like to consider having Swift do
>> that in the worker: Have the coaster worker "pull" files in via wget
>> or similar command/function, asynchronously pre-staging files for jobs
>> that have been queued/assigned to a site. But that can be deferred for
>> a later discussion.
>
> How is that different from the current worker staging mechanism (aside
> from changing protocols and tools)? I.e., what is the theoretical
> difference?
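The "pull" pre-staging described above could look roughly like the following Python sketch. A worker keeps a small thread pool and starts fetching each queued job's input files as soon as the job is assigned, so transfers overlap with jobs that are already running. The `fetch` callable stands in for wget or any other transfer tool. The class and method names are hypothetical and are not part of the coaster worker. Fetching a shared file only once is an assumption added for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

class PullPrestager:
    """Start fetching inputs for queued jobs in the background (hypothetical sketch)."""

    def __init__(self, fetch, max_transfers=4):
        # fetch(url) -> local path, e.g. a wrapper around wget or urllib.
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=max_transfers)
        self._pending = {}  # url -> Future, so a file shared by several jobs is pulled once

    def job_queued(self, urls):
        """Call when a job is assigned to this worker: begin pulling its inputs."""
        for url in urls:
            if url not in self._pending:
                self._pending[url] = self._pool.submit(self._fetch, url)

    def wait_for_inputs(self, urls):
        """Call just before the job runs: block until its inputs are local."""
        self.job_queued(urls)  # no-op for URLs that are already pending
        return [self._pending[url].result() for url in urls]

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

A real worker would also have to evict staged files and retry failed transfers. The sketch ignores both.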

> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>