[Swift-devel] Re: Needs for site selection and job scheduling enhancements
Mihael Hategan
hategan at mcs.anl.gov
Thu Feb 3 11:13:31 CST 2011
On Wed, 2011-02-02 at 18:31 -0600, Allan Espinosa wrote:
> 2011/2/1 Mihael Hategan <hategan at mcs.anl.gov>:
> I wonder if site independence only works when your workflow is
> compute-intensive. What about a mechanism where you can checkpoint
> the site scores from other runs of a workflow? But that would be
> available for all the jobs in a site.
>
> I guess we could make one site catalog per app entry in the
> transformation catalog and do the 'hinting' at that level.
Or augment the site catalog to contain app-specific biases.
>
> >>
> > [...]
> >> And we might need a feature (perhaps a swift.properties setting) to
> >> tell Swift to defer initial scheduling decisions for N seconds or
> >> until J jobs have been queued by the script, so that a sufficiently
> >> large number of jobs are in the queue before scheduling decisions are
> >> made (probably delay for say a minute on a multi-hour script run).
> >
> > How would that help? Given that the scheduling is probabilistic, that
> > makes the distribution essentially the same whether you have N or N/2
> > jobs.
>
> Here is what I think the motivation for this feature is: Given a
> workflow with jobs grouped into m groups. The groups have {n_1, n_2,
> n_3, ..., n_m} jobs. Each group shares common data {d_1, d_2, ..., d_m}.
>
> Then let us say that n_1 > n_2 > n_3 > ... > n_m . From here, we say
> that scheduling group m on multiple sites does not make sense
That's a strong statement. If that one site is busy enough that
additional jobs would take longer there without data staging than they
would on a different site with staging, then using multiple sites would
make sense.
> since
> there are only a few jobs that share the data. It would be better to
> bundle the jobs in group m onto a single site. I wonder how you can
> factor that into the probabilistic scores.
Bias based on data locality. We had a student that did some preliminary
work there, but it never really made it in.
However I now see what Mike meant, and that is a windowing algorithm for
deciding that bias. But I don't think that's ultimately necessary. I
think a probabilistic approach would work ok without the need for a
delay.
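
A minimal sketch of what a data-locality bias on top of probabilistic
site selection could look like. This is not Swift's scheduler code: the
site names, the `locality_bias` table, and the multiplicative weighting
are all assumptions made for illustration. Each job independently picks
a site with probability proportional to score times bias, so the
expected split across sites is the same for N jobs as for N/2 jobs, and
no queueing delay is needed to apply the bias.

```python
import random


def pick_site(scores, locality_bias, rng):
    """Pick one site with probability proportional to score * bias.

    Sites missing from locality_bias get a neutral bias of 1.0.
    """
    sites = sorted(scores)
    weights = [scores[s] * locality_bias.get(s, 1.0) for s in sites]
    return rng.choices(sites, weights=weights, k=1)[0]


def schedule(n_jobs, scores, locality_bias, seed=0):
    """Assign n_jobs one at a time and return the per-site job counts."""
    rng = random.Random(seed)
    counts = {s: 0 for s in scores}
    for _ in range(n_jobs):
        counts[pick_site(scores, locality_bias, rng)] += 1
    return counts


# Hypothetical inputs: three equally scored sites, with the group's
# shared data already staged at siteA (hence the larger bias).
scores = {"siteA": 1.0, "siteB": 1.0, "siteC": 1.0}
bias = {"siteA": 4.0}

full = schedule(6000, scores, bias)
half = schedule(3000, scores, bias, seed=1)
print(full, half)
```

With these made-up numbers siteA should receive roughly 4/6 of the jobs
in both runs, while the other sites still get some share, which leaves
room for the busy-site case discussed above.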
>
> >>
> >> In addition, we're wondering how easy (and desirable) it would be to
> >> implement any/all of the following language extensions:
> >>
> >> - select statement to work on string values and/or ranges
> >
> > What would be the semantics of this statement? Can you give examples?
> >>
> >> - elseif clause to achieve the above in a multi-branch if statement
> >
> > Quite silly we don't support that already.
>
> At least officially in the documentation, it says we don't support it.
I really have no idea whether this works or not, but if it doesn't it's
silly.