[Swift-devel] Re: scheduling

Vipul Kumar Singh vipulkrsingh at gmail.com
Tue Mar 30 17:13:52 CDT 2010


Hi,

Going through the code, I realized that we could extend checkConstraints(),
which returns the candidate sites to getNextResource(). We can add data
dependencies as task constraints; the constraint checker then checks the
availability of data on sites based on the catalog, as you suggested. So the
logical next step would be to add support for the RLS catalog in Swift, so
that it can be used to make scheduling decisions.
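
A minimal sketch of such a data-locality constraint check, written in Java because the scheduler methods named above are Java code. It assumes the catalog can be reduced to a map from logical file name to the set of sites holding a replica (a stand-in for an RLS lookup). `DataAwareSiteFilter` and `filterSites` are hypothetical names, not part of the Swift or Karajan API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical data-aware constraint check. The catalog is a plain map from
 * logical file name to the sites holding a replica, standing in for an RLS
 * query; this is not the real Swift/Karajan scheduler API.
 */
public class DataAwareSiteFilter {
    private final Map<String, Set<String>> catalog;

    public DataAwareSiteFilter(Map<String, Set<String>> catalog) {
        this.catalog = catalog;
    }

    /**
     * Returns the candidate sites that already hold every required file.
     * If no candidate holds all of them, returns all candidates unchanged so
     * the task can still be placed (its data is then staged in as usual).
     */
    public List<String> filterSites(Collection<String> candidateSites,
                                    Collection<String> requiredFiles) {
        List<String> local = new ArrayList<>();
        for (String site : candidateSites) {
            boolean hasAll = true;
            for (String file : requiredFiles) {
                Set<String> replicas = catalog.get(file);
                if (replicas == null || !replicas.contains(site)) {
                    hasAll = false;
                    break;
                }
            }
            if (hasAll) {
                local.add(site);
            }
        }
        return local.isEmpty() ? new ArrayList<>(candidateSites) : local;
    }
}
```

Falling back to the full candidate list keeps the check advisory: a task whose inputs are not replicated anywhere is still scheduled exactly as it would be without the catalog.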

But I am having trouble figuring out how to get the mapper and catalog
concepts to work together, i.e. what the first steps towards having a data
catalog in Swift would be.

Thank you

Vipul Kumar Singh




On Sat, Mar 20, 2010 at 8:13 AM, <wilde at mcs.anl.gov> wrote:

> Hi Vipul,
>
> The topic you propose here is both interesting and likely to be valuable
> for Swift users, and I'd like to help you develop the ideas further.  I'll
> try to respond to later comments in the discussion that started on your
> proposal, but first I'd like to offer a few thoughts on your initial message
> below.
>
> ----- "Vipul Kumar Singh" <vipulkrsingh at gmail.com> wrote:
>
> > Sir,
> >
> > I am interested in working on data-aware features in Swift, on two
> > major points:
> >
> > 1) For jobs that depend on the same data, the scheduler tries to schedule
> > them on the same resources.
>
> The ability to *consider* the location of a data object when assigning a
> job to a site would be great.
>
> >
> > (i) The scheduler maintains a hash containing information about (a)
> > data files, (b) jobs dependent on that data and (c) resources that are
> > executing (or scheduled to execute) those jobs.
> > (ii) The information will be updated every time a job is finished (on
> > success/failure).
> > (iii) When scheduling new jobs, the scheduler looks through the hash
> > and schedules the new job on the resource that has the data it depends
> > on (if that resource is not already overloaded).
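
The bookkeeping in steps (i) to (iii) can be sketched roughly as follows. This is a hypothetical illustration, not existing Swift code: the class and method names are invented, "overloaded" is assumed to mean a per-resource count of scheduled jobs reaching `maxLoad`, and on failure the sketch conservatively stops treating that resource as holding the job's files.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-file bookkeeping for data-aware scheduling. */
public class DataLocalityTracker {
    // (i) data file -> jobs depending on it, and -> resources running those jobs
    private final Map<String, Set<String>> jobsByFile = new HashMap<>();
    private final Map<String, Set<String>> resourcesByFile = new HashMap<>();
    // per-job details needed to update the tables when the job finishes
    private final Map<String, List<String>> filesByJob = new HashMap<>();
    private final Map<String, String> resourceByJob = new HashMap<>();
    // scheduled/running job count per resource, used as the overload test
    private final Map<String, Integer> load = new HashMap<>();
    private final int maxLoad;

    public DataLocalityTracker(int maxLoad) {
        this.maxLoad = maxLoad;
    }

    /** Record that a job reading these files was assigned to a resource. */
    public void jobScheduled(String job, List<String> files, String resource) {
        filesByJob.put(job, files);
        resourceByJob.put(job, resource);
        for (String f : files) {
            jobsByFile.computeIfAbsent(f, k -> new HashSet<>()).add(job);
            resourcesByFile.computeIfAbsent(f, k -> new HashSet<>()).add(resource);
        }
        load.merge(resource, 1, Integer::sum);
    }

    /** (ii) Update the tables when a job finishes, successfully or not. */
    public void jobFinished(String job, boolean success) {
        List<String> files = filesByJob.remove(job);
        String resource = resourceByJob.remove(job);
        if (files == null) {
            return; // unknown job: nothing to update
        }
        load.merge(resource, -1, Integer::sum);
        for (String f : files) {
            jobsByFile.getOrDefault(f, new HashSet<>()).remove(job);
            if (!success) {
                // assumption: after a failure, don't count on the file being there
                resourcesByFile.getOrDefault(f, new HashSet<>()).remove(resource);
            }
        }
    }

    /** Jobs currently recorded as depending on a file. */
    public Set<String> jobsUsing(String file) {
        return Collections.unmodifiableSet(jobsByFile.getOrDefault(file, Collections.emptySet()));
    }

    /**
     * (iii) Pick the non-overloaded candidate that already holds the most of
     * the job's files; null means there is no data-local choice, so the
     * normal site selection should be used instead.
     */
    public String pickResource(List<String> files, List<String> candidates) {
        String best = null;
        int bestHits = 0;
        for (String r : candidates) {
            if (load.getOrDefault(r, 0) >= maxLoad) {
                continue; // overloaded
            }
            int hits = 0;
            for (String f : files) {
                if (resourcesByFile.getOrDefault(f, Collections.emptySet()).contains(r)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                best = r;
                bestHits = hits;
            }
        }
        return best;
    }
}
```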
>
> I think these heuristics are a very good first approximation; you'll need
> to refine the criteria as you get deeper into the code and also based on
> experiments and measurements.
>
> It would be good to base a solution on a grid-wide file location catalog
> like the Globus RLS or similar. I've long thought that file location could
> also be selected and/or influenced by mappers (like ext mapper scripts), but
> that's just a starting point to feed info to the site selection algorithm.
>
> >
> > 2) Combining tasks that depend on the same data files before
> > scheduling. I believe this can be added quickly using
> > coasters (or is the feature already there?).
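
One simple reading of point 2, sketched in Java under stated assumptions: tasks are batched only when their sets of input files are identical (tasks that merely overlap stay in separate batches), and `TaskBatcher` and `Task` are hypothetical types rather than coaster or Swift classes. Each resulting batch could then be submitted to a single resource as one unit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical grouping of tasks that read the same input files. */
public class TaskBatcher {
    /** Hypothetical task: an id plus the logical input files it reads. */
    public static final class Task {
        private final String id;
        private final List<String> inputs;

        public Task(String id, List<String> inputs) {
            this.id = id;
            this.inputs = inputs;
        }

        public String id() {
            return id;
        }
    }

    /**
     * Groups tasks whose input-file sets are identical (order-insensitive),
     * keeping batches and the tasks inside them in submission order.
     * Callers should treat the returned keys as read-only.
     */
    public static Map<TreeSet<String>, List<Task>> groupBySharedInputs(Collection<Task> tasks) {
        Map<TreeSet<String>, List<Task>> batches = new LinkedHashMap<>();
        for (Task t : tasks) {
            TreeSet<String> key = new TreeSet<>(t.inputs);
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
        return batches;
    }
}
```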
>
> I think a bit of work has been done towards this goal by a student a few
> summers back that could be used as a starting point.  In general, we've been
> interested in having Swift join tasks together in a shell-like pipeline when
> that would be optimal.
>
> >
> > Currently I have set up some machines with Globus and Swift, but I am
> > not able to get Swift to submit jobs through GRAM4.
>
> I think you'll get better results (and support) with GRAM 5, or even GRAM
> 2.4. And you could do initial experiments with just ssh and coasters. Let us
> know what problems you're having (on swift-user at ci.uchicago.edu), and include
> your sites.xml, tc.data, swift.properties, and swift .log file when
> reporting problems, so we can diagnose them with less back-and-forth, OK?
>
> I'm eager to see what kind of progress you can make on this problem, and
> very willing to help you think through it.
>
> Regards,
>
> Mike
>
>
>
> >
> > Vipul Kumar Singh
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>