Hi,<br><br>Going through the code, I realized that we could extend checkConstraints(), which returns the desired sites to getNextResource(). We can add data dependencies as task constraints; the constraint checker would then check the availability of data on sites based on the catalog, as you suggested. So the logical next step would be to add support for an RLS catalog in Swift so that it can be used to make scheduling decisions.<br>
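<br>As a minimal sketch of this idea (in Python for brevity, not Swift's actual Java code): the catalog below is a plain dict standing in for RLS, and check_constraints here is a hypothetical helper, not the real checkConstraints() signature. It assumes a task lists its input files and that a site qualifies only if the catalog says it holds all of them.<br>

```python
# Hypothetical sketch: none of these names come from the Swift/Karajan code base.

# Stand-in for an RLS-style replica catalog: logical file name to sites holding a copy.
REPLICA_CATALOG = {
    "input1.dat": {"siteA", "siteB"},
    "input2.dat": {"siteB"},
}


def check_constraints(task_inputs, all_sites, catalog):
    """Return the sites that satisfy the task's data-dependency constraint.

    Sites holding every input are preferred; if no site holds all inputs,
    fall back to all sites so the task can still be scheduled.
    """
    candidates = set(all_sites)
    for name in task_inputs:
        candidates &= catalog.get(name, set())
    return sorted(candidates) if candidates else sorted(all_sites)


sites = ["siteA", "siteB", "siteC"]
print(check_constraints(["input1.dat", "input2.dat"], sites, REPLICA_CATALOG))
print(check_constraints(["unknown.dat"], sites, REPLICA_CATALOG))
```

<br>The fallback to all sites is a design choice in this sketch: a data constraint narrows the choice when the catalog knows something, but never blocks a task outright.<br>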
<br>But I am having trouble figuring out how to make the mapper and catalog concepts complement each other, i.e. what the first steps towards having a catalog of data in Swift would be.<br><br>Thank you<br><br>Vipul Kumar Singh<br>
<br><br><br><br><div class="gmail_quote">On Sat, Mar 20, 2010 at 8:13 AM, <span dir="ltr">&lt;<a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hi Vipul,<br>
<br>
The topic you propose here is both interesting and likely to be valuable for Swift users, and I'd like to help you develop the ideas further. I'll try to respond to later comments in the discussion that started on your proposal, but first I'd like to offer a few thoughts on your initial message below.<br>
<div class="im"><br>
----- "Vipul Kumar Singh" &lt;<a href="mailto:vipulkrsingh@gmail.com">vipulkrsingh@gmail.com</a>&gt; wrote:<br>
<br>
</div><div class="im">> Sir,<br>
><br>
> I am interested in working on the data-aware features in swift on two<br>
> major points<br>
><br>
> 1) For jobs dependent on same data, scheduler tries to schedule them<br>
> to same resources.<br>
<br>
</div>The ability to *consider* the location of a data object when assigning a job to a site would be great.<br>
<div class="im"><br>
><br>
> (i) The scheduler maintains a hash containing information about (a)<br>
> data files, (b) jobs dependent on that data and (c) resources that are<br>
> executing (or scheduled to execute) those jobs.<br>
> (ii) The information will be updated every time a job is finished (on<br>
> success/failure).<br>
</div><div class="im">> (iii) When scheduling new jobs, the scheduler looks through the hash<br>
> and schedules new job to the resource that has data on which new job<br>
> is dependent (If the resource is not already overloaded).<br>
<br>
</div>I think these heuristics are a very good first approximation; you'll need to refine the criteria as you get deeper into the code and also based on experiments and measurements.<br>
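<br>Heuristics (i)-(iii) above could be sketched roughly as follows. This is a toy model in Python, not Swift scheduler code: the class, its fields, and the max_load threshold are all hypothetical, and it assumes a file counts as present on a site as soon as a job using it is scheduled there.<br>

```python
from collections import defaultdict


class DataAwareScheduler:
    """Toy model of heuristics (i)-(iii); all names are hypothetical."""

    def __init__(self, sites, max_load):
        self.sites = list(sites)
        self.max_load = max_load            # jobs a site may hold before it counts as overloaded
        self.load = {s: 0 for s in sites}   # running or scheduled jobs per site
        self.data_sites = defaultdict(set)  # (i) data file to sites that hold (or will hold) it
        self.job_info = {}                  # (i) job id to (site, input files)

    def schedule(self, job_id, inputs):
        # (iii) skip overloaded sites, then prefer the site holding the most inputs;
        # ties are broken in favour of the less loaded site.
        free = [s for s in self.sites if self.max_load > self.load[s]]
        if not free:
            return None  # every site is overloaded; the caller should retry later
        best = max(
            free,
            key=lambda s: (sum(s in self.data_sites[f] for f in inputs), -self.load[s]),
        )
        self.load[best] += 1
        self.job_info[job_id] = (best, list(inputs))
        for f in inputs:
            self.data_sites[f].add(best)
        return best

    def finish(self, job_id, success):
        # (ii) update the table whenever a job ends, on success or failure.
        site, inputs = self.job_info.pop(job_id)
        self.load[site] -= 1
        if not success:
            # Simplification: after a failure, do not trust that the inputs were staged.
            for f in inputs:
                self.data_sites[f].discard(site)


sched = DataAwareScheduler(["siteA", "siteB"], max_load=2)
for job in ["j1", "j2", "j3"]:
    print(job, sched.schedule(job, ["x.dat"]))
```

<br>In this run the first two jobs land on the same site because they share x.dat, and the third spills over to the other site once the first is at its load limit.<br>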
<br>
It would be good to base a solution on a grid-wide file location catalog like the Globus RLS or similar. I've long thought that file location could also be selected and/or influenced by mappers (like ext mapper scripts), but that's just a starting point to feed info to the site selection algorithm.<br>
<div class="im"><br>
><br>
> 2) Combining the tasks together that are dependent on same data files,<br>
> before scheduling. And I believe this can be added quickly using<br>
> coasters ( or is the feature already there... ).<br>
<br>
</div>I think a student did a bit of work towards this goal a few summers back, and that work could be used as a starting point. In general, we've been interested in having Swift join tasks together in a shell-like pipeline when that would be optimal.<br>
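<br>For point 2, the grouping step itself might look like the sketch below. It is only an illustration in Python under a strong assumption, namely that tasks are combined only when their input file sets are identical; cluster_by_inputs and the task table are hypothetical and unrelated to how coasters or the earlier student work handle this.<br>

```python
from collections import defaultdict


def cluster_by_inputs(tasks):
    """Group task ids whose input file sets are identical.

    `tasks` maps a task id to the list of files it reads.
    """
    groups = defaultdict(list)
    for task_id, inputs in tasks.items():
        # frozenset ignores the order in which a task lists its inputs
        groups[frozenset(inputs)].append(task_id)
    return list(groups.values())


tasks = {
    "t1": ["a.dat", "b.dat"],
    "t2": ["b.dat", "a.dat"],
    "t3": ["c.dat"],
}
print(cluster_by_inputs(tasks))
```

<br>Each resulting group could then be submitted as one combined job, so the shared files are staged once per group rather than once per task.<br>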
<div class="im"><br>
><br>
> Currently I have set up some machines with globus and swift but am not<br>
> able to get swift to submit jobs through gram4.....<br>
<br>
</div>I think you'll get better results (and support) with GRAM 5, or even GRAM 2.4. And you could do initial experiments with just ssh and coasters. Let us know what problems you're having (on <a href="mailto:swift-user@ci.uchicago.edu">swift-user@ci.uchicago.edu</a>), and include your sites.xml, tc.data, swift.properties, and Swift .log file when reporting problems, so we can diagnose them with less back-and-forth, OK?<br>
<br>
I'm eager to see what kind of progress you can make on this problem, and very willing to help you think through it.<br>
<br>
Regards,<br>
<br>
Mike<br>
<br>
<br>
<br>
><br>
> Vipul Kumar Singh<br>
<div><div></div><div class="h5"><br>
--<br>
Michael Wilde<br>
Computation Institute, University of Chicago<br>
Mathematics and Computer Science Division<br>
Argonne National Laboratory<br>
<br>
</div></div></blockquote></div><br>