<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta content="MSHTML 6.00.6000.16809" name="GENERATOR">
<style title="owaParaStyle">P {
MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px
}
</style>
</head>
<body bgcolor="#ffffff" ocsi="x">
<div dir="ltr"><font face="Tahoma" color="#000000" size="2">Dear all,</font></div>
<div dir="ltr"><font face="tahoma" size="2"></font> </div>
<div dir="ltr"><font face="tahoma" size="2">I am coming at this from the angle of application requirements in an SLA (for example, an enterprise application or, in an extreme case, disaster recovery). A reservation gives some idea of response time (let's talk separately about failures
and inaccurate execution time estimates), while queuing time prediction gives a probabilistic estimate (a mean and an upper bound) of the expected start time. According to feedback from users of our in-house supercomputers, knowing the queuing time is important to them, while they bear with errors
in their own execution time estimates. However, even for a single task, only dynamically queuing it (or a number of its replicas) does not provide any time-related information (as has been mentioned).
</font></div>
<div dir="ltr"><font face="tahoma" size="2"></font> </div>
<div dir="ltr"><font face="tahoma" size="2">Ioan, I was also thinking along the lines of queue time estimation, which may be sufficient for what I am
doing now. In my previous fault tolerance work I considered reservation (so no queuing time) because of the strict timing sequence requirement. I will read the paper soon to clarify a few points, especially the two points made by Mihael. To me, a prediction is only
useful if it provides (a) a queuing time that does not hold only for the current state and change immediately when new jobs are queued; and (b) a mean and an upper bound on queuing time, or, if only the upper bound is given, a bound that is tight in some sense (at most 20 minutes for the
2-minute job example). Finally, when a node can fail, the failure can also affect jobs queuing for that node, and this paper briefly mentions detecting such failures using queuing time data.
</font></div>
<div dir="ltr"><font face="tahoma" size="2"></font> </div>
<div dir="ltr"><font face="tahoma" size="2">I will share my findings regarding queuing time with you guys soon.</font></div>
<div dir="ltr"> </div>
<div dir="ltr"><font face="tahoma" size="2">Cheers,</font></div>
<div dir="ltr"><font face="tahoma" size="2">Qin Zheng</font></div>
<div dir="ltr"><font face="tahoma" size="2"> </font></div>
<div id="divRpF572886" style="DIRECTION: ltr">
<hr tabindex="-1">
<font face="Tahoma" size="2"><b>From:</b> Ioan Raicu [iraicu@cs.uchicago.edu]<br>
<b>Sent:</b> Thursday, April 09, 2009 4:38 AM<br>
<b>To:</b> Ben Clifford<br>
<b>Cc:</b> Mihael Hategan; Qin Zheng; swift-devel; Ian Foster<br>
<b>Subject:</b> Re: [Swift-devel] Re: replication vs site score<br>
</font><br>
</div>
<div></div>
<div>Does a batch-queue prediction service help things in any way?<br>
<a class="moz-txt-link-freetext" href="https://portal.teragrid.org/gridsphere/gridsphere?cid=queue-prediction" target="_blank">https://portal.teragrid.org/gridsphere/gridsphere?cid=queue-prediction</a><br>
<br>
I've always wondered how the Swift scheduler would behave differently if it had statistical information about queue times. Qin, have you compared your job replication strategy with one that was cognizant of the expected queue wait time, in order to meet deadlines?
On the surface, assuming that the batch queue prediction is accurate, it would seem that scheduling with known queue times might solve the same deadline-cognizant scheduling problem, but without wasting resources on unnecessary replication. The place where
queue prediction doesn't help is when a bad node causes an application to be slow or to fail. In this case, replication is probably the better recourse to guarantee meeting deadlines.<br>
<br>
Here is their latest paper on this: <a class="moz-txt-link-freetext" href="http://www.springerlink.com/content/7552901360631246/fulltext.pdf" target="_blank">
http://www.springerlink.com/content/7552901360631246/fulltext.pdf</a>. The system is deployed on the TeraGrid, and has been for a few years now. As far as I have heard, it is quite robust and accurate.<br>
<br>
Cheers,<br>
Ioan<br>
<br>
Ben Clifford wrote:
<blockquote type="cite">
<pre>On Wed, 8 Apr 2009, Mihael Hategan wrote:
This:
</pre>
<blockquote type="cite">
<pre>planning the whole workflow buys us little in a (very) dynamic
environment in which submitting a job one minute later may mean the
difference between 1 minute of queue time and one hour of queue time
</pre>
</blockquote>
<pre>and this:
</pre>
<blockquote type="cite">
<pre>You need some SLA/QOS to address that.
</pre>
</blockquote>
<pre>seem to be significant characteristics that make the environments we run
on not amenable to scheduling in the traditional sense. The lack of any
meaningful guarantees about almost anything time-related makes everything
basically opportunistic rather than scheduled.
</pre>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
===================================================
Ioan Raicu, Ph.D.
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: <a class="moz-txt-link-abbreviated" href="mailto:iraicu@cs.uchicago.edu">iraicu@cs.uchicago.edu</a>
Web: <a class="moz-txt-link-freetext" href="http://www.cs.uchicago.edu/~iraicu" target="_blank">http://www.cs.uchicago.edu/~iraicu</a>
<a class="moz-txt-link-freetext" href="http://dev.globus.org/wiki/Incubator/Falkon" target="_blank">http://dev.globus.org/wiki/Incubator/Falkon</a>
<a class="moz-txt-link-freetext" href="http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page" target="_blank">http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page</a>
===================================================
===================================================
</pre>
</div>
<br>
</body>
</html>