<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
I am not sure which problem you are suggesting we fix.<br>
<br>
The issue with Falkon is that there are queues at the services. If a
client submits all of its jobs to a single service (which manages only 256
CPUs), the other 639 services, with 160K - 256 CPUs between them, could be
left idle. That is the worst case, which wouldn't happen very often, but it
could still happen towards the end of a run, when there isn't enough work to
keep everyone busy. There are only two solutions:<br>
<br>
1) never queue anything at the services: only send a task from the
client to a service when we know that service has an available CPU to run
it; this is the approach we took<br>
2) allow tasks to time out after a while and trigger a resubmit of
the same task to another service, repeating until a reply for
that task comes back; this seems likely to introduce unnecessarily
long delays and to cause load imbalances towards the end of runs, when
there isn't enough work to keep all CPUs busy<br>
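<br>
To make option 1 concrete, here is a minimal sketch of the idea. It is not
Falkon's actual code: the Service class, the dispatch function, and the
free-slot counter are hypothetical names for illustration, and it assumes the
client already knows how many CPUs are free at each service.<br>
<pre>
```python
from collections import deque
from itertools import cycle


class Service:
    """Hypothetical stand-in for one service and the CPUs it manages."""

    def __init__(self, name, cpus):
        self.name = name
        self.free = cpus      # free CPU slots, as known to the client
        self.running = []     # tasks currently placed on this service


def dispatch(tasks, services):
    """Round-robin over services, sending a task only where a CPU is free.

    Tasks that cannot be placed stay in the client-side queue, so nothing
    is ever queued at a service. Returns the tasks still waiting.
    """
    pending = deque(tasks)
    order = cycle(services)
    total_free = sum(s.free for s in services)
    while pending and total_free > 0:
        svc = next(order)
        if svc.free == 0:
            continue          # skip full services, try the next one
        svc.running.append(pending.popleft())
        svc.free -= 1
        total_free -= 1
    return pending


# Example: 3 services with 2 CPUs each, 8 tasks to run.
services = [Service(f"s{i}", cpus=2) for i in range(3)]
leftover = dispatch(range(8), services)
```
</pre>
With 3 services of 2 CPUs each and 8 tasks, 6 tasks are placed and 2 wait at
the client until some service reports a free CPU.<br>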
<br>
In essence, there is no problem to solve here; it is just a question of
which solution you choose in a distributed, tree-like environment with 1
client, N services, and M workers. N is a value between 1 and 640, and
M could be as high as 160K, with an N:M ratio of 1:256. <br>
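<br>
For reference, the worst-case numbers above work out as follows (the
constant names are only for illustration):<br>
<pre>
```python
CPUS_PER_SERVICE = 256   # the 1:256 ratio between services and workers
MAX_SERVICES = 640       # upper bound on N

# Largest machine: 640 * 256 = 163,840 CPUs, i.e. 160K.
total_cpus = MAX_SERVICES * CPUS_PER_SERVICE

# Worst case: every job is sent to a single service.
idle_services = MAX_SERVICES - 1
idle_cpus = total_cpus - CPUS_PER_SERVICE
```
</pre>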
<br>
Ioan<br>
<br>
Mihael Hategan wrote:
<blockquote cite="mid:1213920614.29014.12.camel@localhost" type="cite">
<pre wrap="">On Thu, 2008-06-19 at 18:56 -0500, Michael Wilde wrote:
</pre>
<blockquote type="cite">
<pre wrap="">What Ioan did in Falkon when he went to the multiple-server architecture
is relevant here: the client load-shares among all the servers,
round-robin, only sending a job to a server when it knows that the
server has a free cpu slot. In this way, no queues build up on the
servers, and it avoids having a job wait in any server's queue when a
free cpu might be available on some other server.
</pre>
</blockquote>
<pre wrap=""><!---->
If you have O(1) scheduling, this shouldn't be necessary. It's like
i2u2: Don't build a cluster to reduce the odds of triggering a problem.
Fix the problem instead.
_______________________________________________
Swift-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Swift-devel@ci.uchicago.edu">Swift-devel@ci.uchicago.edu</a>
<a class="moz-txt-link-freetext" href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</a>
</pre>
</blockquote>
</body>
</html>