[Swift-devel] Try coaster on BG/P ?
Ioan Raicu
iraicu at cs.uchicago.edu
Thu Jun 19 19:58:43 CDT 2008
I am not sure which problem you are suggesting we fix.
The issue with Falkon is that there are queues at the services. If a
client submits all its jobs to a single service (which manages only 256
CPUs), the 639 other services, with 160K - 256 CPUs between them, could
be left idle. That is the worst case; it wouldn't happen very often, but
it could still happen towards the end of a run, when there isn't enough
work to keep everyone busy. There are only 2 solutions.
1) never queue anything up at the services; only send a task from the
client to a service when we know that service has an available CPU to
run it; this is the approach we took
2) allow tasks to time out after some period and trigger a resubmit of
the same task to another service, repeating this until a reply for that
task comes back; this seems like it would introduce unnecessarily long
delays and cause load imbalances towards the end of runs, when there
isn't enough work to keep all the CPUs busy
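To make option 1 concrete, here is a minimal sketch of the idea in
Python. It is not Falkon's actual code: the Service class, the dispatch
function, and the assumption that the client keeps an accurate count of
free CPUs per service are all hypothetical, chosen only to illustrate
"round-robin, but only to a service with a known free CPU".

```python
from collections import deque


class Service:
    """Hypothetical stand-in for one service and the CPUs it manages."""

    def __init__(self, name, cpus):
        self.name = name
        self.free = cpus      # free CPU slots, as tracked by the client
        self.running = []     # tasks the client has sent to this service


def dispatch(tasks, services):
    """Send tasks round-robin, only to services with a free CPU.

    Returns the tasks that could not be placed. They stay queued at the
    client, so no queue ever builds up at a service.
    """
    pending = deque(tasks)
    n = len(services)
    i = 0
    skipped = 0  # consecutive services visited that had no free slot
    while pending and skipped < n:
        svc = services[i]
        i = (i + 1) % n
        if svc.free > 0:
            svc.free -= 1
            svc.running.append(pending.popleft())
            skipped = 0
        else:
            skipped += 1
    return list(pending)


services = [Service("s0", 2), Service("s1", 1)]
leftover = dispatch(range(5), services)
```

With 3 free CPUs in total and 5 tasks, 3 tasks are placed and 2 remain
at the client until a service reports a free CPU again. A real client
would also have to increment the free count when a task result comes
back, which this sketch leaves out.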
In essence, there is no problem to solve here; it's just a question of
which solution you choose in such a distributed, tree-like environment,
where you have 1 client, N services, and M workers. N is a value between
1 and 640, and M could be as high as 160K, with a ratio of 1:256 between
N:M.
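For reference, the numbers above work out as follows. This is only the
arithmetic from the figures already quoted (640 services, 256 CPUs per
service); the variable names are mine.

```python
SERVICES = 640          # maximum N
CPUS_PER_SERVICE = 256  # the 1:256 ratio between N and M

total_cpus = SERVICES * CPUS_PER_SERVICE        # maximum M, the "160K"
worst_case_idle = total_cpus - CPUS_PER_SERVICE # all jobs on one service
idle_fraction = worst_case_idle / total_cpus    # share of CPUs left idle

print(total_cpus, worst_case_idle, idle_fraction)
```

So "160K" is 163,840 CPUs, and in the worst case 163,584 of them (639
of the 640 services) sit idle.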
Ioan
Mihael Hategan wrote:
> On Thu, 2008-06-19 at 18:56 -0500, Michael Wilde wrote:
>
>
>> What Ioan did in Falkon when he went to the multiple-server architecture
>> is relevant here: the client load-shares among all the servers,
>> round-robin, only sending a job to a server when it knows that the
>> server has a free cpu slot. In this way, no queues build up on the
>> servers, and it avoids having a job wait in any server's queue when a
>> free cpu might be available on some other server.
>>
>>
>
> If you have O(1) scheduling, this shouldn't be necessary. It's like
> i2u2: Don't build a cluster to reduce the odds of triggering a problem.
> Fix the problem instead.