[Swift-devel] Try coaster on BG/P ?

Ioan Raicu iraicu at cs.uchicago.edu
Thu Jun 19 19:58:43 CDT 2008


I am not sure which problem you are referring to.

The issue with Falkon is that there are queues at the services.  If a 
client submits all its jobs to a single service (which manages only 256 
CPUs), the other 639 services, with 160K - 256 CPUs among them, could be 
left idle.  That is the worst case; it wouldn't happen very often, but it 
could still happen towards the end of runs, when there isn't enough work 
to keep everyone busy.  There are only 2 solutions:

1) never queue anything up at the services; only send a task from the 
client to a service when we know there is an available CPU to run that 
task.  This is the approach we took.
2) allow tasks to time out after some time, trigger a resubmit of the 
same task to another service, and keep doing this until a reply to that 
task comes back.  This would likely introduce unnecessarily long delays 
and cause load imbalances towards the end of runs, when there isn't 
enough work to keep all CPUs busy.
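Approach 1 can be sketched roughly as follows.  This is a minimal Python 
sketch, not Falkon's code: it assumes the client knows each service's 
slot count up front and is told of every completion through a 
hypothetical task_done() notification.  The class and method names are 
illustrative only.  Tasks are dispatched round-robin to a service with 
a free CPU; when every CPU is busy, tasks wait at the client instead of 
in a service queue.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class FreeSlotDispatcher:
    """Hypothetical client-side dispatcher: never queues tasks at a service."""

    def __init__(self, slots_per_service: Dict[str, int]) -> None:
        self.free = dict(slots_per_service)      # known free CPU slots per service
        self.order: List[str] = list(slots_per_service)
        self.next_idx = 0                        # round-robin cursor
        self.pending: Deque[str] = deque()       # tasks held at the client

    def submit(self, task: str) -> List[Tuple[str, str]]:
        """Accept a task; return the (task, service) pairs sent right away."""
        self.pending.append(task)
        return self._drain()

    def task_done(self, service: str) -> List[Tuple[str, str]]:
        """A completion frees one slot on that service; dispatch more work."""
        self.free[service] += 1
        return self._drain()

    def _pick_service(self) -> Optional[str]:
        # Scan services round-robin, starting at the cursor, for a free CPU.
        for i in range(len(self.order)):
            svc = self.order[(self.next_idx + i) % len(self.order)]
            if self.free[svc] > 0:
                self.next_idx = (self.next_idx + i + 1) % len(self.order)
                return svc
        return None

    def _drain(self) -> List[Tuple[str, str]]:
        sent = []
        while self.pending:
            svc = self._pick_service()
            if svc is None:
                break  # all CPUs busy: keep the task at the client
            self.free[svc] -= 1
            sent.append((self.pending.popleft(), svc))
        return sent
```

For example, with FreeSlotDispatcher({"s0": 1, "s1": 1}), the first two 
submissions go to s0 and s1, a third is held at the client, and it is 
sent to s1 as soon as task_done("s1") is called.  Because no task ever 
sits in a service queue, a CPU freed on any service can pick up the 
next held task.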

In essence, there is no problem to solve here; it's just a matter of 
which solution you choose in such a distributed, tree-like environment, 
where you have 1 client, N services, and M workers.  N is a value 
between 1 and 640, and M could be as high as 160K, with an N:M ratio of 
1:256.
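The numbers above can be checked with a short Python sketch.  It assumes 
exactly 256 workers per service and at most 640 services; the function 
names are hypothetical.  It shows where "160K" and "160K - 256" come 
from: 640 x 256 = 163,840 workers in total, and 639 x 256 = 163,584 of 
them sit idle if all tasks are queued at one service.

```python
import math

CPUS_PER_SERVICE = 256  # N:M ratio of 1:256 (from the text)
MAX_SERVICES = 640      # upper bound on N (from the text)


def services_needed(workers: int) -> int:
    """Number of services N needed for M workers at 256 workers per service."""
    if not 1 <= workers <= MAX_SERVICES * CPUS_PER_SERVICE:
        raise ValueError("M must be between 1 and 640 * 256")
    return math.ceil(workers / CPUS_PER_SERVICE)


def idle_cpus_worst_case(num_services: int) -> int:
    """CPUs left idle when every task is queued at a single service."""
    return (num_services - 1) * CPUS_PER_SERVICE


print(MAX_SERVICES * CPUS_PER_SERVICE)     # 163840, the "160K" workers
print(idle_cpus_worst_case(MAX_SERVICES))  # 163584, i.e. 160K - 256
```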

Ioan

Mihael Hategan wrote:
> On Thu, 2008-06-19 at 18:56 -0500, Michael Wilde wrote:
>
>   
>> What Ioan did in Falkon when he went to the multiple-server architecture 
>> is relevant here: the client load-shares among all the servers, 
>> round-robin, only sending a job to a server when it knows that the 
>> server has a free cpu slot. In this way, no queues build up on the 
>> servers, and it avoids having a job wait in any server's queue when a 
>> free cpu might be available on some other server.
>>
>>     
>
> If you have O(1) scheduling, this shouldn't be necessary. It's like
> i2u2: Don't build a cluster to reduce the odds of triggering a problem.
> Fix the problem instead.
>