[Swift-devel] Is there a site count limit?

Fri Apr 10 12:00:26 CDT 2009

The sites file and log are in ci:/home/wilde/oops.1063.2

I spotted the anomaly (if that's what it is) as shown below.

Also: we discussed on the list a long time ago how to get the Swift 
scheduler to send no more jobs to each "site" than there are cores in 
that site (for this BGP/Falkon case), so that jobs don't get committed 
to busy sites while other sites have free cores.

In this run, we are trying to send 32K jobs to 32K cores.
Each of the 128 "sites" has 256 cores.

The counts below show about 19K of those jobs as having been dispatched 
to only 32 sites, i.e. 32*256 = 8192 cores.

int$ grep JOB_START *nr3.log | awk '{print $19}' | sort | uniq -c 

      24
     365 host=bgp000
     790 host=bgp001
     371 host=bgp002
     383 host=bgp003
     365 host=bgp004
     791 host=bgp005
     415 host=bgp006
     775 host=bgp007
     790 host=bgp008
     791 host=bgp009
     369 host=bgp010
     790 host=bgp011
     359 host=bgp012
     791 host=bgp013
     394 host=bgp014
     402 host=bgp015
     358 host=bgp016
     595 host=bgp017
     790 host=bgp018
     790 host=bgp019
     791 host=bgp020
     790 host=bgp021
     370 host=bgp022
     790 host=bgp023
     790 host=bgp024
     674 host=bgp025
     567 host=bgp026
     389 host=bgp027
     778 host=bgp028
     366 host=bgp029
     787 host=bgp030
     695 host=bgp031
int$ pwd
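
As a rough cross-check of the listing above, a small helper can total the per-host counts. This is only a sketch and was not part of the original run: `summarize_hosts` is a hypothetical name, and it assumes its input has the `<count> host=<name>` shape that `uniq -c` printed above.

```shell
# Summarize `uniq -c` output of the form "<count> host=<name>" read from stdin.
# Lines with a count but no host= field (such as the bare "24" above)
# are totalled separately as "other"; blank lines are ignored.
summarize_hosts() {
  awk '
    $2 ~ /^host=/ { hosts++; jobs += $1; next }
    NF > 0        { other += $1 }
    END { printf "hosts=%d jobs=%d other=%d\n", hosts, jobs, other }
  '
}
```

It would be appended to the same pipeline, e.g. `grep JOB_START *nr3.log | awk '{print $19}' | sort | uniq -c | summarize_hosts`. Fed the listing above, it reports hosts=32 jobs=19261 other=24, which matches the "about 19K" figure and the 32-site observation.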

On 4/10/09 11:42 AM, Mihael Hategan wrote:
> On Fri, 2009-04-10 at 11:38 -0500, Michael Wilde wrote:
>> Hi,
>>
>> We're trying to run an oops run on 8 racks of the BGP. It's possible this 
>> is larger than has been done to date with swift.
>>
>> Our sites.xml file has localhost plus 128 Falkon sites, one for each 
>> pset in the 8-rack partition.
>>
>> From what I can tell, Swift sees all 128 sites, but only sends jobs to 
>> exactly the first 32, bgp000-bgp031.
>>
>> While I debug this further, does anyone know of some hardwired limit 
>> that would cause swift to send to only the first 32 bgp sites?
> 
> I can't think of anything that would make that the case. The sites file
> and a log would be useful.
>