[Swift-devel] Is there a site count limit?
Mihael Hategan
hategan at mcs.anl.gov
Fri Apr 10 12:05:57 CDT 2009
On Fri, 2009-04-10 at 12:00 -0500, Michael Wilde wrote:
> They are in ci:/home/wilde/oops.1063.2
>
> I spotted the anomaly (if that's what it is) as below.
>
> Also: we discussed on the list way way back how to get the swift
> scheduler to send no more jobs to each "site" than there are cores in
> that site (for this bgp/falkon case) so that jobs don't get committed to
> busy sites while other sites have free cores.
>
> In this run, we are trying to send 32K jobs to 32K cores.
> Each of the 128 "sites" has 256 cores.
>
> The #s below show about 19K of those jobs as having been dispatched to
> 32*256 = 8192 cores.
That is if all the cores are the same. In this case it seems that only
8192 cores are the same. I'll investigate why.
>
> int$ grep JOB_START *nr3.log | awk '{print $19}' | sort | uniq -c
>
> 24
> 365 host=bgp000
> 790 host=bgp001
> 371 host=bgp002
> 383 host=bgp003
> 365 host=bgp004
> 791 host=bgp005
> 415 host=bgp006
> 775 host=bgp007
> 790 host=bgp008
> 791 host=bgp009
> 369 host=bgp010
> 790 host=bgp011
> 359 host=bgp012
> 791 host=bgp013
> 394 host=bgp014
> 402 host=bgp015
> 358 host=bgp016
> 595 host=bgp017
> 790 host=bgp018
> 790 host=bgp019
> 791 host=bgp020
> 790 host=bgp021
> 370 host=bgp022
> 790 host=bgp023
> 790 host=bgp024
> 674 host=bgp025
> 567 host=bgp026
> 389 host=bgp027
> 778 host=bgp028
> 366 host=bgp029
> 787 host=bgp030
> 695 host=bgp031
> int$ pwd
>
>
> On 4/10/09 11:42 AM, Mihael Hategan wrote:
> > On Fri, 2009-04-10 at 11:38 -0500, Michael Wilde wrote:
> >> Hi,
> >>
> >> We're trying to run an oops run on 8 racks of the BGP. It's possible this
> >> is larger than has been done to date with swift.
> >>
> >> Our sites.xml file has localhost plus 128 Falkon sites, one for each
> >> pset in the 8-rack partition.
> >>
> >> From what I can tell, Swift sees all 128 sites, but only sends jobs to
> >> exactly the first 32, bgp000-bgp031.
> >>
> >> While I debug this further, does anyone know of some hardwired limit
> >> that would cause swift to send to only the first 32 bgp sites?
> >
> > I can't think of anything that would make that the case. The sites file
> > and a log would be useful.
> >
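
For anyone wanting to repeat the per-site tally quoted above without the awk field position, here is a minimal Python sketch. It is an illustration only, not part of Swift: it assumes each dispatched job produces a log line containing "JOB_START" and a "host=<site>" token (as the quoted output suggests), and the function names, the "bgp" prefix with three-digit numbering, and the default of 128 sites with 256 cores each are taken from this thread or made up for the example.

```python
import re
from collections import Counter

# Assumption: the site name appears in the log line as a "host=<name>" token.
HOST_RE = re.compile(r"\bhost=(\S+)")


def jobs_per_host(lines):
    """Count JOB_START lines per host; lines with no host= token count under ''."""
    counts = Counter()
    for line in lines:
        if "JOB_START" not in line:
            continue
        match = HOST_RE.search(line)
        counts[match.group(1) if match else ""] += 1
    return counts


def idle_sites(counts, n_sites=128, prefix="bgp"):
    """List expected sites (bgp000, bgp001, ...) that received no jobs."""
    expected = [f"{prefix}{i:03d}" for i in range(n_sites)]
    return [site for site in expected if counts.get(site, 0) == 0]


def over_capacity(counts, cores_per_site=256):
    """Sites whose total dispatched job count exceeds their core count."""
    return {h: n for h, n in counts.items() if h and n > cores_per_site}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally.py < run.log
    counts = jobs_per_host(sys.stdin)
    print("sites with no jobs:", idle_sites(counts))
    print("sites over 256 jobs:", over_capacity(counts))
```

Note that over_capacity compares the total number of jobs dispatched to a site over the whole run, not the number running concurrently, so it only flags the kind of imbalance described above (32K jobs for 32K cores, yet several hundred jobs on a 256-core site). The '' bucket is roughly analogous to the unlabeled count at the top of the quoted output, though the two need not match exactly.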