[Swift-devel] fakecnari on ranger without gridftp

Sun Sep 28 22:00:51 CDT 2008

I see the following stats for this run:

Total number of events: 10002
Shortest event (s): 3.4300000667572
Longest event (s): 753.21799993515
Total duration of all events (s): 53898.3449883461
Mean event duration (s): 5.38875674748511
Standard deviation of event duration (s): 7.48318471316593
Maximum number of events at one time: 113

What inherently limits the run to 113 events at a time?  Is it that 
Coaster allocated only 113 (maybe a few more) CPU-cores?  How many 
CPU-cores did Coaster allocate?  With 113 CPU-cores and 5.39 sec tasks, 
that means a throughput of ~21 tasks/sec.  Is this the bottleneck?  It's 
probably not the file system (in terms of the app accessing the 
input/output data): if it were, task execution times would simply 
increase with load.  But the file system could be slow in getting the 
input data to the right place for the app to start computing, i.e. in 
staging it in.
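
The ~21 tasks/sec figure is just peak concurrency divided by mean task 
duration. A minimal sketch of that arithmetic, using only the numbers 
from the stats above (the variable names are mine, and it assumes all 
113 slots stay busy with tasks of average length, which the run may not 
actually achieve):

```python
# Figures copied from the run statistics above.
total_events = 10002
total_duration_s = 53898.3449883461
max_concurrent = 113  # peak number of simultaneous events

# Mean task duration, recomputed as a sanity check on the reported mean.
mean_duration_s = total_duration_s / total_events

# Upper bound on steady-state throughput, ASSUMING every one of the
# 113 slots is always occupied by an average-length task.
throughput = max_concurrent / mean_duration_s

print(f"mean duration: {mean_duration_s:.2f} s")
print(f"throughput bound: {throughput:.1f} tasks/sec")
```

If the sustained completion rate is well below this bound, something 
other than core count (e.g. stage-in) is limiting the run.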

BTW, do the times above include wait queue times?  I see the longest 
task being 753 sec, but below you say that the workload takes 590 sec 
not including queue time.  Do you have a plot of the number of CPUs in 
relation to the number of active tasks?  Are all available CPUs kept 
busy?  The speedup is one story, 84X out of 113X possible (this 113X 
should really be the number of CPU-cores), but sometimes the workload 
characteristics limit the maximum possible speedup... and in that case, 
it's good to look at the CPU-core utilization.  Is it possible to draw a 
graph that has this info: # of CPU-cores, number of active tasks, and 
throughput of completed tasks?
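
To make the utilization question concrete, here is a rough sketch. It 
assumes the event durations are pure execution time (exactly what I am 
asking about), takes 113 as the core count (only the peak concurrency is 
known, so that is a guess), and uses the approximate 590 sec wall time 
quoted below. The function names and the sample intervals are 
hypothetical, not taken from the Swift logs or plotting tools:

```python
def max_concurrency(intervals):
    """Return the peak number of overlapping (start, end) intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # task becomes active
        events.append((end, -1))    # task finishes
    # Sort by time; at equal times, process finishes (-1) before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def utilization(total_busy_s, cores, wall_s):
    """Fraction of the available core-seconds spent running tasks."""
    return total_busy_s / (cores * wall_s)


# Made-up intervals, only to show how the active-task count is derived.
sample = [(0.0, 5.0), (1.0, 6.0), (5.0, 9.0), (7.0, 8.0)]
peak = max_concurrency(sample)

# Numbers from this thread; 113 cores and 590 s are the assumptions above.
util = utilization(53898.3449883461, 113, 590)
efficiency = 84 / 113  # reported speedup over the assumed core count

print(f"peak active tasks in sample: {peak}")
print(f"core utilization: {util:.2f}")
print(f"parallel efficiency: {efficiency:.2f}")
```

Running the same sweep over the real start/end times, and recording the 
running count at each step, would give the active-tasks curve to plot 
against the number of allocated cores.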

Ioan

Ben Clifford wrote:
> I have an app 'fakecnari' that behaves somewhat like the CNARI app that 
> skenny has been working on, in order to make it easier for me to look at 
> bottlenecks.
>
> So far, that's had similar problems to skenny's real runs where input 
> files cannot be staged in to ranger fast enough from UC - this limits the 
> number of cores that can be used at any one time on Ranger to around 15.
>
> So I thought in order to see what other bottlenecks might be found, I'd 
> make a run with swift running directly on a ranger headnode, submitting 
> through coasters and with the input and output files moved around using 
> the local copy file provider (the same as happens when you use the default 
> local site).
>
> This looks like it manages to use over 100 cores quite a lot. The speedup 
> for the run is (including allocation time for coaster workers, which is 
> a significant part of this run) about 50000s worth of sleep done in 800s, 
> which is sleeping 62 times as fast as on a single core. Discounting 
> worker allocation time, this takes about 590s which is sleeping about 84 
> times as fast.
>
> Even with local copies instead of ftp, file transfers (limited to 4 at 
> once) appear to be a rate-limiting factor.
>
> There are full plots here:
>
> http://www.ci.uchicago.edu/~benc/tmp/report-fakecnari-20080928-1134-herl17vf/
>
>   

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================