[Swift-devel] running 1 billion tasks through Falkon

Ioan Raicu iraicu at cs.uchicago.edu
Thu Oct 16 14:09:02 CDT 2008


Hi again,
Up to a few days ago, the largest test I had ever done with Falkon, in 
terms of number of tasks, was 20 million tasks, which worked as expected. 
For the sake of pushing Falkon, perhaps to the point where it might 
break, I repeated that experiment with 1B (billion) tasks. Note that all 
1 billion tasks came from a single invocation of the Falkon command line 
client.

On an orthogonal note, I noticed that with simple sleep 0 tasks, I can't 
seem to saturate the 8 CPU cores on the machine where I run the service; 
usually only 1~2 cores are utilized. So, I decided to run 4 Falkon 
services on the same machine and used the load-balancing client 
(designed for the BG/P) to run the 1B tasks (from another node with dual 
CPUs) across these 4 services (running on the 8-core node).  Each service 
managed 32 CPUs in the ANL/UC TG cluster, for a total of 128 CPUs, and 
each task was a simple sleep 0, with no I/O.
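
As a rough illustration of this setup (not the actual Falkon client 
code), the sketch below spreads tasks across the 4 services. Plain 
round-robin is an assumption made purely for illustration, since the 
real policy of the load-balancing client is not described here, and the 
function and constant names are hypothetical. The run is scaled down to 
1M tasks so it finishes quickly.

```python
from collections import Counter
from itertools import cycle, islice

NUM_SERVICES = 4        # Falkon services on the 8-core node (from the post)
CPUS_PER_SERVICE = 32   # worker CPUs managed by each service (from the post)


def assign_round_robin(num_tasks, num_services=NUM_SERVICES):
    """Count tasks per service under round-robin (an assumed policy)."""
    services = cycle(range(num_services))
    return Counter(islice(services, num_tasks))


total_cpus = NUM_SERVICES * CPUS_PER_SERVICE
counts = assign_round_robin(1_000_000)  # scaled-down stand-in for the 1B run
print("total CPUs:", total_cpus)
print("tasks per service:", dict(counts))
```

With an even split like this, each service would see a quarter of the 
tasks, so in the full run each would handle about 250M tasks.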

Here is the plot of the run.
http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1B-4serv-128cpu.jpg

The good news is that the test ran great, averaging 15558 tasks/sec. The 
bad news is that the throughput dropped from about 17000 tasks/sec (at 
the beginning) to about 15500 tasks/sec (at the end). The drop in 
throughput is explained by Java memory management on the client side, 
which apparently was spending 5~10 seconds in garbage collection every 
60 seconds or so, while the amount of free heap space was, on average, 
steadily decreasing.  See the graph 
(http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1B-4serv-128cpu-mem.jpg), 
which shows the free heap size decreasing to around 200MB (from the 
max of 1536MB). At the current rate of the potential memory leak I have 
in the client, the free heap would reach 0MB by about 1.5 billion 
tasks.
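
The numbers above can be sanity-checked with some quick arithmetic. The 
sketch below uses only the figures quoted in this message. It assumes 
the 5~10 seconds of garbage collection fall inside each 60-second 
window, which is an approximation, and the variable names are just for 
illustration.

```python
TOTAL_TASKS = 1_000_000_000
AVG_RATE = 15558           # tasks/sec, average over the whole run
START_RATE = 17000         # tasks/sec at the beginning
END_RATE = 15500           # tasks/sec at the end
GC_PAUSE_RANGE = (5, 10)   # seconds spent in GC per window
GC_WINDOW = 60             # seconds (assumed to include the GC pause)
TOTAL_CPUS = 128           # 4 services x 32 CPUs

# Wall-clock duration implied by the average throughput.
run_hours = TOTAL_TASKS / AVG_RATE / 3600

# Relative throughput drop from start to end of the run.
drop_fraction = (START_RATE - END_RATE) / START_RATE

# Fraction of client time spent in GC, low and high estimates.
gc_fraction = tuple(pause / GC_WINDOW for pause in GC_PAUSE_RANGE)

# Average dispatch rate per worker CPU.
per_cpu_rate = AVG_RATE / TOTAL_CPUS

print(f"run time: {run_hours:.2f} hours")
print(f"throughput drop: {drop_fraction:.1%}")
print(f"GC share of time: {gc_fraction[0]:.1%} to {gc_fraction[1]:.1%}")
print(f"per-CPU rate: {per_cpu_rate:.1f} tasks/sec")
```

This works out to a run of a bit under 18 hours and a throughput drop 
of roughly 9%, which sits at the low end of the 8% to 17% of time that 
the garbage collector would account for under the assumption above. So 
the GC pauses are at least a plausible explanation for the slowdown.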

I just thought this was an interesting experiment, since it revealed a 
memory leak in the client that did not show up in the smaller tests I 
had done so far.

Cheers,
Ioan

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================




