[Swift-devel] Task matching queue scalability benchmark

Tim Armstrong tim.g.armstrong at gmail.com
Fri Mar 20 10:54:08 CDT 2015


Hi All,
 I just wanted to share some benchmark results that may be of
interest to the team.  One thing Justin and I have been working on for
quite a while is improving the core ADLB data structures that match up
tasks to workers.  The efficiency of these data structures strongly
affects task throughput in Swift/T, and how well they scale as many tasks
get enqueued can make a big difference on many workloads.

It's actually pretty complex getting the data structures right, since we
need to support task priorities, task types, various kinds of locality,
and matching in both directions (incoming tasks to idle workers, incoming
worker requests to queued tasks), all at very high efficiency - a few
hundred thousand tasks per second per server means only a few microseconds
per match.  Justin did the initial rewrite in 2012 (I think) to replace the
linked lists originally used (which don't scale well).  We've gone back and
forth since then making further improvements.  Recently I made some further
changes for locality-aware matching that resulted in the current
performance results in the benchmark.
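To make the two-sided matching idea concrete, here is a minimal sketch (in Python, not the actual C ADLB code) of a queue that matches in both directions: tasks wait for workers, and worker requests wait for tasks, keyed by task type, with a max-heap for priorities and a sequence counter so equal priorities fall back to FIFO order. The class and method names are hypothetical.

```python
import heapq
from collections import deque, defaultdict

class MatchQueue:
    """Illustrative two-sided matcher (hypothetical, not ADLB's real code):
    tasks wait for workers and worker requests wait for tasks, per type."""

    def __init__(self):
        self.tasks = defaultdict(list)      # type -> heap of (-priority, seq, payload)
        self.requests = defaultdict(deque)  # type -> FIFO of waiting worker ranks
        self.seq = 0                        # tie-breaker: equal priorities stay FIFO

    def add_task(self, task_type, priority, payload):
        # Incoming task: if a worker of this type is already waiting,
        # match immediately; otherwise enqueue the task.
        if self.requests[task_type]:
            worker = self.requests[task_type].popleft()
            return (worker, payload)
        heapq.heappush(self.tasks[task_type], (-priority, self.seq, payload))
        self.seq += 1
        return None

    def add_request(self, task_type, worker):
        # Incoming worker request: if a task is queued, hand back the
        # highest-priority one; otherwise park the request.
        if self.tasks[task_type]:
            _, _, payload = heapq.heappop(self.tasks[task_type])
            return (worker, payload)
        self.requests[task_type].append(worker)
        return None
```

The heap pop is O(log n) in queue size, which lines up with the logarithmic scaling seen for random priorities below; with all-equal priorities a plain FIFO suffices and the cost is independent of queue size.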

The short version is: everything performs great (100s of nanoseconds per
task) even with 1 million tasks in the queue and using all locality-aware
matching features.

I did experiments to quantify the results of all that work.  The experiment
simulates incoming tasks and requests for tasks and times how long the
operations on the data structures take (we don't include MPI and other
overheads, which add 100%+ overhead in practice).
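The harness below is a hypothetical sketch of that methodology: pre-fill a queue to a given size, then time a batch of raw enqueue/dequeue operations and report the average in nanoseconds, with no MPI or messaging in the loop. The function name and parameters are illustrative, not the actual benchmark code.

```python
import heapq
import random
import time

def bench(queue_size, n_ops=100_000):
    """Time raw priority-queue operations on a queue pre-filled to
    queue_size, excluding MPI and other runtime overheads (sketch)."""
    heap = []
    for i in range(queue_size):
        heapq.heappush(heap, (random.random(), i))
    t0 = time.perf_counter()
    for i in range(n_ops):
        # One push + one pop approximates one task arrival + one match,
        # holding the queue size steady at queue_size.
        heapq.heappush(heap, (random.random(), i))
        heapq.heappop(heap)
    dt = time.perf_counter() - t0
    return dt / n_ops * 1e9  # average nanoseconds per push+pop pair
```

Sweeping queue_size over several orders of magnitude (e.g. 10^3 to 10^6) reproduces the kind of queue-size-vs-latency curve described below.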

The experiment results are divided up in a few ways:

EQUAL/UNIFORM_RANDOM - whether task priorities are all equal or randomly
chosen
UNTARGETED/TARGETED - whether the task can go to any worker, or only to a
specified subset of workers
RANK/NODE - whether a single rank or all ranks on a node are targeted
SOFT - whether the task may go to a different rank if the targeted one is
busy.
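As a rough sketch of how those dimensions combine (hypothetical names, and a simplification of the real matching logic), a task can carry a set of eligible ranks plus a soft flag; NODE targeting is modeled here by putting all of a node's ranks into the set:

```python
from dataclasses import dataclass

@dataclass
class Task:
    payload: str
    priority: int = 0
    # Empty set means UNTARGETED; for NODE targeting, include every
    # rank on the target node. (Illustrative model, not ADLB's layout.)
    target_ranks: frozenset = frozenset()
    soft: bool = False  # may run elsewhere if the targeted rank is busy

def eligible(task, worker_rank):
    """Hypothetical helper: can this worker take this task?"""
    if not task.target_ranks:            # UNTARGETED: any worker qualifies
        return True
    if worker_rank in task.target_ranks:  # TARGETED, RANK or NODE
        return True
    return task.soft                      # SOFT: fall back to other ranks
```

In the real matcher a SOFT task is still preferentially matched to its target before falling back, which is part of the extra constant overhead mentioned below.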

On the x axis we have the queue size - how many tasks are in the
queue.  On the y axis is the average time per operation in nanoseconds.
Matching a task takes two operations (one for the task, one for the worker
request), so double that time for the cost per matched task.

Overall everything is very efficient - the worst case is about 0.3
microseconds, and most cases are 50 to 100 nanoseconds.  Equal priorities
show performance independent of queue size, while random priorities scale
logarithmically and still perform well even with a million tasks in the
queue.  There's some constant overhead associated with node targeting and
soft targeting since the logic is slightly more complicated.

- Tim
-------------- next part --------------
Attachment: q-scalability-beagle.png (image/png, 78742 bytes)
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20150320/57b62725/attachment.png>

