<div dir="ltr"><div><div><div><div><div>Hi All,<br></div> I just wanted to share some benchmark results I did that may be of interest to the team. One thing Justin and I have been working on for quite a while is improving the core ADLB data structures that match up tasks to workers. The efficiency of these affects task throughput in Swift/T a lot, and the scalability as a lot of tasks get enqueued can make a big difference on many workloads.<br><br>It's actually pretty complex getting the data structures right,
since we need to support task priorities plus task types plus various
kinds of locality, plus matching in both directions (incoming tasks to
idle workers, incoming worker requests to tasks), plus we want very high
efficiency - a few hundred thousand tasks per second per server means a
few microseconds per match. Justin did the initlal rewrite in 2012 (I
think) to replace the linked lists originally used (which don't scale well). We've gone back and forth since then making
further improvements. Recently I made some further changes for locality-aware matching that resulted in the current performance results in the
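To make the matching cost concrete, here is a minimal sketch of the kind of priority-keyed binary heap that gives logarithmic enqueue/dequeue. To be clear, this is my illustration, not the actual ADLB code - task_t, task_heap, heap_push, and heap_pop are hypothetical names, and the real structures also have to handle task types, targeting, and matching in both directions:

    /* Illustrative sketch only - NOT the actual ADLB implementation.
     * A binary max-heap keyed on task priority: enqueue and dequeue
     * each sift through at most log2(n) levels, which is where
     * logarithmic scaling with queue size comes from. */
    #include <stdlib.h>

    typedef struct {
        int priority;   /* higher value is matched first */
        void *payload;  /* opaque task data */
    } task_t;

    typedef struct {
        task_t *entries;
        size_t size, capacity;
    } task_heap;

    static void swap_tasks(task_t *a, task_t *b) {
        task_t tmp = *a; *a = *b; *b = tmp;
    }

    /* Enqueue a task: append at the bottom, then sift up - O(log n). */
    void heap_push(task_heap *h, task_t t) {
        if (h->size == h->capacity) {
            h->capacity = h->capacity ? h->capacity * 2 : 64;
            h->entries = realloc(h->entries, h->capacity * sizeof(task_t));
        }
        size_t i = h->size++;
        h->entries[i] = t;
        while (i > 0 && h->entries[(i-1)/2].priority < h->entries[i].priority) {
            swap_tasks(&h->entries[(i-1)/2], &h->entries[i]);
            i = (i-1)/2;
        }
    }

    /* Dequeue the highest-priority task (caller ensures non-empty):
     * move the last entry to the root, then sift down - O(log n). */
    task_t heap_pop(task_heap *h) {
        task_t top = h->entries[0];
        h->entries[0] = h->entries[--h->size];
        size_t i = 0;
        for (;;) {
            size_t l = 2*i + 1, r = 2*i + 2, best = i;
            if (l < h->size && h->entries[l].priority > h->entries[best].priority)
                best = l;
            if (r < h->size && h->entries[r].priority > h->entries[best].priority)
                best = r;
            if (best == i) break;
            swap_tasks(&h->entries[i], &h->entries[best]);
            i = best;
        }
        return top;
    }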
benchmark.<br><br></div><div>The short version is: everything performs great (100s of nanoseconds per task) even with 1 million tasks in the queue and using all locality-aware matching features.<br><br></div><div>I did experiments to quantify the results of all that work. The experiment simulates incoming tasks and requests for tasks and times how long the operations on the data structures take (we don't include MPI and other overheads, which add 100%+ overhead in practice).<br></div><div><br></div>The experiment results are divided up in a few ways:<br><br></div>EQUAL/UNIFORM_RANDOM - whether priorities are all equal or are randomly chosen<br>UNTARGETED/TARGETED - whether it can go to any worker, or a specified subset of workers<br></div>RANK/NODE - whether a rank or all ranks on a node are targeted<br></div><div>SOFT - if the task can go to a different rank if the targeted one is busy.<br><br></div><div>On the x axis we have the queue size - how many tasks there are in the queue. On the y axis is the average time per operation in nanoseconds. It requires two operations to match a task, so double that time to match a task<br><br></div><div>Overall everything is very efficient - worst case is 0.3 of a microsecond, most are 50 to 100 nanoseconds. Equal priorities show performance independent of queue size, while random priorities scales logarithmically and perform well even with a million tasks in the queue. There's some constant overhead associated with node targeting and soft targeting since the logic is slightly more complicated.<br><br></div><div><div><div><div>- Tim<br></div></div></div></div></div>