[Swift-devel] log plots of start/end times

Ben Clifford benc at hawaga.org.uk
Wed Oct 8 08:00:26 CDT 2008


On Mon, 6 Oct 2008, Ben Clifford wrote:

> In interactions with a couple of people in the past month I've had concern 
> about lack of correlation (specifically how much can you rely on the first 
> above to imply the second?) so I've added a couple of plots to the 
> standard swift-plot-log plots that the log-processing module makes. These 
> appear at the bottom on the info page on plots.

I did a run of 3000 touch jobs against the UC teragrid site using gram4. 
The plots mentioned above are the last two on this page:

http://www.ci.uchicago.edu/~benc/report-066-many-20081008-0620-tdnpx947/info.html

The times at which the client sees active and completion status changes 
lag the times recorded on the worker node by some amount - a minute or so 
once a large number of jobs are going through.

For the purposes of estimating jobs in progress, a simple delay on 
notification delivery shouldn't matter too much. What is more interesting 
in that respect is that completion notifications are delayed more than 
start (Active) notifications. That means that using Active state as a way 
of estimating jobs actually running is going to over-estimate by some 
amount. In the plots above it looks like there's about 5..10s more delay 
in completion notifications compared to start notifications.

The long delay in completion notifications will slow down job throughput 
through gram4 - stageout of output data and subsequent allocation of the 
site for another job will both be delayed.
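
A back-of-envelope sketch of that effect, in Python. It assumes a simple 
model in which a site slot is not reused until the completion notification 
for the previous job arrives; the slot count, run time and delay are 
hypothetical numbers, and jobs_per_hour is an illustrative helper, not 
anything in Swift.

```python
def jobs_per_hour(slots, run_seconds, completion_delay_seconds):
    """Throughput if each slot is held for run time plus notification delay."""
    cycle = run_seconds + completion_delay_seconds
    return slots * 3600.0 / cycle


# Hypothetical: 10 slots, 1s jobs, with and without a 60s completion delay.
no_delay = jobs_per_hour(10, 1.0, 0.0)
with_delay = jobs_per_hour(10, 1.0, 60.0)

print(no_delay, with_delay)
```

Under this model the delay matters most for very short jobs such as touch, 
where the notification delay dominates the time a slot is held.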

I've heard that in gt4.2, this notification delivery is a lot better, 
though in practice at gridka I saw severe notification delays when a room 
full of students hit a container so the future there is not all roses.

I think for coasters and falkon, job completions will be indicated in a 
much more timely fashion - however I've not actually plotted the above 
graphs for runs of either. I think for falkon, Zhao has been keeping Swift 
-info logs for the purposes of debugging worker node performance, so there 
is enough information around to get these plots for Falkon already (by 
running the latest version of swift-plot-log). I'd be interested to see 
that, as a sanity check.
