[Swift-devel] log plots of start/end times
Ben Clifford
benc at hawaga.org.uk
Wed Oct 8 08:00:26 CDT 2008
On Mon, 6 Oct 2008, Ben Clifford wrote:
> In interactions with a couple of people in the past month, concern has been
> raised about lack of correlation (specifically, how much can you rely on the
> first above to imply the second?), so I've added a couple of plots to the
> standard swift-plot-log plots that the log-processing module makes. These
> appear at the bottom of the info page of plots.
I did a run of 3000 touch jobs against the UC teragrid site using gram4.
The plots mentioned above are the last two on this page:
http://www.ci.uchicago.edu/~benc/report-066-many-20081008-0620-tdnpx947/info.html
The times at which the client sees the active and completion status
changes differ from the times recorded on the worker node by some amount -
a minute or so once a large number of jobs are going through.
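One way to quantify that gap is to subtract, per job, the worker-side event time from the client-side notification time. The sketch below assumes a hypothetical per-job record with the field names shown (not Swift's actual log format) and uses invented timestamps, not data from the run above.

```python
from statistics import median

# Hypothetical record layout: for each job, the time (seconds) at which each
# event was seen on the worker node and on the Swift client. Field names are
# illustrative only.
def notification_lags(jobs):
    """Return per-job (start_lag, end_lag): client time minus worker time."""
    lags = []
    for job in jobs:
        start_lag = job["client_active"] - job["worker_start"]
        end_lag = job["client_completed"] - job["worker_end"]
        lags.append((start_lag, end_lag))
    return lags

def summarize(lags):
    """Median start lag, median end lag, and how much later completions arrive."""
    start = median(lag[0] for lag in lags)
    end = median(lag[1] for lag in lags)
    return start, end, end - start

# Invented example values, purely for illustration
jobs = [
    {"worker_start": 0.0, "client_active": 60.0, "worker_end": 1.0, "client_completed": 68.0},
    {"worker_start": 5.0, "client_active": 64.0, "worker_end": 6.0, "client_completed": 72.0},
    {"worker_start": 9.0, "client_active": 70.0, "worker_end": 10.0, "client_completed": 79.0},
]
print(summarize(notification_lags(jobs)))  # (60.0, 67.0, 7.0)
```

Taking the difference between the end lag and the start lag has a useful side effect: a constant clock offset between client and worker appears in both lags and cancels out, so the extra completion delay can be read off even if the clocks are not synchronized.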
For the purposes of estimating jobs in progress, a simple delay on
notification delivery shouldn't matter too much. What is more interesting
in that respect is that the delay is larger for completion notifications
than for start (Active) notifications. Because a job that has already
finished on the worker still shows as Active until its completion
notification arrives, using the Active state to estimate the number of jobs
actually running will over-estimate by some amount. In the plots above it
looks like there's about 5 to 10 s more delay in completion notifications
compared to start notifications.
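A toy model of that over-estimate: shift each worker-side run interval by a start delay and a larger end delay, then count the jobs that look running at matching moments. The helper names, the 1 job/s arrival rate, the 5 s runtime and the 60 s / 68 s delays are all invented for illustration, not measured.

```python
# Each interval is (start, end) in seconds for one job.
def in_progress(intervals, t):
    """Count jobs considered running at time t: start <= t < end."""
    return sum(1 for start, end in intervals if start <= t < end)

def client_view(intervals, start_delay, end_delay):
    """Shift worker-side intervals by hypothetical notification delays."""
    return [(s + start_delay, e + end_delay) for s, e in intervals]

# Hypothetical workload: one 5-second job started every second
worker = [(i, i + 5) for i in range(100)]
# Hypothetical delays: completions reach the client 8 s later than starts
client = client_view(worker, start_delay=60, end_delay=68)

# Compare the worker at t=50 with the client at the matching time t+60
print(in_progress(worker, 50), in_progress(client, 110))  # 5 13
```

In this model the Active-based count is inflated by the arrival rate times the extra completion delay (1 job/s × 8 s = 8 jobs), independent of the common 60 s delay, which is why a uniform delay matters less than the difference between the two.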
The long delay in completion notifications will slow down job throughput
through gram4 - stage-out of output data and subsequent allocation of the
site for another job will both be delayed.
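A back-of-the-envelope model of that throughput cost, assuming the client allows a fixed number of concurrent jobs per site and only frees a slot once it sees the completion notification. Stage-out time and start delays are ignored, and the slot count and timings are hypothetical.

```python
def slot_throughput(slots, runtime_s, completion_delay_s):
    """Jobs per second in steady state under a simplified model (an assumption).

    Each slot is held for the job's runtime plus the time until the client
    sees the completion, after which the next job can be submitted.
    """
    return slots / (runtime_s + completion_delay_s)

# Hypothetical example: 64 concurrent slots running 1-second jobs
fast = slot_throughput(64, 1.0, 0.0)   # completions seen immediately
slow = slot_throughput(64, 1.0, 7.0)   # completions seen 7 s late
print(fast, slow)  # 64.0 8.0
```

The relative cost is the delay divided by (runtime + delay), so it is largest for very short jobs such as the touch jobs used here.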
I've heard that in GT 4.2 this notification delivery is a lot better,
though in practice at GridKa I saw severe notification delays when a room
full of students hit a container, so the future there is not all roses.
I think for Coasters and Falkon, job completions will be indicated in a
much more timely fashion - however, I've not actually plotted the above
graphs for runs of either. I think for Falkon, Zhao has been keeping
Swift -info logs for the purposes of debugging worker node performance, so
there is enough information around to get these plots for Falkon already
(by running the latest version of swift-plot-log). I'd be interested to
see that, as a sanity check.