[Swift-devel] Swift and BGP plots

Thu Oct 29 10:17:24 CDT 2009

On Thu, 2009-10-29 at 09:30 -0500, Ioan Raicu wrote:

> > Nope. 64K.
> >   
> OK, it would be good to look at why we have double the # of tasks. It
> must be my filtering of the Swift log. Here was my filtered log:
> http://www.ece.northwestern.edu/~iraicu/scratch/logs/dc-4000-active-completed.txt

Both the coaster service log and the swift log go to the same place in that
case. The task IDs look different on each side, so you can use that to
separate them.
This is a task on the swift side:
identity=urn:0-1-11475-1-1-1256524749943
This is on the coaster side:
identity=urn:1256524750479-1256524791529-1256524791530
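A minimal sketch of filtering on that difference. It assumes, based only on the two example IDs above, that a coaster-side identity is exactly three dash-separated 13-digit numbers, while a swift-side identity has a different field layout. The function names are hypothetical, and the pattern should be checked against a real log before trusting the task counts.

```python
import re

# Assumption (from the two example IDs only): coaster-side identities are
# "urn:" followed by exactly three dash-separated 13-digit numbers.
COASTER_ID = re.compile(r"identity=urn:\d{13}-\d{13}-\d{13}(?![-\d])")
# Any task identity made of digits and dashes.
ANY_ID = re.compile(r"identity=urn:[\d-]+")


def is_coaster_line(line: str) -> bool:
    """True if the line carries a coaster-style task identity."""
    return COASTER_ID.search(line) is not None


def swift_task_lines(lines):
    """Keep lines that carry a task identity but are not coaster-side."""
    return [l for l in lines if ANY_ID.search(l) and not is_coaster_line(l)]


log = [
    "identity=urn:0-1-11475-1-1-1256524749943",
    "identity=urn:1256524750479-1256524791529-1256524791530",
    "no identity on this line",
]
print(swift_task_lines(log))
```

Counting only the lines this keeps should avoid counting each task twice when both logs are interleaved.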
[...]
> > 
> > It depends whether you count from the time the partition boots or from
> > the time swift starts. We could count the queue/partition boot time, but
> > that doesn't tell us much about swift. On the other hand, if we don't
> > there's still some submission happening during that time, so that
> > counts.
> >   
> I count from where the log starts. There is about 20 seconds of
> inactivity at the beginning of the log; then, at around 20 s in one
> log and 24 s in the other, 1 job is submitted and running.

That's the worker block. It's "running" in that cobalt says so, but it's
booting.

>  At about 120 seconds into the run, the floodgate opens and many
> jobs are submitted and start running. So, should we count from time 0,
> 20, or 120?

100. The 20 seconds of activity in the beginning are real
swift/submission overhead. The waiting until the partition boots isn't.

>  I guess it's all about what you are trying to measure and show. In all
> cases, I think the workers were provisioned; it's just a matter of how
> much of the Swift overhead you want to take into account.
> > For the Falkon numbers, were the workers already started?
> > 
> >   
> Yes, the workers were already provisioned in that case. 
> > > Not quite the 90%+ efficiencies when looking at the per-task level, but
> > > still quite good! 
> > >     
> > 
> > I'm not quite sure what's happening. Maybe I wasn't clear. Though I was.
> > Is there some misunderstanding here about the different things being
> > measured and how?
> >   
> No. The real way to compute efficiency is to use the end-to-end time
> of the real run compared to the ideal run. The other efficiency I
> sometimes throw out is the per task efficiency, where you take the
> average real run time of all tasks, and compare it to the ideal time
> of a task. This second measure of efficiency is usually optimistic,
> but it allows us to compare efficiency across different runs
> that would be difficult to compare using the traditional
> efficiency metric.
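A minimal sketch of the two metrics as described above, assuming the ideal makespan and the ideal per-task time are known in advance. The function names and numbers are hypothetical. The per-task metric only looks at how long tasks run once started, so idle time between tasks (for example, dispatch gaps) lowers the end-to-end figure but not the per-task one, which is one reason the latter tends to be optimistic.

```python
from statistics import mean


def end_to_end_efficiency(ideal_makespan: float, real_makespan: float) -> float:
    """Ideal run time of the whole workload divided by the measured run time."""
    return ideal_makespan / real_makespan


def per_task_efficiency(ideal_task_time: float, real_task_times) -> float:
    """Ideal time of one task divided by the average measured task time."""
    return ideal_task_time / mean(real_task_times)


# Hypothetical numbers, in seconds.
print(end_to_end_efficiency(100.0, 125.0))
print(per_task_efficiency(4.0, [4.0, 5.0, 6.0]))
```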

Again, I believe the latter to be arbitrary, because by that measure
you can have very low efficiencies yet linear speedups. In addition,
I haven't seen it used anywhere in the literature.

If you want to use such an arbitrary measure, fine. Please don't use it
on this.