<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Hmm, any recommendations on how to separate them from each other? My

simple cat and grep probably won't work. Is there at least a pattern

in the number of dashes ("-")?<br>

<br>

Ioan<br>

<br>

Mihael Hategan wrote:

<blockquote cite="mid:1256829444.21206.9.camel@localhost" type="cite">

  <pre wrap="">On Thu, 2009-10-29 at 09:30 -0500, Ioan Raicu wrote:

  </pre>

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">Nope. 64K.

      </pre>

    </blockquote>

    <pre wrap="">OK, it would be good to look at why we have double the # of tasks. It

must be my filtering of the Swift log. Here was my filtered log:

<a class="moz-txt-link-freetext" href="http://www.ece.northwestern.edu/~iraicu/scratch/logs/dc-4000-active-completed.txt">http://www.ece.northwestern.edu/~iraicu/scratch/logs/dc-4000-active-completed.txt</a>

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Both the coaster service log and swift log go to the same place in that

case. You'll see a difference in the way the task IDs look. That's

something you can use.

This is a task on the swift side:

identity=urn:0-1-11475-1-1-1256524749943

This is on the coaster side:

identity=urn:1256524750479-1256524791529-1256524791530

[...]

  </pre>
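  <pre wrap="">
One way to use the difference in ID shape: pull out the identity=urn:... field and look at how many dash-separated fields it has. This is a rough sketch that relies only on the two sample IDs above. It assumes that a coaster ID is always exactly three 13-digit millisecond timestamps and that any other identity=urn: value belongs to a Swift task. classify_line and split_log are hypothetical helper names and are not part of Swift or the coaster service.

```python
import re

# Matches the task identity field as it appears in the sample log lines,
# e.g. "identity=urn:0-1-11475-1-1-1256524749943".
IDENTITY_RE = re.compile(r"identity=urn:([0-9-]+)")


def classify_line(line):
    """Return "swift", "coaster", or None for a single log line.

    Assumption (from two sample IDs only): a coaster task ID is exactly
    three 13-digit millisecond timestamps joined by dashes; a Swift task
    ID has some other shape (the sample has six fields).
    """
    match = IDENTITY_RE.search(line)
    if match is None:
        return None  # no task identity on this line
    fields = match.group(1).split("-")
    if len(fields) == 3 and all(len(f) == 13 for f in fields):
        return "coaster"
    return "swift"


def split_log(lines):
    """Split an iterable of log lines into (swift_lines, coaster_lines)."""
    swift, coaster = [], []
    for line in lines:
        kind = classify_line(line)
        if kind == "swift":
            swift.append(line)
        elif kind == "coaster":
            coaster.append(line)
    return swift, coaster
```

For example, split_log(open(path)) on the combined log returns the two sets of lines separately. If other Swift IDs turn out to have three 13-digit fields, the heuristic needs adjusting.
  </pre>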

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">It depends whether you count from the time the partition boots or from

the time swift starts. We could count the queue/partition boot time, but

that doesn't tell us much about swift. On the other hand, if we don't,

there's still some submission happening during that time, so that

counts.

      </pre>

    </blockquote>

    <pre wrap="">I count from where the log starts. There are about 20 seconds of

inactivity at the beginning of the log, but at around 20 s into one

log and 24 s into the other, 1 job is submitted and running.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

That's the worker block. It's "running" in that cobalt says so, but it's

booting.

  </pre>

  <blockquote type="cite">

    <pre wrap="">At about 120 seconds into the run, the floodgates open and many

jobs are submitted and start running. So, should we count from time 0,

20, or 120?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

100. The 20 seconds of activity in the beginning are real

swift/submission overhead. The waiting until the partition boots isn't.

  </pre>

  <blockquote type="cite">

    <pre wrap="">I guess it's all about what you are trying to measure and show. In all

cases, I think the workers were provisioned; it's just a matter of how

much of the Swift overhead you want to take into account.

    </pre>

    <blockquote type="cite">

      <pre wrap="">For the Falkon numbers, were the workers already started?

      </pre>

    </blockquote>

    <pre wrap="">Yes, the workers were already provisioned in that case. 

    </pre>

    <blockquote type="cite">

      <blockquote type="cite">

        <pre wrap="">Not quite the 90%+ efficiencies when looking at a per-task level, but

still quite good! 

        </pre>

      </blockquote>

      <pre wrap="">I'm not quite sure what's happening. Maybe I wasn't clear. Though I was.

Is there some misunderstanding here about the different things being

measured and how?

      </pre>

    </blockquote>

    <pre wrap="">No. The real way to compute efficiency is to use the end-to-end time

of the real run compared to the ideal run. The other efficiency I

sometimes throw out is the per-task efficiency, where you take the

average real run time of all tasks and compare it to the ideal time

of a task. This second measure of efficiency is usually optimistic,

but it lets us compare efficiency across different runs

that might be too difficult to compare using the traditional

efficiency metric.

    </pre>
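    <pre wrap="">
The two metrics described above can be written down directly. This is a sketch with made-up numbers. It assumes identical tasks, and it assumes the ideal end-to-end time is ceil(tasks / workers) waves of one ideal task each, with zero overhead. The function names are hypothetical. The per-task metric comes out higher because it ignores gaps between tasks, which matches "usually optimistic".

```python
import math
from statistics import mean


def ideal_makespan(num_tasks, workers, task_time):
    """Ideal end-to-end time: identical tasks run back to back in
    ceil(num_tasks / workers) waves with zero overhead (assumed model)."""
    return math.ceil(num_tasks / workers) * task_time


def end_to_end_efficiency(ideal_time, real_time):
    """End-to-end efficiency: ideal run time over measured run time."""
    return ideal_time / real_time


def per_task_efficiency(ideal_task_time, real_task_times):
    """Per-task efficiency: ideal task time over the mean measured task time."""
    return ideal_task_time / mean(real_task_times)


# Hypothetical numbers: 8 tasks of 10 s each on 4 workers.
ideal = ideal_makespan(8, 4, 10.0)           # 20.0 s
e2e = end_to_end_efficiency(ideal, 25.0)      # measured makespan of 25 s
per_task = per_task_efficiency(10.0, [11.0] * 8)  # each task measured at 11 s
print(f"end-to-end: {e2e:.2f}, per-task: {per_task:.2f}")
# prints "end-to-end: 0.80, per-task: 0.91"
```

Two waves of 11 s tasks account for 22 s of the 25 s run. The remaining 3 s of dispatch gaps lower the end-to-end number but never show up in the per-task number.
    </pre>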

  </blockquote>

  <pre wrap=""><!---->

Again, I believe the latter to be arbitrary. That's because according to

it you can have very low efficiencies yet linear speedups. In addition,

I have not seen it used in the literature.

If you want to use such an arbitrary measure, fine. Please don't use it

on this.

  </pre>
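  <pre wrap="">
A toy calculation makes the "very low efficiency yet linear speedup" point concrete. The numbers are made up, and the model is an assumption. Every task takes 20 s where the ideal is 10 s. There is no other overhead, and identical tasks run in waves. Per-task efficiency then stays at 0.5 for every worker count, while the speedup T(1)/T(P) grows linearly. scaling_table is a hypothetical helper.

```python
def makespan(num_tasks, workers, task_time):
    """Run time for identical tasks executed in waves (assumed model)."""
    waves = -(-num_tasks // workers)  # ceiling division
    return waves * task_time


def scaling_table(num_tasks, worker_counts, ideal_task_time, real_task_time):
    """Return (workers, speedup, per_task_efficiency) rows."""
    t1 = makespan(num_tasks, 1, real_task_time)
    rows = []
    for workers in worker_counts:
        tp = makespan(num_tasks, workers, real_task_time)
        speedup = t1 / tp
        per_task_eff = ideal_task_time / real_task_time
        rows.append((workers, speedup, per_task_eff))
    return rows


for row in scaling_table(1024, (1, 2, 4, 8), 10.0, 20.0):
    print(row)
# prints (1, 1.0, 0.5), (2, 2.0, 0.5), (4, 4.0, 0.5), (8, 8.0, 0.5), one per line
```
  </pre>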

</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

=================================================================

Ioan Raicu, Ph.D.

NSF/CRA Computing Innovation Fellow

=================================================================

Center for Ultra-scale Computing and Information Security (CUCIS)

Department of Electrical Engineering and Computer Science

Northwestern University

2145 Sheridan Rd, Tech M384 

Evanston, IL 60208-3118

=================================================================

Cel:   1-847-722-0876

Tel:   1-847-491-8163

Email: <a class="moz-txt-link-abbreviated" href="mailto:iraicu@eecs.northwestern.edu">iraicu@eecs.northwestern.edu</a>

Web:   <a class="moz-txt-link-freetext" href="http://www.eecs.northwestern.edu/~iraicu/">http://www.eecs.northwestern.edu/~iraicu/</a>

       <a class="moz-txt-link-freetext" href="https://wiki.cucis.eecs.northwestern.edu/">https://wiki.cucis.eecs.northwestern.edu/</a>

=================================================================

</pre>

</body>

</html>