<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Here are my two cents, based on my experience drawing graphs for various
experiments.  I make a clear distinction between (1) logs used for
debugging/info, which are in a relatively human-readable format, and
(2) logs used for plotting graphs.  The human-readable logs (1) are
almost always written in response to events in the system.  The logs
geared towards graphing (2), on the other hand, are mostly written at
fixed time intervals, and only a few are event-based.  <br>

<br>

For example, in Falkon, I have the following set of logs:<br>

1) Falkon dispatcher log (1 for the entire Falkon system), a
human-readable debug/info-level log that records events related to task
dispatch and notifications in the Falkon service; this log is currently
used only for debugging.<br>

<br>

2) Falkon provisioner log (1 for the entire Falkon system), a
human-readable debug/info-level log that records events related to the
allocation of resources; this log is currently used only for
debugging.<br>

<br>

3) Executor logs (1 per executor, in separate files); these are also
for human consumption, and at the most detailed logging level they even
include the STDOUT and STDERR of the task executions.  These logs are
not currently aggregated in any way and are mostly used for
debugging.<br>

<br>

4) Task description log (1 for the entire Falkon system), which stores
the description of each task executed (i.e. TIMESTAMP, APPLICATION_ID,
EXECUTABLE, ARGUEMENTS, ENVIRONMENT); I have not used this log for
anything yet, but I envision using it for workload characterization,
studies that replay an entire workload, etc.<br>

<br>

5) Summary log (1 for the entire Falkon system) in an easy-to-parse
format for automatic graph generation; this log is written at fixed
time intervals, and each record summarizes some of the Falkon state
over that interval.  The state information in this log is:
TimeStamp_ms num_users num_resources num_threads
num_all_workers num_free_workers num_pend_workers num_busy_workers
waitQ_length waitNotQ_length activeQ_length doneQ_length
delivered_tasks throughput_tasks/sec.  This log can be used to plot the
number of executors registered, active, and idle, the queue lengths,
the throughput of delivered tasks, etc. as the experiment progresses.
In my latest development branch, I log a few more parameters, such as
CPU utilization, free memory, and data caching hit rates.<br>

<br>

6) Per task log (1 for the entire Falkon system) that has information
on each task executed in Falkon; this log is used to plot the per task
info as the experiment progresses.  The information that is kept on each
task is: taskID workerID startTime endTime waitQueueTime execTime
resultsQueueTime totalTime exitCode.  This log can also be used to plot
per worker information, to see how the tasks were distributed over
the workers.<br>

<br>

7) User information log (1 for the entire Falkon system) that stores
information relevant to the end user and is updated every time the
state (wait, active, done) of any task changes; the information this
log contains is: Time_ms Users Resources JVM_Threads WaitingTasks
ActiveTasks DoneTasks DeliveredTasks.  I have not used this log for
anything yet, but it has much finer-grained information than the
summary log (5), so more detailed graphs/analysis could be generated
from it.<br>

<br>

8) Worker information log (1 for the entire Falkon system) that stores
information about worker state changes and is updated every time
the state (free, pending, busy) of any worker changes; the information
this log contains is:
Time_ms RegisteredWorkers FreeWorkers PendWorkers BusyWorkers.  Again, I
have not used this log for anything yet, but it has much
finer-grained information than the summary log (5), so more detailed
graphs/analysis could be generated from it.<br>

<br>
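To make the fixed-interval summary log (5) a bit more concrete, here is
a minimal Python sketch of writing and reading one record.  It is not
the Falkon code (Falkon itself is Java); the helper names, the sample
state values, and the way throughput is computed (tasks delivered in
the interval divided by the interval length in seconds) are all
assumptions made for illustration.  Only the column names and their
order come from the description of (5) above.<br>

```python
# Column order taken from the description of the summary log (5).
SUMMARY_COLUMNS = [
    "TimeStamp_ms", "num_users", "num_resources", "num_threads",
    "num_all_workers", "num_free_workers", "num_pend_workers",
    "num_busy_workers", "waitQ_length", "waitNotQ_length",
    "activeQ_length", "doneQ_length", "delivered_tasks",
    "throughput_tasks/sec",
]


def format_summary_line(timestamp_ms, state, delivered_in_interval, interval_ms):
    """Return one space-delimited summary record (hypothetical helper).

    Assumption: throughput is the number of tasks delivered during the
    interval divided by the interval length in seconds.
    """
    row = dict(state)
    row["TimeStamp_ms"] = timestamp_ms
    row["throughput_tasks/sec"] = delivered_in_interval / (interval_ms / 1000.0)
    return " ".join(str(row[name]) for name in SUMMARY_COLUMNS)


def parse_summary_line(line):
    """Split one record back into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(SUMMARY_COLUMNS):
        raise ValueError(
            "expected %d fields, got %d" % (len(SUMMARY_COLUMNS), len(values))
        )
    return {name: float(value) for name, value in zip(SUMMARY_COLUMNS, values)}


# Made-up state for a single one-second interval, for illustration only.
example_state = {
    "num_users": 1, "num_resources": 4, "num_threads": 20,
    "num_all_workers": 4, "num_free_workers": 1, "num_pend_workers": 0,
    "num_busy_workers": 3, "waitQ_length": 10, "waitNotQ_length": 0,
    "activeQ_length": 3, "doneQ_length": 0, "delivered_tasks": 50,
}

line = format_summary_line(1000, example_state,
                           delivered_in_interval=50, interval_ms=1000)
record = parse_summary_line(line)
print(line)
```

Because every record has the same fields in the same order, a plotting
tool only needs to be told which column numbers to use.<br>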

<br>

To summarize, I use (5) and (6) a lot to generate the graphs that
I make for Falkon.  I have not used (7) and (8) yet, but might in the
future.  It's also relatively easy to add new state information to
these existing logs, since the logging is all localized in a few
places: with little effort, I can add new metrics to monitor, or create
a completely new log for information that does not fit easily into the
existing logs.  For simplicity, my perf logs (5-8) are all plain,
space-delimited text files, for example:<br>

<br>

<blockquote type="cite">taskID workerID startTime endTime waitQueueTime

execTime resultsQueueTime totalTime exitCode<br>

tg-viz-login1.uc.teragrid.org:50103:1_1326356873

tg-c058.uc.teragrid.org:50100 1182533457601 1182533985431 467599 60225

6 527830 0<br>

tg-viz-login1.uc.teragrid.org:50103:2_1124048393

tg-c052.uc.teragrid.org:50100 1182533457613 1182533985454 467735 60101

5 527841 0<br>

tg-viz-login1.uc.teragrid.org:50103:3_1648367237

tg-c053.uc.teragrid.org:50100 1182533457616 1182533985524 467760 60138

10 527908 0</blockquote>

They could be converted to XML or any other format you want, but this
is a format that programs like ploticus or gnuplot can read
easily.  <br>
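<br>
As a rough sketch of how little code a space-delimited log needs, the
Python below parses the three sample records quoted above and derives
a couple of per-worker and per-task numbers.  It assumes that each
record sits on a single line in the actual file and that the file
starts with the header line shown in the sample; the function and
variable names are hypothetical.<br>

```python
from collections import Counter

PER_TASK_COLUMNS = [
    "taskID", "workerID", "startTime", "endTime", "waitQueueTime",
    "execTime", "resultsQueueTime", "totalTime", "exitCode",
]

# The three sample records quoted above, one record per line (assumed layout).
SAMPLE_LOG = """\
taskID workerID startTime endTime waitQueueTime execTime resultsQueueTime totalTime exitCode
tg-viz-login1.uc.teragrid.org:50103:1_1326356873 tg-c058.uc.teragrid.org:50100 1182533457601 1182533985431 467599 60225 6 527830 0
tg-viz-login1.uc.teragrid.org:50103:2_1124048393 tg-c052.uc.teragrid.org:50100 1182533457613 1182533985454 467735 60101 5 527841 0
tg-viz-login1.uc.teragrid.org:50103:3_1648367237 tg-c053.uc.teragrid.org:50100 1182533457616 1182533985524 467760 60138 10 527908 0
"""


def parse_per_task_log(text):
    """Parse a space-delimited per task log into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header != PER_TASK_COLUMNS:
        raise ValueError("unexpected header: %r" % (header,))
    records = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            raise ValueError("malformed record: %r" % (ln,))
        rec = dict(zip(header, fields))
        # Everything after taskID and workerID is an integer.
        for name in header[2:]:
            rec[name] = int(rec[name])
        records.append(rec)
    return records


records = parse_per_task_log(SAMPLE_LOG)

# How the tasks were spread over the workers.
tasks_per_worker = Counter(rec["workerID"] for rec in records)

# Mean execution time, in whatever unit the log uses.
mean_exec_time = sum(rec["execTime"] for rec in records) / len(records)

# Sanity check: totalTime should equal endTime minus startTime.
consistent = all(
    rec["totalTime"] == rec["endTime"] - rec["startTime"] for rec in records
)

print(tasks_per_worker, mean_exec_time, consistent)
```

In the three sample records, totalTime also equals waitQueueTime +
execTime + resultsQueueTime, so a parser like this can double as a
quick consistency check on the log.<br>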

<br>

My debug logs (1-4), on the other hand, are all handled via log4j and
look like the traditional logs that log4j generates and that people are
accustomed to; from my point of view, though, these are tedious and
error-prone to parse for graphing purposes.<br>

<br>

Does this distinction (human readable vs. machine readable) between
logs exist in Swift?  If not, I would argue against modifying the
debug/info logs, and instead for creating new logs specifically targeted
at automatic graph generation, such as my logs (5-8).  If we are to
use tools that others have built, then we just need to make sure these
new logs conform to the format those tools expect; if we are to write
our own tools (or we already have them), then we are free to choose
whatever format we want for these logs.<br>

<br>

Ioan<br>

<br>

<br>

Mihael Hategan wrote:

<blockquote cite="mid:1182788118.23226.3.camel@blabla.mcs.anl.gov"

 type="cite">

  <pre wrap="">On Mon, 2007-06-25 at 11:03 -0500, Ian Foster wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">So who is going to do this?

I've been asking about this for some time, and nothing has happened. The 

result, I think, has been a lot of confusion and delay.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Are we still talking about collecting logs? I'm a bit confused.

  </pre>

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">I agree fully with Mihael's point that we can and should start 

gathering all execution logs into a uniformly structured gathering 

place. Then we can organize the current log tools and determine whats 

needed next in that area.

      </pre>

    </blockquote>

  </blockquote>

  <pre wrap=""><!---->


  </pre>

</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

============================================

Ioan Raicu

Ph.D. Student

============================================

Distributed Systems Laboratory

Computer Science Department

University of Chicago

1100 E. 58th Street, Ryerson Hall

Chicago, IL 60637

============================================

Email: <a class="moz-txt-link-abbreviated" href="mailto:iraicu@cs.uchicago.edu">iraicu@cs.uchicago.edu</a>

Web:   <a class="moz-txt-link-freetext" href="http://www.cs.uchicago.edu/~iraicu">http://www.cs.uchicago.edu/~iraicu</a>

       <a class="moz-txt-link-freetext" href="http://dsl.cs.uchicago.edu/">http://dsl.cs.uchicago.edu/</a>

============================================


</pre>

</body>

</html>