<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
That said, when I switched to TCP, most of my problems magically went
away; evidently TCP's error recovery mechanisms are more robust than
what I had implemented. The moral of the story, from my experience:
offer UDP as an option for potentially better performance and
scalability, but keep TCP as a configurable option for potentially
better reliability and robustness.<br>
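<br>
As an illustration only (this is not code from Falkon or the coaster
prototype, and the names socket_type_for and InOrderReceiver are
hypothetical), a minimal sketch of a configurable transport choice and
of the receiver-side bookkeeping that UDP leaves to the application
might look like this. It assumes sequence numbers start at 0 and omits
sender-side retries and timeouts:<br>
<pre>
```python
import socket


def socket_type_for(kind="udp"):
    """Map a (hypothetical) config value to a socket type constant."""
    kinds = {"udp": socket.SOCK_DGRAM, "tcp": socket.SOCK_STREAM}
    if kind not in kinds:
        raise ValueError("unknown transport: " + repr(kind))
    return kinds[kind]


class InOrderReceiver:
    """Drops duplicate datagrams and releases payloads in sequence order."""

    def __init__(self):
        self.next_seq = 0   # lowest sequence number not yet delivered
        self.pending = {}   # buffered out-of-order payloads, keyed by seq

    def receive(self, seq, payload):
        """Accept one datagram; return the payloads now deliverable."""
        # A sequence number below next_seq has already been delivered.
        already_delivered = seq in range(self.next_seq)
        if not already_delivered and seq not in self.pending:
            self.pending[seq] = payload
        ready = []
        # Release the longest contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


receiver = InOrderReceiver()
print(receiver.receive(1, "b"))  # arrives early, so it is buffered
print(receiver.receive(0, "a"))  # fills the gap, both are released
print(receiver.receive(0, "a"))  # duplicate, ignored
```
</pre>
TCP does this sequencing, along with retransmission and timeouts on the
sending side, inside the kernel, which is why swapping it in removes so
much hand-written recovery code.<br>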
<br>
Ioan<br>
<br>
Mihael Hategan wrote:
<blockquote cite="mid:1207473442.10063.3.camel@blabla.mcs.anl.gov"
type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">Of course it's unreliable unless you deal with the reliability issues as
outlined above.
</pre>
</blockquote>
<pre wrap="">I did deal with them, duplicates, out of order, retries, timeouts,
etc... yet, I still couldn't get a 100% reliable implementation,
</pre>
</blockquote>
<pre wrap=""><!---->
Of course you couldn't. It's impossible.
</pre>
<blockquote type="cite">
<pre wrap=""> and I
gave up... in theory, UDP should work given that you deal with all the
reliability issues you outlined. I am just pointing out that after lots
of debugging, I gave in and swapped UDP for TCP to avoid the unexplained
lost message once in a while. I am positive it was a bug in my code, so
perhaps you'll have better luck!
</pre>
<blockquote type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap="">Is the 180 tasks/sec the overall throughput measured from Swift's
point of view, including overhead of wrapper.sh? Or is that a
micro-benchmark measuring just the coaster performance?
</pre>
</blockquote>
<pre wrap="">It's at the provider level. No wrapper.sh.
</pre>
</blockquote>
<pre wrap="">OK, great!
Ioan
</pre>
<blockquote type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap="">Ioan
Mihael Hategan wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On Fri, 2008-04-04 at 06:59 -0500, Michael Wilde wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Mihael, this is great progress - very exciting.
Some questions (don't need answers right away):
How would the end user use it? Manually start a service?
Is the service a separate process, or in the swift jvm?
</pre>
</blockquote>
<pre wrap="">I though the lines below answered some of these.
A user would specify the coaster provider in sites.xml. The provider
will then automatically deploy a service on the target machine without
the user having to do so. Given that the service is on a different
machine than the client, they can't be in the same JVM.
</pre>
<blockquote type="cite">
<pre wrap="">How are the number of workers set or adjusted?
</pre>
</blockquote>
<pre wrap="">Currently workers are requested as much as needed, up to a maximum. This
is preliminary hence "Better allocation strategy for workers".
</pre>
<blockquote type="cite">
<pre wrap="">Does a service manage workers on one cluster or many?
</pre>
</blockquote>
<pre wrap="">One service per cluster.
</pre>
<blockquote type="cite">
<pre wrap="">At 180 jobs/sec with 10 workers, what were the CPU loads on swift,
worker and service?
</pre>
</blockquote>
<pre wrap="">I faintly recall them being at less than 50% for some reason I don't
understand.
</pre>
<blockquote type="cite">
<pre wrap="">Do you want to try this on the workflows we're running on Falkon on the
BGP and SiCortex?
</pre>
</blockquote>
<pre wrap="">Let me repeat "prototype" and "more testing". In no way do I want to do
preliminary testing with an application that is shaky on an architecture
that is also shaky.
Mihael
</pre>
<blockquote type="cite">
<pre wrap="">Im eager to try it when you feel its ready for others to test.
Nice work!
- Mike
On 4/4/08 4:39 AM, Mihael Hategan wrote:
</pre>
<blockquote type="cite">
<pre wrap="">I've been asked for a summary of the status of the coaster prototype, so
here it is:
- It's a prototype so bugs are plenty
- It's self deployed (you don't need to start a service on the target
cluster)
- You can also use it while starting a service on the target cluster
- There is a worker written in Perl
- It uses encryption between client and coaster service
- It uses UDP between the service and the workers (this may prove to be
a better or worse choice than TCP)
- A preliminary test done locally shows an amortized throughput of
around 180 jobs/s (/bin/date). This was done with encryption and with 10
workers. Pretty picture attached (total time vs. # of jobs)
To do:
- The scheduling algorithm in the service needs a bit more work
- When worker messages are lost, some jobs may get lost (i.e. needs more
fault tolerance)
- Start testing it on actual clusters
- Do some memory consumption benchmarks
- Better allocation strategy for workers
Mihael
------------------------------------------------------------------------
_______________________________________________
Swift-devel mailing list
<a class="moz-txt-link-abbreviated"
href="mailto:Swift-devel@ci.uchicago.edu">Swift-devel@ci.uchicago.edu</a>
<a class="moz-txt-link-freetext"
href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</a>
</pre>
</blockquote>
</blockquote>
<pre wrap="">_______________________________________________
Swift-devel mailing list
<a class="moz-txt-link-abbreviated"
href="mailto:Swift-devel@ci.uchicago.edu">Swift-devel@ci.uchicago.edu</a>
<a class="moz-txt-link-freetext"
href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</a>
</pre>
</blockquote>
<pre wrap="">--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: <a class="moz-txt-link-abbreviated"
href="mailto:iraicu@cs.uchicago.edu">iraicu@cs.uchicago.edu</a>
Web: <a class="moz-txt-link-freetext"
href="http://www.cs.uchicago.edu/%7Eiraicu">http://www.cs.uchicago.edu/~iraicu</a>
<a class="moz-txt-link-freetext"
href="http://dev.globus.org/wiki/Incubator/Falkon">http://dev.globus.org/wiki/Incubator/Falkon</a>
<a class="moz-txt-link-freetext"
href="http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page">http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page</a>
===================================================
===================================================
</pre>
</blockquote>
<pre wrap="">
</pre>
</blockquote>
</blockquote>
<pre wrap=""><!---->
</pre>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: <a class="moz-txt-link-abbreviated"
href="mailto:iraicu@cs.uchicago.edu">iraicu@cs.uchicago.edu</a>
Web: <a class="moz-txt-link-freetext"
href="http://www.cs.uchicago.edu/%7Eiraicu">http://www.cs.uchicago.edu/~iraicu</a>
<a class="moz-txt-link-freetext"
href="http://dev.globus.org/wiki/Incubator/Falkon">http://dev.globus.org/wiki/Incubator/Falkon</a>
<a class="moz-txt-link-freetext"
href="http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page">http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page</a>
===================================================
===================================================
</pre>
</body>
</html>