<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
If that doesn't apply to meta-data operations, such as those on
directories, then it means that meta-data changes in the file system are
rather centralized (maybe this explains the relatively poor performance for
creating and removing directories). I would be curious to see how well
it works to move the data to the local disk first, prior to
processing, to avoid working from the shared file system (including the
creation and removal of the scratch temp directory on GPFS).<br>
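<br>
A minimal sketch of that idea, written in Python rather than in the actual wrapper script: create the scratch directory on node-local disk, copy the inputs over once, run the job there, and copy only the outputs back to the shared file system. The function name, its parameters, and the "failed-" prefix are hypothetical and not something Swift provides; a failed run is copied back to the shared directory so it can still be inspected.<br>

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_on_local_scratch(command, inputs, outputs, shared_dir, local_root=None):
    """Run `command` in a node-local scratch dir instead of on the shared FS.

    Hypothetical helper for illustration only; file names in `inputs` and
    `outputs` are assumed to be plain names relative to `shared_dir`.
    """
    shared_dir = Path(shared_dir)
    # Scratch dir lives on local disk (system temp dir when local_root is None).
    scratch = Path(tempfile.mkdtemp(prefix="job-", dir=local_root))
    try:
        # One read per input from the shared file system.
        for name in inputs:
            shutil.copy2(shared_dir / name, scratch / name)
        # All intermediate file activity happens on local disk.
        subprocess.run(command, cwd=scratch, check=True)
        # One write per declared output back to the shared file system.
        for name in outputs:
            shutil.copy2(scratch / name, shared_dir / name)
    except Exception:
        # Keep the evidence of a failed run where it can be found later.
        shutil.copytree(scratch, shared_dir / ("failed-" + scratch.name))
        raise
    finally:
        # The create/remove churn stays off the shared file system.
        shutil.rmtree(scratch, ignore_errors=True)
```

With this layout the shared file system only sees the input reads and output writes; whether that is enough to relieve GPFS here would still have to be measured.<br>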
<br>
Ioan <br>
<br>
Mihael Hategan wrote:
<blockquote cite="mid:1193441985.9302.1.camel@blabla.mcs.anl.gov"
type="cite">
<pre wrap="">On Fri, 2007-10-26 at 15:11 -0500, Ioan Raicu wrote:
</pre>
<blockquote type="cite">
<pre wrap="">I am not sure what configuration exists on TP, but on the TeraGrid
ANL/UC cluster, with 8 servers behind GPFS, the wrapper script
performance (create dir, create symbolic links, remove directory... all
on GPFS) is anywhere between 20~40 / sec, depending on how many nodes
you have doing this concurrently. The throughput increases first as you
add nodes, but then decreases down to about 20/sec with 20~30+ nodes.
What this means is that even if you bundle jobs up, you will not get
anything better than this, throughput-wise, regardless of how short the
jobs are. Now, if TP has fewer than 8 servers, it's likely that the
throughput it can sustain is even lower,
</pre>
</blockquote>
<pre wrap="">
Perhaps in terms of bytes/s. But I wouldn't be so sure that this applies
to other file operations.
</pre>
<blockquote type="cite">
<pre wrap=""> and if you push it over the
edge, possibly down to the point of thrashing, where the throughput can be
extremely small. I don't have any suggestions for how you can get
around this, with the exception of making your job sizes larger on
average, and hence having fewer jobs over the same period of time.

Ioan

Andrew Robert Jamieson wrote:
</pre>
<blockquote type="cite">
<pre wrap="">I am kind of at a standstill for getting anything done on TP right
now with this problem. Are there any suggestions to overcome this for
the time being?

On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hello all,

I am encountering the following problem on Teraport. I submit a
clustered Swift WF which should amount to something on the order of
850x3 individual jobs total. I have clustered the jobs because they
are very fast (somewhere around 20 sec to 1 min long). When I submit
the WF on TP things start out fantastic: I get tens of output files in
a matter of seconds, and nodes start and finish clustered
batches in a matter of minutes or less. However, after waiting about
3-5 mins, when clustered jobs begin to line up in the queue and
more start running at the same time, things slow down to a
trickle in terms of output.

One thing I noticed is that when I try a simple ls on TP in the swift temp
running directory where the temp job dirs are created and destroyed,
it takes a very long time. And when it is done, only five or so things
are in the dir (this is the dir with "info kickstart shared
status wrapper.log" in it). What I think is happening is that TP's
filesystem can't handle this extremely rapid creation/destruction of
directories in that shared location. From what I have been told, these
temp dirs come and go as long as the job runs successfully.

What I am wondering is if there is any way to move that dir to the
local node tmp directory, instead of the shared file system, while the
job is running, and if something fails then have it sent to the
appropriate place.

Or perhaps another layer of temp dir wrapping could be applied,
labeled with respect to the clustered job grouping and not
simply the individual jobs (since there are thousands being computed
at once). That way these things would only be generated/deleted every
5 or 10 mins (if clustered properly on my part) instead of one event
every millisecond or what have you.

I don't know which solution is feasible or if any are at all, but
this seems to be a major problem for my WFs. In general it is never
good to have a million things coming and going on a shared file
system in one place, from my experience at least.
Thanks,
Andrew
_______________________________________________
Swift-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Swift-devel@ci.uchicago.edu">Swift-devel@ci.uchicago.edu</a>
<a class="moz-txt-link-freetext" href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</a>
</pre>
</blockquote>
</blockquote>
<pre wrap="">--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: <a class="moz-txt-link-abbreviated" href="mailto:iraicu@cs.uchicago.edu">iraicu@cs.uchicago.edu</a>
Web: <a class="moz-txt-link-freetext" href="http://www.cs.uchicago.edu/~iraicu">http://www.cs.uchicago.edu/~iraicu</a>
<a class="moz-txt-link-freetext" href="http://dsl.cs.uchicago.edu/">http://dsl.cs.uchicago.edu/</a>
============================================
============================================
</pre>
</blockquote>
</blockquote>
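<br>
For a rough sense of scale, the 20~40 cycles/sec figure quoted above can be turned into a lower bound on the time the shared file system spends on temp directories alone. This is only back-of-the-envelope arithmetic, assuming one create/remove cycle per temp directory and ignoring all other file system traffic; the function names and the batch size of 10 are hypothetical, the latter chosen just to illustrate the one-directory-per-cluster suggestion from the thread.<br>

```python
import math


def directory_cycles(num_jobs, jobs_per_dir=1):
    """Number of create/remove cycles if `jobs_per_dir` jobs share one temp dir."""
    return math.ceil(num_jobs / jobs_per_dir)


def min_overhead_seconds(cycles, cycles_per_second):
    """Lower bound on the time the shared file system needs for those cycles."""
    return cycles / cycles_per_second


jobs = 850 * 3   # workflow size mentioned in the thread
rate = 20        # cycles/sec, low end of the 20~40 range quoted above

# One temp directory per job versus one per (illustrative) batch of 10 jobs.
per_job = min_overhead_seconds(directory_cycles(jobs), rate)
per_batch = min_overhead_seconds(directory_cycles(jobs, 10), rate)
print(per_job, per_batch)  # prints: 127.5 12.75
```

These are floors, not predictions: the measured rate drops as more nodes hit the same directory concurrently, so real overhead could be noticeably higher.<br>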
<br>
</body>
</html>