<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Thanks Yadu,<div class=""><br class=""></div><div class="">I have a few questions.</div><div class="">- How do I invoke swift and pass it the new swift.conf?</div><div class="">- What is the “restart” procedure?</div><div class="">- Is there a module I can load to use the latest swift trunk?</div><div class=""><br class=""></div><div class="">Jonathan</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 3, 2014, at 7:03 PM, Yadu Nand Babuji &lt;<a href="mailto:yadunand@uchicago.edu" class="">yadunand@uchicago.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
Hi Jonathan,<br class="">
<br class="">
I believe some of the timeout issues seen in your logs
are fixed, or at least less likely, in trunk,<br class="">
and I would recommend that you try a run with it. I've also
converted your swift.properties to<br class="">
the new swift.conf format. You can get a tested .conf file, along
with a small test case, from here:<br class="">
<br class="">
<a class="moz-txt-link-freetext" href="http://users.rcc.uchicago.edu/~yadunand/test_configs_package.tar.gz">http://users.rcc.uchicago.edu/~yadunand/test_configs_package.tar.gz</a><br class="">
<br class="">
Here are the changes I've made to the conf:<br class="">
- lazyErrors: true and executionRetries: 0, so that long-running jobs
are not retried.<br class="">
- staging set to direct, since you are running on the shared filesystem.<br class="">
- Added worker logging and an app definition for debugging.<br class="">
<br class="">
You can get the latest trunk build from here:
<a class="moz-txt-link-freetext" href="http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz">http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz</a><br class="">
<br class="">
Thanks,<br class="">
Yadu<br class="">
<br class="">
<div class="moz-cite-prefix">On 12/03/2014 01:16 PM, Jonathan Ozik
wrote:<br class="">
</div>
<blockquote cite="mid:040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com" type="cite" class="">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
<div class="" style="word-wrap:break-word">Hi Yadu,
<div class=""><br class="">
</div>
<div class="">The tar.gz archive is here: <a moz-do-not-send="true" href="https://www.dropbox.com/s/tt3ewapzaf0ygac/run001.tar.gz?dl=0" class="">https://www.dropbox.com/s/tt3ewapzaf0ygac/run001.tar.gz?dl=0</a></div>
<div class="">I’m also attaching the swift.properties file that
I used below.</div>
<div class=""><br class="">
</div>
<div class="">Thank you,</div>
<div class=""><br class="">
</div>
<div class="">Jonathan</div>
</div>
<div class="" style="word-wrap:break-word">
<div class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Dec 3, 2014, at 11:04 AM, Yadu Nand
Babuji <<a moz-do-not-send="true" href="mailto:yadunand@uchicago.edu" class="">yadunand@uchicago.edu</a>>
wrote:</div>
<br class="x_Apple-interchange-newline">
<div class="">Hi Jonathan,<br class="">
<br class="">
The issue you are seeing sounds pretty close to what
David reported a while back.<br class="">
Could you send us a tarball of your run directory from
a failed run?<br class="">
<br class="">
Could you also check whether you've set lowOverAllocation and
highOverAllocation in your sites definition?<br class="">
<br class="">
Thanks,<br class="">
Yadu<br class="">
<br class="">
On 12/03/2014 10:50 AM, Ozik, Jonathan wrote:<br class="">
<blockquote type="cite" class="">Hi all,<br class="">
<br class="">
I’m trying to run a large set of simulations on Midway
using Swift 0.95-RC5.<br class="">
768 of the 2187 tasks completed successfully and then
I got the exception:<br class="">
<br class="">
<span class="x_Apple-tab-span" style="white-space:pre"></span>exception
@ swift-int.k, line: 530<br class="">
Caused by: Block task failed: Connection to worker
lost<br class="">
org.globus.cog.coaster.TimeoutException: Channel timed
out. lastTime=141203-145449.325,
now=141203-145649.844, channel=TCPChannel [type:
server, contact: 1202-5410010-000072-000000]<br class="">
<span class="x_Apple-tab-span" style="white-space:pre"></span>at
org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)<br class="">
<span class="x_Apple-tab-span" style="white-space:pre"></span>at
org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)<br class="">
<span class="x_Apple-tab-span" style="white-space:pre"></span>at
java.util.TimerThread.mainLoop(Timer.java:555)<br class="">
<span class="x_Apple-tab-span" style="white-space:pre"></span>at
java.util.TimerThread.run(Timer.java:505)<br class="">
<br class="">
Progress: Wed, 03 Dec 2014 14:59:51+0000
Submitted:651 Failed:6 Finished successfully:768
Failed but can retry:762<br class="">
Progress: Wed, 03 Dec 2014 14:59:52+0000
Submitted:651 Failed:44 Finished successfully:768
Failed but can retry:724<br class="">
<br class="">
And the process seems to have stopped.<br class="">
<br class="">
What log file would be helpful for diagnosing this?<br class="">
<br class="">
Jonathan<br class="">
<br class="">
<br class="">
_______________________________________________<br class="">
Swift-user mailing list<br class="">
<a moz-do-not-send="true" href="mailto:Swift-user@ci.uchicago.edu" class="">Swift-user@ci.uchicago.edu</a><br class="">
<a class="moz-txt-link-freetext" href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><br class="">
</blockquote>
<br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
<br class="">
</div>
</div></blockquote></div><br class=""></div></body></html>