<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Thanks Yadu,<div class=""><br class=""></div><div class="">I have a few questions.</div><div class="">- How do I invoke swift and pass it the new swift.conf?</div><div class="">- What is the “restart” procedure?</div><div class="">- Is there a module I can load to use the latest swift trunk?</div><div class=""><br class=""></div><div class="">Jonathan</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 3, 2014, at 7:03 PM, Yadu Nand Babuji &lt;<a href="mailto:yadunand@uchicago.edu" class="">yadunand@uchicago.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">
  
  
  <div bgcolor="#FFFFFF" text="#000000" class="">
    Hi Jonathan,<br class="">
    <br class="">
    I believe some of the timeout issues seen in your logs
    are fixed, or at least less likely, in trunk,<br class="">
    and I would recommend that you try a run with it. I've also
    converted your swift.properties to<br class="">
    the new swift.conf format. You can get a tested .conf file, along
    with a small test case, from here:<br class="">
    <br class="">
    <a class="moz-txt-link-freetext" href="http://users.rcc.uchicago.edu/~yadunand/test_configs_package.tar.gz">http://users.rcc.uchicago.edu/~yadunand/test_configs_package.tar.gz</a><br class="">
    <br class="">
    Here are the changes I've made to the conf:<br class="">
    - lazyErrors: true and executionRetries: 0, so that long-running jobs
    are not retried.<br class="">
    - Staging is set to direct, since you are running on the shared FS.<br class="">
    - Added worker logging and an app definition for debugging.<br class="">
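    <br class="">
    As a rough sketch only, settings like these might look roughly as follows
    in a swift.conf. The site name (midway), the jobManager value, the
    directories and the app name (bash) are placeholders assumed for
    illustration, and the exact key placement may differ from the tested
    file in the tarball above, so treat that file as authoritative:<br class="">
    <pre class="">
sites: [midway]

site.midway {
    execution {
        type: "coaster"
        URL: "localhost"
        jobManager: "local:slurm"            # assumed scheduler setting
        options {
            # worker logging, to help diagnose lost worker connections
            workerLoggingLevel: "DEBUG"
            workerLoggingDirectory: "/path/to/workerlogs"   # placeholder
        }
    }
    # work directly on the shared filesystem instead of staging files
    staging: "direct"
    workDirectory: "/path/to/swiftwork"      # placeholder
    # placeholder app definition for debugging
    app.bash {
        executable: "/bin/bash"
    }
}

# continue past individual failures and do not retry failed jobs
lazyErrors: true
executionRetries: 0
</pre>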
    <br class="">
    You can get the latest trunk build from here:
    <a class="moz-txt-link-freetext" href="http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz">http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz</a><br class="">
    <br class="">
    Thanks,<br class="">
    Yadu<br class="">
    <br class="">
    <div class="moz-cite-prefix">On 12/03/2014 01:16 PM, Jonathan Ozik
      wrote:<br class="">
    </div>
    <blockquote cite="mid:040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com" type="cite" class="">
      <div class="" style="word-wrap:break-word">Hi Yadu,
        <div class=""><br class="">
        </div>
        <div class="">The tar.gz archive is here: <a moz-do-not-send="true" href="https://www.dropbox.com/s/tt3ewapzaf0ygac/run001.tar.gz?dl=0" class="">https://www.dropbox.com/s/tt3ewapzaf0ygac/run001.tar.gz?dl=0</a></div>
        <div class="">I’m also attaching the swift.properties file that
          I used below.</div>
        <div class=""><br class="">
        </div>
        <div class="">Thank you,</div>
        <div class=""><br class="">
        </div>
        <div class="">Jonathan</div>
      </div>
      <div class="" style="word-wrap:break-word">
        <div class=""><br class="">
          <div class="">
            <blockquote type="cite" class="">
              <div class="">On Dec 3, 2014, at 11:04 AM, Yadu Nand
                Babuji <<a moz-do-not-send="true" href="mailto:yadunand@uchicago.edu" class="">yadunand@uchicago.edu</a>>
                wrote:</div>
              <br class="x_Apple-interchange-newline">
              <div class="">Hi Jonathan,<br class="">
                <br class="">
                The issue you are seeing sounds pretty close to what
                David reported a <br class="">
                while back.<br class="">
                Could you send us a tarball of your run directory from
                a failed run?<br class="">
                <br class="">
                Could you also check whether you've set lowOverAllocation and
                highOverAllocation in your sites definition?<br class="">
                <br class="">
                Thanks,<br class="">
                Yadu<br class="">
                <br class="">
                On 12/03/2014 10:50 AM, Ozik, Jonathan wrote:<br class="">
                <blockquote type="cite" class="">Hi all,<br class="">
                  <br class="">
                  I’m trying to run a large set of simulations on Midway
                  using Swift 0.95-RC5.<br class="">
                  768 of the 2187 tasks completed successfully and then
                  I got the exception:<br class="">
                  <br class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>exception
                  @ swift-int.k, line: 530<br class="">
                  Caused by: Block task failed: Connection to worker
                  lost<br class="">
                  org.globus.cog.coaster.TimeoutException: Channel timed
                  out. lastTime=141203-145449.325,
                  now=141203-145649.844, channel=TCPChannel [type:
                  server, contact: 1202-5410010-000072-000000]<br class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)<br class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)<br class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
                  java.util.TimerThread.mainLoop(Timer.java:555)<br class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
                  java.util.TimerThread.run(Timer.java:505)<br class="">
                  <br class="">
                  Progress: Wed, 03 Dec 2014 14:59:51+0000
                   Submitted:651  Failed:6  Finished successfully:768
                   Failed but can retry:762<br class="">
                  Progress: Wed, 03 Dec 2014 14:59:52+0000
                   Submitted:651  Failed:44  Finished successfully:768
                   Failed but can retry:724<br class="">
                  <br class="">
                  And the process seems to have stopped.<br class="">
                  <br class="">
                  What log file would be helpful for diagnosing this?<br class="">
                  <br class="">
                  Jonathan<br class="">
                  <br class="">
                  <br class="">
                  _______________________________________________<br class="">
                  Swift-user mailing list<br class="">
                  <a moz-do-not-send="true" href="mailto:Swift-user@ci.uchicago.edu" class="">Swift-user@ci.uchicago.edu</a><br class="">
<a class="moz-txt-link-freetext" href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><br class="">
                </blockquote>
                <br class="">
                </div>
            </blockquote>
          </div>
          <br class="">
        </div>
      </div>
    </blockquote>
    <br class="">
  </div>

</div></blockquote></div><br class=""></div></body></html>