<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi Jonathan,<br>
    <br>
    I believe some of the timeout issues seen in your logs are fixed, or
    at least less likely, in trunk,<br>
    so I recommend trying a run with it. I've also
    converted your swift.properties to<br>
    the new swift.conf format. You can get a tested .conf file, along
    with a small test case, here:<br>
    <br>
    <a class="moz-txt-link-freetext" href="http://users.rcc.uchicago.edu/~yadunand/test_configs_package.tar.gz">http://users.rcc.uchicago.edu/~yadunand/test_configs_package.tar.gz</a><br>
    <br>
    Here are the changes I've made to the conf:<br>
    Set lazyErrors: true and executionRetries: 0, so that long-running jobs
    are not retried.<br>
    Set staging to direct, since you are running on the shared FS.<br>
    Added worker logging and an app definition for debugging.<br>
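    <br>
    As a rough sketch only (this is not the tested file from the tarball),
    the settings listed above might look roughly like this in swift.conf.
    The site name, URL, jobManager value, logging level and logging
    directory are assumptions made for illustration; please take the real
    values from the packaged .conf:<br>
    <pre>
# Illustrative sketch; names and values marked "assumed" are hypothetical.
site.midway {                                   # assumed site name
    execution {
        type: "coaster"
        URL: "localhost"                        # assumed
        jobManager: "local:slurm"               # assumed
        options {
            workerLoggingLevel: "DEBUG"         # assumed level
            workerLoggingDirectory: "workerlogs" # assumed directory
        }
    }
    staging: "direct"       # running on the shared FS, so no staging copy
    app.ALL { executable: "*" }
}

sites: [midway]

lazyErrors: true
executionRetries: 0         # do not retry long-running jobs
</pre>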
    <br>
    You can get the latest trunk build here:
    <a class="moz-txt-link-freetext" href="http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz">http://users.rcc.uchicago.edu/~yadunand/swift-trunk-latest.tar.gz</a><br>
    <br>
    Thanks,<br>
    Yadu<br>
    <br>
    <div class="moz-cite-prefix">On 12/03/2014 01:16 PM, Jonathan Ozik
      wrote:<br>
    </div>
    <blockquote
      cite="mid:040074E2-ADC1-45C0-8580-9926B8E64535@gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="" style="word-wrap:break-word">Hi Yadu,
        <div class=""><br class="">
        </div>
        <div class="">The tar.gz archive is here: <a
            moz-do-not-send="true"
            href="https://www.dropbox.com/s/tt3ewapzaf0ygac/run001.tar.gz?dl=0"
            class="">https://www.dropbox.com/s/tt3ewapzaf0ygac/run001.tar.gz?dl=0</a></div>
        <div class="">I’m also attaching the swift.properties file that
          I used below.</div>
        <div class=""><br class="">
        </div>
        <div class="">Thank you,</div>
        <div class=""><br class="">
        </div>
        <div class="">Jonathan</div>
      </div>
      <div class="" style="word-wrap:break-word">
        <div class=""><br class="">
          <div>
            <blockquote type="cite" class="">
              <div class="">On Dec 3, 2014, at 11:04 AM, Yadu Nand
                Babuji &lt;<a moz-do-not-send="true"
                  href="mailto:yadunand@uchicago.edu" class="">yadunand@uchicago.edu</a>&gt;
                wrote:</div>
              <br class="x_Apple-interchange-newline">
              <div class="">Hi Jonathan,<br class="">
                <br class="">
                The issue you are seeing sounds pretty close to what
                David reported a <br class="">
                while back.<br class="">
                Could you send us a tarball of your run directory from
                a failed run?<br class="">
                <br class="">
                Could you also check whether you've set lowOverAllocation and
                <br class="">
                highOverAllocation in your sites definition?<br
                  class="">
                <br class="">
                Thanks,<br class="">
                Yadu<br class="">
                <br class="">
                On 12/03/2014 10:50 AM, Ozik, Jonathan wrote:<br
                  class="">
                <blockquote type="cite" class="">Hi all,<br class="">
                  <br class="">
                  I’m trying to run a large set of simulations on Midway
                  using Swift 0.95-RC5.<br class="">
                  768 of the 2187 tasks completed successfully and then
                  I got the exception:<br class="">
                  <br class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>exception
                  @ swift-int.k, line: 530<br class="">
                  Caused by: Block task failed: Connection to worker
                  lost<br class="">
                  org.globus.cog.coaster.TimeoutException: Channel timed
                  out. lastTime=141203-145449.325,
                  now=141203-145649.844, channel=TCPChannel [type:
                  server, contact: 1202-5410010-000072-000000]<br
                    class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)<br
                    class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)<br
                    class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
                  java.util.TimerThread.mainLoop(Timer.java:555)<br
                    class="">
                  <span class="x_Apple-tab-span" style="white-space:pre"></span>at
                  java.util.TimerThread.run(Timer.java:505)<br class="">
                  <br class="">
                  Progress: Wed, 03 Dec 2014 14:59:51+0000
                   Submitted:651  Failed:6  Finished successfully:768
                   Failed but can retry:762<br class="">
                  Progress: Wed, 03 Dec 2014 14:59:52+0000
                   Submitted:651  Failed:44  Finished successfully:768
                   Failed but can retry:724<br class="">
                  <br class="">
                  And the process seems to have stopped.<br class="">
                  <br class="">
                  What log file would be helpful for diagnosing this?<br
                    class="">
                  <br class="">
                  Jonathan<br class="">
                  <br class="">
                  <br class="">
                  _______________________________________________<br
                    class="">
                  Swift-user mailing list<br class="">
                  <a moz-do-not-send="true"
                    href="mailto:Swift-user@ci.uchicago.edu" class="">Swift-user@ci.uchicago.edu</a><br
                    class="">
<a class="moz-txt-link-freetext" href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><br
                    class="">
                </blockquote>
                <br class="">
                </div>
            </blockquote>
          </div>
          <br class="">
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>