Mihael,<div><br></div><div><div class="gmail_quote">On Sun, Oct 2, 2011 at 9:40 PM, Mihael Hategan <span dir="ltr"><<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Sun, 2011-10-02 at 21:27 -0500, Ketan Maheshwari wrote:<br>
> Mihael,<br>
><br>
><br>
</div><div class="im">> So far, I've been using the proxy mode:<br>
><br>
><br>
> <profile namespace="swift" key="stagingMethod">proxy</profile><br>
><br>
><br>
> I just tried using the non-proxy (file/local) mode:<br>
><br>
><br>
> <filesystem provider="local" url="none" /><br>
<br>
</div><profile namespace="swift" key="stagingMethod">file</profile><br></blockquote><div><br></div><div>Thanks. However, with the file mode above, Swift does not seem to make progress. On stdout, I see intermittent "Active: 1" lines, but they disappear and the jobs go back to submitted status.</div>
<div><br></div><div>This goes on for about 20 minutes, after which the run starts, but with a high number of failures and the following message:</div><div><br></div><div><div>Caused by: Task failed: null</div><div>org.globus.cog.karajan.workflow.service.channels.ChannelException: Channel died and no contact available</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>at org.globus.cog.karajan.workflow.service.channels.ChannelManager.connect(ChannelManager.java:235)</div><div><span class="Apple-tab-span" style="white-space:pre">  </span>at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:257)</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)</div><div><span class="Apple-tab-span" style="white-space:pre">   </span>at org.globus.cog.abstraction.coaster.service.job.manager.Node.getChannel(Node.java:125)</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.submit(Cpu.java:245)</div><div><span class="Apple-tab-span" style="white-space:pre">   </span>at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.launchSequential(Cpu.java:203)</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.launch(Cpu.java:189)</div><div><span class="Apple-tab-span" style="white-space:pre">   </span>at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.pull(Cpu.java:159)</div>
<div><span class="Apple-tab-span" style="white-space:pre">      </span>at org.globus.cog.abstraction.coaster.service.job.manager.PullThread.run(PullThread.java:98)</div></div><div><br></div><div>On the workers' stdout, I see that 59 workers are running:</div>
<div>"*** demandThread: swiftDemand=20 paddedDemand=24 totalRunning=59"</div><div><br></div><div>In the worker logs, I do not see any errors, except for one worker that reports:</div><div><br></div><div>"Failed to register (timeout)" </div>
<div><br></div><div>The log for this run is: <a href="http://www.ci.uchicago.edu/~ketan/catsn-20111003-0901-nd7ta1bb.log">http://www.ci.uchicago.edu/~ketan/catsn-20111003-0901-nd7ta1bb.log</a></div><div><br></div><div>The data size for this run is 10MB per task.</div>
<div><br></div><div>Regards,</div><div>Ketan</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
And that is not related to the heartbeat error, which I'm not sure why<br>
you're getting.<br>
<br>
As for the errors you get in proxy mode, are you sure your workers are<br>
fine?<br>
<br>
<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Ketan<br><br><br>
</div>