[Swift-devel] persistent coasters and data staging
Mihael Hategan
hategan at mcs.anl.gov
Mon Oct 3 14:23:17 CDT 2011
Are you running with a standalone coaster service? If yes, can you also
post the service log?
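
By "standalone" I mean a coaster service started on its own and pointed at
from sites.xml through the coaster-persistent provider. A minimal sketch of
the execution entry is below; the host:port is a placeholder for wherever
your service is listening, and jobmanager="local:local" assumes the service
starts its workers locally:

  <!-- sketch only: url and jobmanager values are placeholders -->
  <execution provider="coaster-persistent"
             url="service-host:port" jobmanager="local:local"/>

If that is your setup, the service keeps its own log, separate from the
Swift client log, and that is the one I'd like to see.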
On Mon, 2011-10-03 at 09:25 -0500, Ketan Maheshwari wrote:
> Mihael,
>
>
> On Sun, Oct 2, 2011 at 9:40 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> On Sun, 2011-10-02 at 21:27 -0500, Ketan Maheshwari wrote:
> > Mihael,
> >
> >
>
> > So far, I've been using the proxy mode:
> >
> >
> > <profile namespace="swift" key="stagingMethod">proxy</profile>
> >
> >
> > I just tried using the non-proxy (file/local) mode:
> >
> >
> > <filesystem provider="local" url="none" />
>
>
> <profile namespace="swift" key="stagingMethod">file</profile>
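
For reference, the local filesystem element and the stagingMethod profile
sit inside the same pool as the coaster execution entry, roughly as in the
sketch below; the handle, the host:port and the work directory are
placeholders, not values I expect you to have:

  <pool handle="persistent-coasters">
    <execution provider="coaster-persistent"
               url="service-host:port" jobmanager="local:local"/>
    <filesystem provider="local" url="none"/>
    <profile namespace="swift" key="stagingMethod">file</profile>
    <workdirectory>/path/to/swift.work</workdirectory>
  </pool>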
>
>
> Thanks, however, on using the above file mode, Swift does not seem to be
> progressing. On stdout, I see intermittent "Active: 1" lines, but they
> disappear and the jobs go back to Submitted status:
>
>
> This happens for about 20 minutes, after which the run starts but with a
> high number of failures and the following message:
>
>
> Caused by: Task failed: null
> org.globus.cog.karajan.workflow.service.channels.ChannelException: Channel died and no contact available
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.connect(ChannelManager.java:235)
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:257)
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
>     at org.globus.cog.abstraction.coaster.service.job.manager.Node.getChannel(Node.java:125)
>     at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.submit(Cpu.java:245)
>     at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.launchSequential(Cpu.java:203)
>     at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.launch(Cpu.java:189)
>     at org.globus.cog.abstraction.coaster.service.job.manager.Cpu.pull(Cpu.java:159)
>     at org.globus.cog.abstraction.coaster.service.job.manager.PullThread.run(PullThread.java:98)
>
>
> On the workers' stdout, I see that 59 workers are running:
> "*** demandThread: swiftDemand=20 paddedDemand=24 totalRunning=59"
>
>
> In the worker logs, I do not see any errors except for one worker,
> which says:
>
>
> "Failed to register (timeout)"
>
>
> The log for this run is:
> http://www.ci.uchicago.edu/~ketan/catsn-20111003-0901-nd7ta1bb.log
>
>
> The data size for this run is 10MB per task.
>
>
> Regards,
> Ketan
>
>
>
>
>
> And that is not related to the heartbeat error, which I'm not sure why
> you're getting.
>
> As for the errors you get in proxy mode, are you sure your workers are
> fine?
>
>
>
>
>
>
> --
> Ketan
>
>
>