[Swift-user] Getting swift to run on Fusion

Jonathan Margoliash jmargolpeople at gmail.com
Wed Sep 12 11:32:56 CDT 2012


I have attached the .0.rlog, .log, .d, and swift.log files. Which of those
files do you use for debugging? They are all located in the
directory

/home/jmargoliash/my_SwiftSCE2_branch_matlab/runs/run-20120912-103235

on Fusion, if that's what you were asking for. Thanks!

Jonathan

On Wed, Sep 12, 2012 at 12:20 PM, David Kelly <davidk at ci.uchicago.edu> wrote:

> Jonathan,
>
> Could you please provide a pointer to the log file that got created from
> this run?
>
> Thanks,
> David
>
> ----- Original Message -----
> > From: "Jonathan Margoliash" <jmargolpeople at gmail.com>
> > To: swift-user at ci.uchicago.edu, "Swift Language" <davidk at ci.uchicago.edu>, "Professor E. Yan" <eyan at anl.gov>
> > Sent: Wednesday, September 12, 2012 10:50:35 AM
> > Subject: Getting swift to run on Fusion
> >
> > Hello swift support,
> >
> >
> > This is my first attempt getting swift to work on Fusion, and I'm
> > getting the following output to the terminal:
> >
> >
> > ------
> >
> >
> >
> > Warning: Function toint is deprecated, at line 10
> > Swift trunk swift-r5882 cog-r3434
> >
> >
> > RunID: 20120912-1032-5y7xb1ug
> > Progress: time: Wed, 12 Sep 2012 10:32:51 -0500
> > Progress: time: Wed, 12 Sep 2012 10:32:54 -0500 Selecting site:34 Submitted:8
> > Progress: time: Wed, 12 Sep 2012 10:32:57 -0500 Selecting site:34 Submitted:8
> > Progress: time: Wed, 12 Sep 2012 10:33:00 -0500 Selecting site:34 Submitted:8
> > ...
> > Progress: time: Wed, 12 Sep 2012 10:40:33 -0500 Selecting site:34 Submitted:8
> > Failed to shut down block: Block 0912-321051-000005 (8x60.000s)
> > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Failed to cancel task. qdel returned with an exit code of 153
> >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:205)
> >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85)
> >     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69)
> >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102)
> >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:320)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.Node.errorReceived(Node.java:100)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:203)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:237)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:225)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:318)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:293)
> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:552)
> >     at org.globus.cog.karajan.workflow.service.channels.NIOSender.run(NIOSender.java:140)
> > Progress: time: Wed, 12 Sep 2012 10:40:36 -0500 Selecting site:34 Submitted:8
> > Progress: time: Wed, 12 Sep 2012 10:40:39 -0500 Selecting site:34 Submitted:8
> > Progress: time: Wed, 12 Sep 2012 10:40:42 -0500 Selecting site:34 Submitted:8
> > ...
> >
> > Progress: time: Wed, 12 Sep 2012 10:41:42 -0500 Selecting site:34 Submitted:8
> > Progress: time: Wed, 12 Sep 2012 10:41:45 -0500 Selecting site:34 Submitted:8
> > Failed to shut down block: Block 0912-321051-000006 (8x60.000s)
> > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Failed to cancel task. qdel returned with an exit code of 153
> >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:205)
> >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85)
> >     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69)
> >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:102)
> >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:91)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:320)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.Node.errorReceived(Node.java:100)
> >     at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:203)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:237)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:225)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:318)
> >     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:293)
> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:552)
> >     at org.globus.cog.karajan.workflow.service.channels.NIOSender.run(NIOSender.java:140)
> > Progress: time: Wed, 12 Sep 2012 10:41:48 -0500 Selecting site:34 Submitted:8
> > Progress: time: Wed, 12 Sep 2012 10:41:51 -0500 Selecting site:34 Submitted:8
> > Progress: time: Wed, 12 Sep 2012 10:41:54 -0500 Selecting site:34 Submitted:8
> > ...
> >
> >
> > ------
> >
> >
> > I understand the long runs of unchanging "Progress: ..." reports:
> > the shared queue is busy, so I am not expecting my job to be
> > executed right away. However, I don't understand why I'm getting these
> > "Failed to cancel task" errors. I gave each individual app more
> > than enough time to run to completion. I did set the time limit
> > for the entire process much lower than it needs
> > (<profile namespace="globus" key="maxTime">60</profile> in sites.xml,
> > when the process could run for days), but I presumed the entire
> > process would simply be shut down after 60 seconds of runtime.
> > Why are these errors cropping up? Thanks,
> >
> >
> > Jonathan
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logfiles.tar
Type: application/x-tar
Size: 1361920 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20120912/c7b34a28/attachment.tar>

