[Swift-user] swift returns error on pads
Zhao Zhang
zhaozhang at uchicago.edu
Sun Feb 28 10:38:29 CST 2010
Hi, Mihael
It worked, but only when I set "maxnodes" to "2". Does that mean 2 is
the only number of compute nodes that I can use on pads? Thanks.
Best
zhao
[zzhang@login2 final]$ cat pbs.xml
<config>
<pool handle="pbs">
<execution provider="coaster" url="none" jobManager="local:pbs"/>
<profile namespace="globus" key="maxwalltime">00:00:10</profile>
<profile namespace="globus" key="maxtime">1800</profile>
<profile namespace="globus" key="workersPerNode">8</profile>
<profile namespace="globus" key="maxnodes">2</profile>
<profile namespace="karajan" key="initialScore">1000</profile>
<profile namespace="karajan" key="jobThrottle">.63</profile>
<gridftp url="local://localhost" />
<workdirectory >/home/zzhang/swiftwork</workdirectory>
</pool>
</config>
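
As a rough sanity check on settings like these, the sketch below parses the sites file and prints each profile value, plus the node and worker counts a single coaster block could ask for. It assumes that "maxnodes" caps the nodes per block and that "workersPerNode" is the number of workers started on each node; the function name summarize_pool, the inline SITES_XML string, and the fallback of one worker per node are illustrative choices, not part of Swift.

```python
import xml.etree.ElementTree as ET

# Inline copy of the pbs.xml shown above, so the sketch is self-contained.
SITES_XML = """<config>
<pool handle="pbs">
<execution provider="coaster" url="none" jobManager="local:pbs"/>
<profile namespace="globus" key="maxwalltime">00:00:10</profile>
<profile namespace="globus" key="maxtime">1800</profile>
<profile namespace="globus" key="workersPerNode">8</profile>
<profile namespace="globus" key="maxnodes">2</profile>
<profile namespace="karajan" key="initialScore">1000</profile>
<profile namespace="karajan" key="jobThrottle">.63</profile>
<gridftp url="local://localhost" />
<workdirectory >/home/zzhang/swiftwork</workdirectory>
</pool>
</config>"""


def summarize_pool(xml_text, handle):
    """Collect the profile entries of one pool and derive per-block limits."""
    root = ET.fromstring(xml_text)
    pool = next(p for p in root.iter("pool") if p.get("handle") == handle)

    # Map (namespace, key) -> text value for every <profile> in the pool.
    profiles = {
        (p.get("namespace"), p.get("key")): (p.text or "").strip()
        for p in pool.iter("profile")
    }

    # Assumption: one worker per node when workersPerNode is not set.
    workers_per_node = int(profiles.get(("globus", "workersPerNode"), "1"))

    # Without a maxnodes entry this sketch cannot tell how many nodes a
    # block may request, so the derived values stay None.
    max_nodes = profiles.get(("globus", "maxnodes"))
    nodes = int(max_nodes) if max_nodes is not None else None

    return {
        "profiles": profiles,
        "max_nodes_per_block": nodes,
        "workers_per_node": workers_per_node,
        "max_workers_per_block": None if nodes is None else nodes * workers_per_node,
    }


if __name__ == "__main__":
    info = summarize_pool(SITES_XML, "pbs")
    for (namespace, key), value in sorted(info["profiles"].items()):
        print(f"{namespace}:{key} = {value}")
    print("max nodes per block:", info["max_nodes_per_block"])
    print("max workers per block:", info["max_workers_per_block"])
```

For a pool with no "maxnodes" entry the sketch reports None for both derived values; what Swift actually requests from PBS in that case depends on its own defaults, which this sketch does not model.
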
Mihael Hategan wrote:
> The poor thing is allocating more nodes than are available. Specify
> maxnodes.
>
> On Sun, 2010-02-28 at 00:45 -0600, Zhao Zhang wrote:
>
>> Thanks, Allan. But I still got the same error.
>>
>> zhao
>>
>> Allan Espinosa wrote:
>>
>>> I think it should be workersPerNode
>>>
>>> On Sun, Feb 28, 2010 at 12:08:12AM -0600, Zhao Zhang wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I got the following error message from swift when I ran it on pads
>>>> with coaster. The swift I am running is at
>>>> [zzhang@login2 final]$ which swift
>>>> /home/wilde/bigdata/swift/bin/swift
>>>>
>>>> I am confused about the configuration settings in the sites.xml
>>>> file. Could anybody point out the error in the settings? Thanks.
>>>>
>>>> Best
>>>> zhao
>>>>
>>>> PS: I am pasting my pbs.xml and stdout from swift here.
>>>>
>>>> [zzhang@login2 final]$ cat pbs.xml
>>>> <config>
>>>>
>>>> <pool handle="pbs">
>>>>
>>>> <profile namespace="globus" key="maxwalltime">00:00:10</profile>
>>>> <profile namespace="globus" key="maxtime">1800</profile>
>>>>
>>>> <execution provider="coaster" url="none" jobManager="local:pbs"/>
>>>> <profile namespace="globus" key="coastersPerNode">8</profile>
>>>>
>>>> <profile namespace="karajan" key="initialScore">10000</profile>
>>>> <profile namespace="karajan" key="jobThrottle">.63</profile>
>>>>
>>>> <gridftp url="local://localhost" />
>>>> <workdirectory >/home/zzhang/swiftwork</workdirectory>
>>>>
>>>> </pool>
>>>> </config>
>>>>
>>>> [zzhang@login2 final]$ time swift -tc.file ./tc -sites.file ./pbs.xml movie.swift
>>>> Swift svn swift-r3255 (swift modified locally) cog-r2723
>>>>
>>>> RunID: 20100228-0003-xeza2uh4
>>>> Progress:
>>>> Progress:Progress: uninitialized:3 uninitialized:3
>>>>
>>>> Progress: Selecting site:936 Stage in:54 Submitting:1 Submitted:9
>>>> Progress: Selecting site:936 Stage in:21 Submitted:43
>>>> Worker task failed: Error submitting block task
>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>>> Cannot submit job: Could not submit job (qsub reported an exit code
>>>> of 188).
>>>> qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
>>>>
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
>>>> at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
>>>> at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:43)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
>>>> Caused by:
>>>> org.globus.cog.abstraction.impl.scheduler.common.ProcessException:
>>>> Could not submit job (qsub reported an exit code of 188).
>>>> qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
>>>>
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:100)
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
>>>> ... 3 more
>>>> Progress: Selecting site:936 Submitted:63 Active:1
>>>> Failed to transfer wrapper log from
>>>> movie-20100228-0003-xeza2uh4/info/v on pbs
>>>> Execution failed:
>>>> Exception in transform:
>>>> Arguments: [training_set_1000/mv_0001283.txt]
>>>> Host: pbs
>>>> Directory: movie-20100228-0003-xeza2uh4/jobs/v/transform-vt0goeoj
>>>> stderr.txt:
>>>>
>>>> stdout.txt:
>>>>
>>>> ----
>>>>
>>>> Caused by:
>>>> Task failed: Error submitting block task
>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>>> Cannot submit job: Could not submit job (qsub reported an exit code
>>>> of 188).
>>>> qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
>>>>
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
>>>> at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
>>>> at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:43)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
>>>> Caused by:
>>>> org.globus.cog.abstraction.impl.scheduler.common.ProcessException:
>>>> Could not submit job (qsub reported an exit code of 188).
>>>> qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
>>>>
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:100)
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
>>>> ... 3 more
>>>>
>>>> Cleaning up...
>>>> Shutting down service at https://192.5.86.6:50002
>>>> Got channel MetaChannel: 2090567670 -> null
>>>> Canceling job 5536.svc.pads.ci.uchicago.edu
>>>> Canceling job 5537.svc.pads.ci.uchicago.edu
>>>> Canceling job 5538.svc.pads.ci.uchicago.edu
>>>> Failed to shut down block
>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>>> Can only cancel an active task
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:177)
>>>> at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85)
>>>> at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:70)
>>>> at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:96)
>>>> at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:85)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:44)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:271)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:252)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:487)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:477)
>>>> at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:70)
>>>> at org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:211)
>>>> at org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28)
>>>> at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84)
>>>> at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:348)
>>>> at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.sendTaggedData(AbstractPipedChannel.java:58)
>>>> at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.sendTaggedData(AbstractKarajanChannel.java:126)
>>>> at org.globus.cog.karajan.workflow.service.commands.Command.send(Command.java:121)
>>>> at org.globus.cog.karajan.workflow.service.commands.Command.send(Command.java:171)
>>>> at org.globus.cog.karajan.workflow.service.commands.Command.executeAsync(Command.java:162)
>>>> at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager$ServiceReaper.run(ServiceManager.java:425)
>>>>
>>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user