[Swift-devel] PBS coasters miscalculate PBS options

wilde at mcs.anl.gov wilde at mcs.anl.gov
Thu Feb 25 10:00:41 CST 2010


Mihael, running a 1000-job workflow on PADS with only minimal specs in the sites.xml entry for coasters failed with the error "(qsub reported an exit code of 188). qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes" (full trace below). The sites entry was:

  <pool handle="pbs">
    <profile namespace="globus" key="maxwalltime">00:00:10</profile>
    <profile namespace="globus" key="maxtime">1800</profile>
    <execution provider="coaster" url="none" jobManager="local:pbs"/>
    <profile namespace="globus" key="workersPerNode">1</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <profile namespace="karajan" key="jobThrottle">5.99</profile>
    <filesystem provider="local"/>
    <workdirectory>$(pwd)</workdirectory>
  </pool>
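
For reference, here is a minimal Python sketch that reads the pool entry above and prints the time-related settings, so they can be compared by hand against the queue limits on PADS. It is not part of Swift or CoG. The helper names (profiles, hms_to_seconds) are made up for illustration, and it assumes that maxwalltime is read as hh:mm:ss and that maxtime is in seconds. It does not reproduce how coasters actually derive the qsub options.

```python
import xml.etree.ElementTree as ET

# The pool entry from this message, copied verbatim.
SITES_ENTRY = """
<pool handle="pbs">
  <profile namespace="globus" key="maxwalltime">00:00:10</profile>
  <profile namespace="globus" key="maxtime">1800</profile>
  <execution provider="coaster" url="none" jobManager="local:pbs"/>
  <profile namespace="globus" key="workersPerNode">1</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <profile namespace="karajan" key="jobThrottle">5.99</profile>
  <filesystem provider="local"/>
  <workdirectory>$(pwd)</workdirectory>
</pool>
"""


def profiles(pool_xml):
    """Return {(namespace, key): value} for every <profile> in a pool entry."""
    pool = ET.fromstring(pool_xml)
    return {(p.get("namespace"), p.get("key")): p.text
            for p in pool.iter("profile")}


def hms_to_seconds(value):
    """Convert hh:mm:ss to seconds (assumed format for maxwalltime)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


settings = profiles(SITES_ENTRY)
walltime_s = hms_to_seconds(settings[("globus", "maxwalltime")])
maxtime_s = int(settings[("globus", "maxtime")])  # assumed to be seconds

print("per-job maxwalltime (s):", walltime_s)
print("block maxtime (s):      ", maxtime_s)
print("workersPerNode:         ", settings[("globus", "workersPerNode")])
```

Under those assumptions the entry asks for 10-second jobs inside blocks of at most 1800 seconds, with one worker per node. Node counts and queue names are not set here, so whatever is passed to qsub for those comes from defaults.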

- Mike


Swift running in SwiftR.run.056 
Swift svn swift-r3202 cog-r2683

RunID: 20100225-0813-xn3bajnc
Progress:
Progress:  uninitialized:1
Progress:  Selecting site:399  Stage in:600  Submitting:1
Progress:  Selecting site:399  Stage in:529  Submitting:2  Submitted:70
Progress:  Selecting site:399  Stage in:413  Submitted:188
Progress:  Selecting site:399  Stage in:326  Submitted:275
Worker task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job: Could not submit job (qsub reported an exit code of 188). 
qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes

        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
        at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
        at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:43)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 188). 
qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes

        at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:86)
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
        ... 3 more
Failed to shut down block
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Can only cancel an active task
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:149)
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85)
        at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:70)
        at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:96)
        at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:85)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:44)
        at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:271)
        at org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:252)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:151)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:436)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:78)
Exception caught in block processor
java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
        at java.util.AbstractList$Itr.next(AbstractList.java:343)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:149)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:436)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:78)
Exception caught in block processor
java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
        at java.util.AbstractList$Itr.next(AbstractList.java:343)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.cleanDoneBlocks(BlockQueueProcessor.java:149)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:436)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:78)
Cleaning up...
Shutting down service at https://192.5.86.5:50002
Got channel MetaChannel: 1151109057 -> null
+Canceling job 4970.svc.pads.ci.uchicago.edu
Canceling job 4971.svc.pads.ci.uchicago.edu
Canceling job 4972.svc.pads.ci.uchicago.edu
Canceling job 4973.svc.pads.ci.uchicago.edu
Canceling job 4974.svc.pads.ci.uchicago.edu
 Done


