[Swift-devel] coasters submit jobs with "count=0" in its globus RSL params

Allan Espinosa aespinosa at cs.uchicago.edu
Mon Jul 20 17:11:04 CDT 2009


session message:
Caused by:
        Block task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
Cannot submit job
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
        at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
        at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
        at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: org.globus.gram.GramException: The provided RSL 'count'
value is invalid (not an integer or must be greater than 0)
        at org.globus.gram.Gram.request(Gram.java:358)
        at org.globus.gram.GramJob.request(GramJob.java:262)
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
        ... 4 more

Cleaning up...
Shutting down service at https://129.114.50.163:45035

snippet of coasters.log:
2009-07-20 17:02:02,344-0500 INFO  BlockQueueProcessor
Settings {
        slots = 2
        workersPerNode = 16
        nodeGranularity = 1
        allocationStepSize = 0.1
        maxNodes = 2
        lowOverallocation = 10.0
        highOverallocation = 1.0
        overallocationDecayFactor = 0.0010
        spread = 0.9
        reserve = 10.000s
        maxtime = 86400
        project = TG-CCR080022N
        queue = normal
        remoteMonitorEnabled = false
}

2009-07-20 17:02:02,345-0500 INFO  BlockQueueProcessor Required size:
230400 for 16 jobs
2009-07-20 17:02:02,345-0500 INFO  BlockQueueProcessor h: 28800, jj:
14400, x-last: , r: 1
2009-07-20 17:02:02,345-0500 INFO  BlockQueueProcessor h: 43200, w: 2,
size: 230400, msz: 230400, w*h: 86400
2009-07-20 17:02:02,355-0500 INFO  BlockQueueProcessor Added: 0 - 5
2009-07-20 17:02:02,355-0500 INFO  Block Starting block: workers=2,
walltime=43200.000s
2009-07-20 17:02:02,358-0500 INFO  BlockTaskSubmitter Queuing block
Block 0720-010553-000000 (2x43200.000s) for submission
2009-07-20 17:02:02,359-0500 INFO  BlockQueueProcessor Added 6 jobs to
new blocks
2009-07-20 17:02:02,359-0500 INFO  BlockQueueProcessor Plan time: 55
2009-07-20 17:02:02,359-0500 INFO  BlockTaskSubmitter Submitting block
Block 0720-010553-000000 (2x43200.000s)
2009-07-20 17:02:02,379-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
identity=urn:cog-1248127320448) setting status to Submitting
2009-07-20 17:02:02,381-0500 INFO  Block Block task status changed: Submitting
---end--

With w=2 and workersPerNode=16, count = 2 / 16 = 0 (integer division) when a Block is instantiated.
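
A minimal sketch of the arithmetic, assuming the node count is derived from the worker count by truncating integer division as the log suggests. The class and method names here are hypothetical and are not taken from the coaster source; ceilCount is only one possible way to avoid a zero count, not necessarily the right fix for Block.

```java
// Hypothetical illustration; not the actual coaster Block code.
class NodeCountSketch {
    // Truncating integer division: any workers < workersPerNode yields 0.
    static int truncatedCount(int workers, int workersPerNode) {
        return workers / workersPerNode;
    }

    // Ceiling division: rounds up, so a positive worker count
    // always maps to at least one node.
    static int ceilCount(int workers, int workersPerNode) {
        if (workers <= 0 || workersPerNode <= 0) {
            throw new IllegalArgumentException(
                "workers and workersPerNode must be positive");
        }
        return (workers + workersPerNode - 1) / workersPerNode;
    }

    public static void main(String[] args) {
        // Values from the log above: w=2, workersPerNode=16.
        System.out.println(truncatedCount(2, 16)); // prints 0
        System.out.println(ceilCount(2, 16));      // prints 1
    }
}
```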

sites.xml:
<config>
  <pool handle="RANGER" >
    <execution  provider="coaster"
url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
    <profile namespace="globus" key="project">TG-CCR080022N</profile>
    <profile namespace="globus" key="workersPerNode">16</profile>
    <profile namespace="globus" key="queue">normal</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <profile namespace="karajan" key="jobThrottle">0.32</profile>
    <profile namespace="globus" key="slots">2</profile>
    <profile namespace="globus" key="maxNodes">2</profile>
    <profile namespace="globus" key="maxwalltime">4:00:00</profile>
    <profile namespace="globus" key="maxtime">86400</profile>

    <filesystem provider="coaster"
      url="gt2://gatekeeper.ranger.tacc.teragrid.org" />
    <workdirectory >/scratch/01035/tg802895/see_runs</workdirectory>
  </pool>
</config>

Obviously I need to get the right mix of overallocation parameters,
but an invalid RSL entry should at least be caught.
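
One way such a check could look, as a rough sketch only: validate the count before the RSL string is built, so the failure surfaces locally with a clear message instead of as a GramException from the gatekeeper. RslCountCheck and countRelation are hypothetical names; this does not use or reflect the CoG or Globus APIs.

```java
// Hypothetical illustration; not part of CoG, coasters, or Globus.
class RslCountCheck {
    // Builds the RSL "count" relation, rejecting values GRAM would refuse.
    static String countRelation(int count) {
        if (count < 1) {
            throw new IllegalArgumentException(
                "RSL 'count' must be greater than 0, got " + count);
        }
        return "(count=" + count + ")";
    }

    public static void main(String[] args) {
        System.out.println(countRelation(2)); // prints (count=2)
        try {
            countRelation(0); // the value seen in this report
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```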

I'll try to better understand BlockQueueProcessor.allocateBlocks so that
I can at least make an intelligent guess at what these values should be.


-- 
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>


