[Swift-devel] coasters submit jobs with "count=0" in its globus RSL params
Allan Espinosa
aespinosa at cs.uchicago.edu
Mon Jul 20 17:11:04 CDT 2009
session message:
Caused by:
Block task failed: Error submitting block task
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
Cannot submit job
at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
Caused by: org.globus.gram.GramException: The provided RSL 'count'
value is invalid (not an integer or must be greater than 0)
at org.globus.gram.Gram.request(Gram.java:358)
at org.globus.gram.GramJob.request(GramJob.java:262)
at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
... 4 more
Cleaning up...
Shutting down service at https://129.114.50.163:45035
snippet of coasters.log:
2009-07-20 17:02:02,344-0500 INFO BlockQueueProcessor
Settings {
slots = 2
workersPerNode = 16
nodeGranularity = 1
allocationStepSize = 0.1
maxNodes = 2
lowOverallocation = 10.0
highOverallocation = 1.0
overallocationDecayFactor = 0.0010
spread = 0.9
reserve = 10.000s
maxtime = 86400
project = TG-CCR080022N
queue = normal
remoteMonitorEnabled = false
}
2009-07-20 17:02:02,345-0500 INFO BlockQueueProcessor Required size:
230400 for 16 jobs
2009-07-20 17:02:02,345-0500 INFO BlockQueueProcessor h: 28800, jj:
14400, x-last: , r: 1
2009-07-20 17:02:02,345-0500 INFO BlockQueueProcessor h: 43200, w: 2,
size: 230400, msz: 230400, w*h: 86400
2009-07-20 17:02:02,355-0500 INFO BlockQueueProcessor Added: 0 - 5
2009-07-20 17:02:02,355-0500 INFO Block Starting block: workers=2,
walltime=43200.000s
2009-07-20 17:02:02,358-0500 INFO BlockTaskSubmitter Queuing block
Block 0720-010553-000000 (2x43200.000s) for submission
2009-07-20 17:02:02,359-0500 INFO BlockQueueProcessor Added 6 jobs to
new blocks
2009-07-20 17:02:02,359-0500 INFO BlockQueueProcessor Plan time: 55
2009-07-20 17:02:02,359-0500 INFO BlockTaskSubmitter Submitting block
Block 0720-010553-000000 (2x43200.000s)
2009-07-20 17:02:02,379-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
identity=urn:cog-1248127320448) setting status to Submitting
2009-07-20 17:02:02,381-0500 INFO Block Block task status changed: Submitting
---end--
With w=2 and workersPerNode=16, count = 2 / 16 = 0 (integer division)
when a Block is instantiated.
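To make the arithmetic concrete, here is a small Python sketch of the
truncation, using the numbers from the log above. The function names are
hypothetical and this is not the actual coaster code (which is Java); it
only assumes that count is derived by integer-dividing the worker count
by workersPerNode. Rounding up instead would give at least one node.

```python
def node_count_floor(workers: int, workers_per_node: int) -> int:
    # Truncating integer division, as in the count = 2 / 16 = 0 case.
    return workers // workers_per_node


def node_count_ceil(workers: int, workers_per_node: int) -> int:
    # Ceiling division: any partial node still counts as one node.
    return -(-workers // workers_per_node)


if __name__ == "__main__":
    workers = 2            # w from the log
    workers_per_node = 16  # workersPerNode from the settings
    print(node_count_floor(workers, workers_per_node))
    print(node_count_ceil(workers, workers_per_node))
```

Whether rounding up is the right fix for the block allocator is a separate
question; the sketch only shows where the 0 comes from.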
sites.xml:
<config>
<pool handle="RANGER" >
<execution provider="coaster"
url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
<profile namespace="globus" key="project">TG-CCR080022N</profile>
<profile namespace="globus" key="workersPerNode">16</profile>
<profile namespace="globus" key="queue">normal</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<profile namespace="karajan" key="jobThrottle">0.32</profile>
<profile namespace="globus" key="slots">2</profile>
<profile namespace="globus" key="maxNodes">2</profile>
<profile namespace="globus" key="maxwalltime">4:00:00</profile>
<profile namespace="globus" key="maxtime">86400</profile>
<filesystem provider="coaster"
url="gt2://gatekeeper.ranger.tacc.teragrid.org" />
<workdirectory >/scratch/01035/tg802895/see_runs</workdirectory>
</pool>
</config>
Obviously I need to get the right mix of overallocation parameters,
but an invalid RSL entry should at least be caught.
I'll try to get a better understanding of BlockQueueProcessor.allocateBlocks
so that I can at least make an intelligent guess at what these values should be.
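As a rough illustration of the kind of check I mean, here is a Python
sketch of a guard that rejects a bad count before anything is handed to
GRAM. The name check_rsl_count is hypothetical and nothing like it exists
in the coaster code as far as I know; the rule it enforces is just the one
quoted in the GramException (an integer greater than 0).

```python
def check_rsl_count(count) -> int:
    """Return count unchanged if it is a valid RSL 'count', else raise."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(count, bool) or not isinstance(count, int):
        raise ValueError(f"RSL 'count' must be an integer, got {count!r}")
    if count <= 0:
        raise ValueError(f"RSL 'count' must be greater than 0, got {count}")
    return count


if __name__ == "__main__":
    try:
        check_rsl_count(0)  # the value from the failing submission
    except ValueError as err:
        print("rejected before submission:", err)
```

Failing at this point would give a clearer error than the GramException
coming back from the gatekeeper.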
--
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>