[Swift-devel] coasters submit jobs with "count=0" in its globus RSL params

Michael Wilde wilde at mcs.anl.gov
Mon Jul 20 17:18:31 CDT 2009


Sarah, is this the same error you have been getting? (Invalid RSL count 
field?)

- Mike

On 7/20/09 5:11 PM, Allan Espinosa wrote:
> session message:
> Caused by:
>         Block task failed: Error submitting block task
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Cannot submit job
>         at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
>         at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
>         at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
>         at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
>         at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:66)
> Caused by: org.globus.gram.GramException: The provided RSL 'count'
> value is invalid (not an integer or must be greater than 0)
>         at org.globus.gram.Gram.request(Gram.java:358)
>         at org.globus.gram.GramJob.request(GramJob.java:262)
>         at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
>         ... 4 more
> 
> Cleaning up...
> Shutting down service at https://129.114.50.163:45035
> 
> snippet of coasters.log:
> 2009-07-20 17:02:02,344-0500 INFO  BlockQueueProcessor
> Settings {
>         slots = 2
>         workersPerNode = 16
>         nodeGranularity = 1
>         allocationStepSize = 0.1
>         maxNodes = 2
>         lowOverallocation = 10.0
>         highOverallocation = 1.0
>         overallocationDecayFactor = 0.0010
>         spread = 0.9
>         reserve = 10.000s
>         maxtime = 86400
>         project = TG-CCR080022N
>         queue = normal
>         remoteMonitorEnabled = false
> }
> 
> 2009-07-20 17:02:02,345-0500 INFO  BlockQueueProcessor Required size:
> 230400 for 16 jobs
> 2009-07-20 17:02:02,345-0500 INFO  BlockQueueProcessor h: 28800, jj:
> 14400, x-last: , r: 1
> 2009-07-20 17:02:02,345-0500 INFO  BlockQueueProcessor h: 43200, w: 2,
> size: 230400, msz: 230400, w*h: 86400
> 2009-07-20 17:02:02,355-0500 INFO  BlockQueueProcessor Added: 0 - 5
> 2009-07-20 17:02:02,355-0500 INFO  Block Starting block: workers=2,
> walltime=43200.000s
> 2009-07-20 17:02:02,358-0500 INFO  BlockTaskSubmitter Queuing block
> Block 0720-010553-000000 (2x43200.000s) for submission
> 2009-07-20 17:02:02,359-0500 INFO  BlockQueueProcessor Added 6 jobs to
> new blocks
> 2009-07-20 17:02:02,359-0500 INFO  BlockQueueProcessor Plan time: 55
> 2009-07-20 17:02:02,359-0500 INFO  BlockTaskSubmitter Submitting block
> Block 0720-010553-000000 (2x43200.000s)
> 2009-07-20 17:02:02,379-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
> identity=urn:cog-1248127320448) setting status to Submitting
> 2009-07-20 17:02:02,381-0500 INFO  Block Block task status changed: Submitting
> ---end--
> 
> With w=2 and workersPerNode=16, count = 2 / 16 = 0 (integer division)
> when a Block is instantiated.
> 
> sites.xml:
> <config>
>   <pool handle="RANGER" >
>     <execution  provider="coaster"
> url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
>     <profile namespace="globus" key="project">TG-CCR080022N</profile>
>     <profile namespace="globus" key="workersPerNode">16</profile>
>     <profile namespace="globus" key="queue">normal</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <profile namespace="karajan" key="jobThrottle">0.32</profile>
>     <profile namespace="globus" key="slots">2</profile>
>     <profile namespace="globus" key="maxNodes">2</profile>
>     <profile namespace="globus" key="maxwalltime">4:00:00</profile>
>     <profile namespace="globus" key="maxtime">86400</profile>
> 
>     <filesystem provider="coaster"
>       url="gt2://gatekeeper.ranger.tacc.teragrid.org" />
>     <workdirectory >/scratch/01035/tg802895/see_runs</workdirectory>
>   </pool>
> </config>
> 
> Obviously I need to get the right mix of overallocation parameters,
> but an invalid RSL entry should at least be caught.
> 
> I'll try to better understand BlockQueueProcessor.allocateBlocks so I
> can at least make an intelligent guess at what these values should be.
> 
> 
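
The arithmetic behind the failure can be reproduced in a few lines. The sketch below is not the coaster code: it assumes the RSL count is derived as workers / workersPerNode using integer division, as the quoted "2 / 16 = 0" suggests, and the function names are hypothetical. It shows the truncated value, a round-up alternative, and a pre-submission check of the kind the quoted message asks for. Rounding up is only one possible remedy.

```python
def truncated_count(workers: int, workers_per_node: int) -> int:
    """Integer division as described in the quoted message: 2 / 16 -> 0."""
    return workers // workers_per_node


def rounded_up_count(workers: int, workers_per_node: int) -> int:
    """Ceiling division, so any positive worker count needs at least one node."""
    if workers <= 0 or workers_per_node <= 0:
        raise ValueError("workers and workers_per_node must be positive")
    # -(-a // b) is the ceiling of a / b for positive integers.
    return -(-workers // workers_per_node)


def validate_rsl_count(count: int) -> int:
    """Reject a count that GRAM would refuse (not an integer, or below 1)."""
    if not isinstance(count, int) or count < 1:
        raise ValueError(f"invalid RSL count: {count!r}")
    return count


if __name__ == "__main__":
    # Values taken from the quoted log: w = 2, workersPerNode = 16.
    print(truncated_count(2, 16))
    print(rounded_up_count(2, 16))
    try:
        validate_rsl_count(truncated_count(2, 16))
    except ValueError as err:
        print(err)
```

With the values from the log, the truncated count is 0, which matches the GramException, while the rounded-up count is 1. A check like validate_rsl_count would fail locally before the job ever reaches the gatekeeper.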


