[Swift-devel] decaying number of coaster jobs leaves some tasks unfinished
Mihael Hategan
hategan at mcs.anl.gov
Mon Aug 9 16:49:30 CDT 2010
That error might be related. Can I have the full log?
On Mon, 2010-08-09 at 17:44 -0400, Glen Hocky wrote:
> Hey everyone,
> I've been trying to run some short jobs in the "fast" queue on pads.
> That means I need to keep the wall time under 1 hour, and my tasks are
> around 20 min. What's been happening, at least for a smallish number
> of jobs, is that Swift decreases the number of jobs submitted to the
> queue as the number of remaining tasks shrinks. At the end, some tasks
> remain unfinished while no jobs are in the queue, and this continues
> indefinitely.
>
>
> The following is one sites entry with which I could reproducibly
> trigger this problem for 70 tasks:
>
>
> <execution provider="coaster" url="none" jobManager="local:pbs"/>
> <!--<profile namespace="globus" key="queue">fast</profile>-->
> <profile namespace="globus" key="maxtime">3600</profile>
> <profile namespace="globus" key="maxwalltime">00:25:00</profile>
> <profile namespace="globus" key="workersPerNode">1</profile>
> <profile namespace="globus" key="internalHostname">172.5.86.5</profile>
> <profile namespace="globus" key="slots">120</profile>
> <profile namespace="globus" key="nodeGranularity">1</profile>
> <profile namespace="globus" key="maxNodes">1</profile>
> <profile namespace="karajan" key="jobThrottle">0.99</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <profile namespace="globus" key="project">CI-CCR000013</profile>
> <gridftp url="local://localhost" />
> <scratch>/tmp</scratch>
> <workdirectory>/home/hockyg/reichman/glassy_dynamics/code/swift/run/real</workdirectory>
>
>
>
>
> There are also some errors of this type:
> Exception caught while unregistering channel
> org.globus.cog.karajan.workflow.service.channels.ChannelException: Trying to bind invalid channel (2027063355: {}) to 60652275: {}
>     at org.globus.cog.karajan.workflow.service.channels.MetaChannel.bind(MetaChannel.java:67)
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.unregisterChannel(ChannelManager.java:401)
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.shutdownChannel(ChannelManager.java:411)
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:284)
>     at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.handleChannelException(AbstractStreamKarajanChannel.java:83)
>     at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:257)
> but I'm not sure that's related...
>
>
>
>
> Running with "Swift svn swift-r3432 (swift modified locally) cog-r2829"
>
>
> Swift output went something like
> Progress: Submitted:69 Active:1 Finished successfully:1
> Progress: Submitted:67 Active:3 Finished successfully:1
> Progress: Submitted:66 Active:4 Finished successfully:1
> Progress: Submitted:65 Active:5 Finished successfully:1
> Progress: Submitted:64 Active:6 Finished successfully:1
> Progress: Submitted:61 Active:9 Finished successfully:1
> Progress: Submitted:58 Active:12 Finished successfully:1
> Progress: Submitted:57 Active:13 Finished successfully:1
> Progress: Submitted:54 Active:16 Finished successfully:1
> Progress: Submitted:52 Active:18 Finished successfully:1
> Progress: Submitted:51 Active:19 Finished successfully:1
> Progress: Submitted:50 Active:20 Finished successfully:1
> Progress: Submitted:49 Active:21 Finished successfully:1
> Progress: Submitted:48 Active:22 Finished successfully:1
> Progress: Submitted:41 Active:29 Finished successfully:1
> Progress: Submitted:38 Active:32 Finished successfully:1
> Progress: Submitted:37 Active:33 Finished successfully:1
> Progress: Submitted:35 Active:35 Finished successfully:1
> Progress: Submitted:31 Active:39 Finished successfully:1
> Progress: Submitted:30 Active:40 Finished successfully:1
> Progress: Submitted:26 Active:44 Finished successfully:1
> Progress: Submitted:26 Active:44 Finished successfully:1
> Progress: Submitted:26 Active:44 Finished successfully:1
> Progress: Submitted:26 Active:44 Finished successfully:1
> Progress: Submitted:26 Active:44 Finished successfully:1
> Progress: Submitted:26 Active:44 Finished successfully:1
> Progress: Submitted:26 Active:43 Checking status:1 Finished successfully:1
> Progress: Submitted:26 Active:43 Finished successfully:2
> Progress: Submitted:26 Active:42 Checking status:1 Finished successfully:2
> Progress: Submitted:25 Active:42 Checking status:1 Finished successfully:3
> Progress: Submitted:25 Active:41 Checking status:1 Finished successfully:4
> Progress: Submitted:25 Active:41 Finished successfully:5
> Progress: Submitted:25 Active:40 Checking status:1 Finished successfully:5
> Progress: Submitted:25 Active:39 Checking status:1 Finished successfully:6
> Progress: Submitted:24 Active:40 Finished successfully:7
> Progress: Submitted:24 Active:39 Checking status:1 Finished successfully:7
> Progress: Submitted:24 Active:38 Checking status:1 Finished successfully:8
> Progress: Submitted:24 Active:38 Finished successfully:9
> Progress: Submitted:24 Active:37 Checking status:1 Finished successfully:9
> Progress: Submitted:24 Active:35 Checking status:1 Finished successfully:11
> Progress: Submitted:23 Active:35 Checking status:1 Finished successfully:12
> Progress: Submitted:22 Active:35 Checking status:1 Finished successfully:13
> Progress: Submitted:22 Active:35 Finished successfully:14
> Progress: Submitted:22 Active:34 Checking status:1 Finished successfully:14
> Progress: Submitted:21 Active:34 Checking status:1 Finished successfully:15
> Progress: Submitted:21 Active:34 Finished successfully:16
> Progress: Submitted:21 Active:33 Checking status:1 Finished successfully:16
> Progress: Submitted:21 Active:33 Finished successfully:17
> Progress: Submitted:20 Active:32 Checking status:1 Finished successfully:18
> Progress: Submitted:20 Active:32 Finished successfully:19
> Progress: Submitted:20 Active:31 Checking status:1 Finished successfully:19
> Progress: Submitted:19 Active:31 Finished successfully:21
> Progress: Submitted:19 Active:30 Checking status:1 Finished successfully:21
> Progress: Submitted:18 Active:30 Checking status:1 Finished successfully:22
> Progress: Submitted:18 Active:29 Checking status:1 Finished successfully:23
> Progress: Submitted:18 Active:28 Checking status:1 Finished successfully:24
> Progress: Submitted:17 Active:29 Finished successfully:25
> Progress: Submitted:17 Active:29 Finished successfully:25
> Progress: Submitted:17 Active:28 Checking status:1 Finished successfully:25
> Progress: Submitted:17 Active:27 Checking status:1 Finished successfully:26
> Progress: Submitted:17 Active:26 Checking status:1 Finished successfully:27
> Progress: Submitted:17 Active:25 Checking status:1 Finished successfully:28
> Progress: Submitted:17 Active:24 Checking status:1 Finished successfully:29
> Progress: Submitted:16 Active:25 Finished successfully:30
> Progress: Submitted:16 Active:24 Checking status:1 Finished successfully:30
> Progress: Submitted:15 Active:24 Checking status:1 Finished successfully:31
> Progress: Submitted:15 Active:24 Finished successfully:32
> Progress: Submitted:15 Active:23 Checking status:1 Finished successfully:32
> Progress: Submitted:14 Active:24 Finished successfully:33
> Progress: Submitted:14 Active:23 Checking status:1 Finished successfully:33
> Progress: Submitted:14 Active:22 Checking status:1 Finished successfully:34
> Progress: Submitted:14 Active:22 Finished successfully:35
> Progress: Submitted:14 Active:21 Checking status:1 Finished successfully:35
> Progress: Submitted:13 Active:22 Finished successfully:36
> Progress: Submitted:13 Active:22 Finished successfully:36
> Progress: Submitted:13 Active:20 Checking status:1 Finished successfully:37
> Progress: Submitted:12 Active:21 Finished successfully:38
> Progress: Submitted:12 Active:20 Checking status:1 Finished successfully:38
> Progress: Submitted:12 Active:19 Checking status:1 Finished successfully:39
> Progress: Submitted:12 Active:19 Finished successfully:40
> Progress: Submitted:12 Active:18 Checking status:1 Finished successfully:40
> Progress: Submitted:12 Active:17 Checking status:1 Finished successfully:41
> Progress: Submitted:11 Active:17 Checking status:1 Finished successfully:42
> Progress: Submitted:11 Active:17 Finished successfully:43
> Progress: Submitted:11 Active:16 Checking status:1 Finished successfully:43
> Progress: Submitted:11 Active:15 Checking status:1 Finished successfully:44
> Progress: Submitted:10 Active:16 Finished successfully:45
> Progress: Submitted:3 Active:22 Finished successfully:46
> Progress: Submitted:3 Active:21 Checking status:1 Finished successfully:46
> Progress: Submitted:3 Active:19 Finished successfully:49
> Progress: Submitted:3 Active:19 Finished successfully:49
> Progress: Submitted:2 Active:20 Finished successfully:49
> Progress: Submitted:1 Active:21 Finished successfully:49
> .
> .
> .
> Progress: Submitted:1 Active:15 Finished successfully:55
> Progress: Submitted:1 Active:15 Finished successfully:55
> Progress: Submitted:1 Active:15 Finished successfully:55
> Progress: Submitted:1 Active:15 Finished successfully:55
> Progress: Submitted:1 Active:14 Checking status:1 Finished successfully:55
> Progress: Submitted:1 Active:14 Finished successfully:56
> Progress: Submitted:1 Active:13 Checking status:1 Finished successfully:56
> Progress: Submitted:1 Active:12 Checking status:1 Finished successfully:57
> Progress: Submitted:1 Active:12 Finished successfully:58
> Progress: Submitted:1 Active:11 Checking status:1 Finished successfully:58
> Progress: Submitted:1 Active:10 Checking status:1 Finished successfully:59
> Progress: Submitted:1 Active:10 Finished successfully:60
> Progress: Submitted:1 Active:10 Finished successfully:60
> Progress: Submitted:1 Active:10 Finished successfully:60
> Progress: Submitted:1 Active:8 Checking status:1 Finished successfully:61
> Progress: Submitted:1 Active:8 Finished successfully:62
> Progress: Submitted:1 Active:7 Checking status:1 Finished successfully:62
> Progress: Submitted:1 Active:7 Finished successfully:63
> Progress: Submitted:1 Active:6 Checking status:1 Finished successfully:63
> Progress: Submitted:1 Active:4 Checking status:1 Finished successfully:65
> Progress: Submitted:1 Active:3 Checking status:1 Finished successfully:66
> Progress: Submitted:1 Active:3 Finished successfully:67
> Progress: Submitted:1 Active:3 Finished successfully:67
> Progress: Submitted:1 Active:2 Checking status:1 Finished successfully:67
> Progress: Submitted:1 Active:2 Finished successfully:68
> Progress: Submitted:1 Active:1 Checking status:1 Finished successfully:68
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
> Progress: Submitted:1 Finished successfully:70
>
>
> etc
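
The stall in the output above (one task stuck in "Submitted", nothing "Active", the same line repeating) can be spotted mechanically. What follows is only a rough sketch, not part of Swift: the helper names parse_progress and find_stall and the min_repeats threshold are made up for illustration, and it assumes each "Progress:" line carries "Name:count" fields as in the log quoted here.

```python
import re

# One "Name:count" field, e.g. "Finished successfully:70".
FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Return {state: count} for a 'Progress:' line, or None for any other line."""
    line = line.lstrip("> ").strip()  # tolerate mail quoting such as "> "
    if not line.startswith("Progress:"):
        return None
    body = line[len("Progress:"):]
    return {name.strip(): int(count) for name, count in FIELD.findall(body)}


def find_stall(lines, min_repeats=5):
    """Return the repeated snapshot if the run looks stalled, else None.

    "Stalled" here means: the last `min_repeats` progress lines are identical,
    at least one task is still Submitted, and no task is Active.
    """
    snapshots = [s for s in map(parse_progress, lines) if s is not None]
    tail = snapshots[-min_repeats:]
    if len(tail) < min_repeats:
        return None
    last = tail[-1]
    stalled = (
        all(s == last for s in tail)
        and last.get("Submitted", 0) > 0
        and last.get("Active", 0) == 0
    )
    return last if stalled else None


if __name__ == "__main__":
    sample = ["> Progress: Submitted:1 Active:2 Finished successfully:68"]
    sample += ["> Progress: Submitted:1 Finished successfully:70"] * 6
    print(find_stall(sample))
```

Feeding it the saved console output, for example find_stall(open("swift.out")) with a file name of your choosing, returns the repeated counts when the tail matches the pattern above and None otherwise. The threshold is arbitrary and may need raising if progress lines are printed frequently.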