[Swift-devel] a note on stress testing
Michael Wilde
wilde at mcs.anl.gov
Sat Mar 9 17:12:58 CST 2013
Yadu, below is exactly the kind of error I'm hoping we can catch in the test suite.
The one below is happening on remote submissions from midway to beagle, using coaster provider staging of 17MB input files.
So detecting it might need both site-config and stress testing at the same time.
- Mike
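
One way the test suite could flag a run like the one below is to scan the captured stdout/stderr for known failure signatures. The following is only a rough sketch: the class name, the method name, and the choice of signatures are assumptions for illustration (the strings themselves are copied from the run028 output), and it is not part of the existing Swift test suite.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts known failure signatures in Swift output.
class FailureSignatureScan {
    // Strings taken from the run028 output; which ones to watch for
    // is an assumption, not an agreed list.
    static final List<String> SIGNATURES = List.of(
        "Execution failed:",
        "Failed to abort transfer",
        "java.util.ConcurrentModificationException",
        "java.lang.NullPointerException",
        "Task being removed twice?");

    // Returns signature -> number of lines containing it.
    // Signatures that never occur are left out of the map.
    static Map<String, Integer> scan(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String sig : SIGNATURES) {
                if (line.contains(sig)) {
                    counts.merge(sig, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FailureSignatureScan <swift-output-file>");
            return;
        }
        Map<String, Integer> counts = scan(Files.readAllLines(Paths.get(args[0])));
        counts.forEach((sig, n) -> System.out.println(n + "  " + sig));
        // Non-zero exit so a test driver can treat any match as a failure.
        if (!counts.isEmpty()) {
            System.exit(1);
        }
    }
}
```

A stress test would still have to provoke the failure first (remote site, provider staging, large inputs); a check like this only makes sure it does not go unnoticed when the run otherwise appears to make progress.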
----- Forwarded Message -----
From: "Michael Wilde" <wilde at mcs.anl.gov>
To: "Mihael Hategan" <hategan at mcs.anl.gov>
Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
Sent: Saturday, March 9, 2013 5:09:16 PM
Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
See instead run028. Errors below.
Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
RunID: 20130309-2252-x37dmuy0
Progress: time: Sat, 09 Mar 2013 22:52:24 +0000
Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 Submitting:47 Submitted:1
Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 Stage in:1 Submitted:47
Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 Stage in:47 Active:1
Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 Stage in:42 Active:6
Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 Stage in:24 Active:24
Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 Active:47 Stage out:1
Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 Stage in:2 Submitted:1 Active:44 Stage out:1 Finished successfully:3
Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 Finished successfully:8
Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 Stage in:12 Submitting:3 Active:24 Stage out:8 Finished successfully:16
Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 Stage in:23 Submitting:5 Active:15 Stage out:4 Finished successfully:29
Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 Stage in:28 Submitting:7 Stage out:12 Finished successfully:36
Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 Stage in:35 Submitting:12 Submitted:1 Finished successfully:48
Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 Stage in:48 Finished successfully:48
Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 Stage in:48 Finished successfully:48
Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48
Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48
Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 Stage in:47 Finished successfully:49
Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49
Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 Stage in:48 Finished successfully:49
Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49
Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
Context: service-60231
Meta context: service-60121
Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 Stage in:47 Finished successfully:50
Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 Stage in:47 Submitted:1 Finished successfully:50
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
Context: service-60507
Meta context: service-60121
Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 Stage in:47 Active:1 Finished successfully:50
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
Context: service-60742
Meta context: service-60121
Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 Stage in:46 Active:2 Finished successfully:50
Execution failed:
Exception in getlanduse:
Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb]
Host: beagle
Directory: modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l
Caused by:
Shutting down worker
getLandUse, modis02.swift, line 20
Attempted to unregister unregistered handler with id 526
Attempted to unregister unregistered handler with id 534
Attempted to unregister unregistered handler with id 430
Attempted to unregister unregistered handler with id 476
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 337
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267)
at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57)
at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 348
Attempted to unregister unregistered handler with id 466
Attempted to unregister unregistered handler with id 347
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 456
Attempted to unregister unregistered handler with id 454
Attempted to unregister unregistered handler with id 508
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 511
Attempted to unregister unregistered handler with id 506
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 380
Attempted to unregister unregistered handler with id 502
Attempted to unregister unregistered handler with id 376
Attempted to unregister unregistered handler with id 226
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Failed to abort transfer
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.next(LinkedList.java:886)
at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 484
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-66-1-1-1362869544093)
Task being removed twice?
java.lang.Throwable
at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.removeTask(AbstractGridNode.java:291)
at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:263)
at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:136)
at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:665)
at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:428)
at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:426)
at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.errorReceived(JobSubmissionTaskHandler.java:219)
at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:191)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:227)
at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-244-1-1-1362869544091)
Ex098
java.lang.NullPointerException
at org.globus.cog.karajan.arguments.NamedArgumentsImpl.merge(NamedArgumentsImpl.java:52)
at org.globus.cog.karajan.workflow.nodes.SequentialChoice.commitBuffers(SequentialChoice.java:46)
at org.globus.cog.karajan.workflow.nodes.SequentialChoice.completed(SequentialChoice.java:40)
at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98)
at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Could not fail element
Attempted to close nonexistent channel buffers
at org.globus.cog.karajan.arguments.ArgUtil.closeBuffers(ArgUtil.java:279)
at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.closeBuffers(AbstractParallelIterator.java:107)
at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.failed(AbstractParallelIterator.java:143)
at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:89)
at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:151)
at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98)
at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-84-1-1-1362869544098)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-64-1-1-1362869544095)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-45-1-1-1362869544101)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-79-1-1-1362869544099)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-68-1-1-1362869544108)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-63-1-1-1362869544103)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-256-1-1-1362869544077)
error null
error null
real 4m27.856s
user 2m45.576s
sys 0m3.697s
+ mv /home/wilde/.swift/runs/current/run028.1362869541 /home/wilde/.swift/runs/completed
midway001$
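
For anyone reading the traces above: the repeated ConcurrentModificationException is thrown from LinkedList$ListItr.next, which is what a fail-fast iterator does when the list is structurally modified while it is being iterated. The log does not show what modified the list under RequestHandler.send, so the following is only a generic illustration of that failure mode and of one common workaround (iterating over a snapshot). The class and method names are made up, and this is not the CoG code.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo of a fail-fast LinkedList iterator; not CoG code.
class ComodificationSketch {
    // Iterates the live list while removing an element from it.
    // Returns true if the iterator detected the modification.
    static boolean failsOnLiveList() {
        List<String> chunks = new LinkedList<>(List.of("a", "b", "c"));
        try {
            for (String c : chunks) {
                if (c.equals("a")) {
                    chunks.remove("c"); // structural change during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates a copy, so changes to the original list cannot
    // invalidate the iterator. Returns the number of elements visited.
    static int visitsOnSnapshot() {
        List<String> chunks = new LinkedList<>(List.of("a", "b", "c"));
        int visited = 0;
        for (String c : new ArrayList<>(chunks)) {
            if (c.equals("a")) {
                chunks.remove("c");
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("live list throws: " + failsOnLiveList());
        System.out.println("snapshot visits:  " + visitsOnSnapshot());
    }
}
```

If the modification comes from another thread, as it may here given that the traces originate from both Sender.run and a thread pool worker, taking the snapshot would itself need to happen under the same lock that guards the list.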
----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Mihael Hategan" <hategan at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Saturday, March 9, 2013 5:05:25 PM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
>
> Mihael, now I think I have a coaster problem. Curiously, it always
> seems to happen at about 5 mins into the run.
>
> Logs for these runs are on midway in, e.g.,
> /home/wilde/osgdemo/modis/svn/run027
>
> The leading portion of the error from stdout/err is below.
>
> - Mike
>
> Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
>
> RunID: 20130309-2252-x37dmuy0
> Progress: time: Sat, 09 Mar 2013 22:52:24 +0000
> Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269
> Submitting:47 Submitted:1
> Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269
> Stage in:1 Submitted:47
> Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269
> Stage in:48
> Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269
> Stage in:48
> Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269
> Stage in:48
> Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269
> Stage in:48
> Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269
> Stage in:47 Active:1
> Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269
> Stage in:42 Active:6
> Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269
> Stage in:24 Active:24
> Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269
> Active:47 Stage out:1
> Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266
> Stage in:2 Submitted:1 Active:44 Stage out:1 Finished
> successfully:3
> Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261
> Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3
> Finished successfully:8
> Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254
> Stage in:12 Submitting:3 Active:24 Stage out:8 Finished
> successfully:16
> Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241
> Stage in:23 Submitting:5 Active:15 Stage out:4 Finished
> successfully:29
> Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234
> Stage in:28 Submitting:7 Stage out:12 Finished successfully:36
> Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221
> Stage in:35 Submitting:12 Submitted:1 Finished successfully:48
> Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221
> Stage in:48 Finished successfully:48
> Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221
> Stage in:48 Finished successfully:48
> Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221
> Stage in:47 Active:1 Finished successfully:48
> Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221
> Stage in:47 Stage out:1 Finished successfully:48
> Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221
> Stage in:47 Finished successfully:49
> Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220
> Stage in:47 Submitted:1 Finished successfully:49
> Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220
> Stage in:48 Finished successfully:49
> Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220
> Stage in:47 Active:1 Finished successfully:49
> Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220
> Stage in:47 Stage out:1 Finished successfully:49
> Channels:
> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60231
> Meta context: service-60121
> Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220
> Stage in:47 Finished successfully:50
> Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219
> Stage in:47 Submitted:1 Finished successfully:50
> Channels:
> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60507
> Meta context: service-60121
> Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 Stage in:47 Active:1 Finished successfully:50
> Channels:
> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60742
> Meta context: service-60121
> Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 Stage in:46 Active:2 Finished successfully:50
> Execution failed:
> Exception in getlanduse:
> Arguments:
> [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb]
> Host: beagle
> Directory:
> modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l
>
> Caused by:
> Shutting down worker
> getLandUse, modis02.swift, line 20
> Attempted to unregister unregistered handler with id 526
> Attempted to unregister unregistered handler with id 534
> Attempted to unregister unregistered handler with id 430
> Attempted to unregister unregistered handler with id 476
> Failed to abort transfer
> java.util.ConcurrentModificationException
> at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 337
> Failed to abort transfer
> java.util.ConcurrentModificationException
> at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267)
> at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
> at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
> at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57)
> at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Failed to abort transfer
> java.util.ConcurrentModificationException
> at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226
>
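A note on the java.util.ConcurrentModificationException in the traces above: it is raised by LinkedList's fail-fast iterator when the list is structurally changed while it is being iterated. The sketch below is NOT the CoG RequestHandler code. The class and method names are hypothetical, and it reproduces the exception in a single thread only to show the mechanism; in the run above the modification presumably comes from another thread, in which case the check is best-effort and not guaranteed to fire. Iterating over a snapshot copy is one possible way to avoid it, assuming the copy itself is taken under suitable synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration only; not taken from the CoG/Karajan sources.
public class CmeSketch {

    // Removing from a LinkedList while iterating over it makes the next
    // Iterator.next() call fail in checkForComodification().
    static boolean unsafeIterationThrows() {
        List<String> chunks = new LinkedList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String chunk : chunks) {
                chunks.remove(chunk); // structural change mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterating over a snapshot leaves the live list free to change.
    static int safeIterationCount() {
        List<String> chunks = new LinkedList<>(Arrays.asList("a", "b", "c"));
        int visited = 0;
        for (String chunk : new ArrayList<>(chunks)) {
            chunks.remove(chunk);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("unsafe iteration throws: " + unsafeIterationThrows());
        System.out.println("elements visited via snapshot: " + safeIterationCount());
    }
}
```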
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Saturday, March 9, 2013 4:24:17 PM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > midway to beagle
> >
> > I forgot to paste the error, sorry. It's below now, for real. When I
> > dial down the throttle to 48 and only start 2 beagle nodes, I get
> > further and the app calls make it to active state. The 317 files
> > being staged in here are 17MB each.
> >
> > The swift progress output and error are below:
> >
> > RunID: 20130309-2204-qu9ck076
> > Progress: time: Sat, 09 Mar 2013 22:04:34 +0000
> > Progress: time: Sat, 09 Mar 2013 22:04:45 +0000 Submitting:316 Submitted:1
> > Progress: time: Sat, 09 Mar 2013 22:04:51 +0000 Stage in:1 Submitted:316
> > Progress: time: Sat, 09 Mar 2013 22:04:52 +0000 Stage in:25 Submitted:292
> > Progress: time: Sat, 09 Mar 2013 22:04:53 +0000 Stage in:68 Submitted:249
> > Progress: time: Sat, 09 Mar 2013 22:04:55 +0000 Stage in:113 Submitted:204
> > Progress: time: Sat, 09 Mar 2013 22:04:56 +0000 Stage in:165 Submitted:152
> > Progress: time: Sat, 09 Mar 2013 22:04:58 +0000 Stage in:177 Submitted:140
> > Progress: time: Sat, 09 Mar 2013 22:05:00 +0000 Stage in:225 Submitted:92
> > Progress: time: Sat, 09 Mar 2013 22:05:04 +0000 Stage in:241 Submitted:76
> > Progress: time: Sat, 09 Mar 2013 22:05:05 +0000 Stage in:289 Submitted:28
> > Progress: time: Sat, 09 Mar 2013 22:05:09 +0000 Stage in:305 Submitted:12
> > Progress: time: Sat, 09 Mar 2013 22:05:34 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013 22:06:04 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013 22:06:34 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013 22:07:04 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013 22:07:34 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013 22:08:04 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013 22:08:34 +0000 Stage in:317
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > -> BufferingChannel,
> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > -> BufferingChannel}
> > Context: service-60822
> > Meta context: service-60640
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > -> BufferingChannel,
> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > -> BufferingChannel}
> > Context: service-60116
> > Meta context: service-60640
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > -> BufferingChannel,
> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > -> BufferingChannel}
> > Context: service-60598
> > Meta context: service-60640
> > Progress: time: Sat, 09 Mar 2013 22:09:04 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013 22:09:08 +0000 Stage in:316 Active:1
> > Execution failed:
> > Exception in getlanduse:
> > Arguments:
> > [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb]
> > Host: beagle
> > Directory:
> > modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l
> >
> > Caused by:
> > Shutting down worker
> > getLandUse, modis02.swift, line 20
> > error null
> >
> > real 4m36.777s
> > user 2m55.240s
> > sys 0m3.837s
> >
> >
> > ---
> >
> > With a throttle of 48 (.47) and 2 beagle nodes, I see:
> >
> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> >
> > RunID: 20130309-2214-1oi3rvea
> > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000
> > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 Submitting:47 Submitted:1
> > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 Stage in:1 Submitted:47
> > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 Stage in:25 Submitted:23
> > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 Stage in:48
> > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 Stage in:48
> > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 Stage in:48
> > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 Stage in:48
> > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 Stage in:47 Active:1
> > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 Stage in:36 Active:12
> > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24
> > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1
> > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1
> > Execution failed:
> > Exception in getlanduse:
> > Arguments:
> > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > Host: beagle
> > Directory:
> > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> >
> > Caused by:
> > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
> > getLandUse, modis02.swift, line 20
> >
> > real 2m31.463s
> > user 1m33.238s
> > sys 0m2.160s
> > > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed
> >
> > > This error is likely in the demo app code; I'm just pasting it here
> > > to show that with less concurrency the run makes progress.
> >
> > ----- Original Message -----
> > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > Sent: Saturday, March 9, 2013 4:11:24 PM
> > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > > midway to beagle
> > >
> > > > Now I'm getting the error below (from running 317 simple MODIS
> > > > apps concurrently). I'm going to dial down the throttle first to
> > > > see if the staging load is overwhelming either coasters or the
> > > > midway-beagle path.
> > >
> > > - Mike
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > Sent: Saturday, March 9, 2013 3:59:22 PM
> > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > > > midway to beagle
> > > >
> > > > > I think we just got this working. Problems may have included
> > > > > the need to pre-create the work directory and to specify a
> > > > > dotted IP address on the external network for GLOBUS_HOSTNAME.
> > > > > Will need to experiment. So it's likely that proxy expiration
> > > > > time was not a problem (although it's confusing).
> > > >
> > > > Will report back on this once the needed steps are clear.
> > > >
> > > > Thanks,
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > > Sent: Saturday, March 9, 2013 3:56:36 PM
> > > > > Subject: Re: Cant get auto-coasters to run from midway to
> > > > > beagle
> > > > >
> > > > > > Can you post .globus/coasters/coaster.log from beagle?
> > > > >
> > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote:
> > > > > > Mihael, can you advise on this problem?
> > > > > >
> > > > > > David and I are trying to run automatic coaster jobs from
> > > > > > midway
> > > > > > login hosts and swift.rcc to beagle using ssh-cl:pbs.
> > > > > >
> > > > > > My failed attempts are on midway under
> > > > > > /home/wilde/osgdemo/modis/svn, see eg run020 (which has
> > > > > > complete
> > > > > > logs).
> > > > > >
> > > > > > Quick question about the proxy files that get copied. Does
> > > > > > this
> > > > > > look OK? :
> > > > > >
> > > > > > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking
> > > > > > certificate
> > > > > > /home/wilde/.globus/coasters/proxy.0.pem
> > > > > > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate
> > > > > > /home/wilde/.globus/coasters/proxy.0.pem with expiration
> > > > > > date
> > > > > > Sat
> > > > > > Mar 23\
> > > > > > 19:25:53 GMT 2013
> > > > > >
> > > > > > > The proxy expiration time listed above is two hours
> > > > > > > *earlier* than the current time (as seen in the message's
> > > > > > > UTC timestamp). Is that correct, or a possible cause of
> > > > > > > this problem?
> > > > > >
> > > > > > The main symptom seems to be this:
> > > > > >
> > > > > > Execution failed:
> > > > > > Exception in getlanduse:
> > > > > > Arguments: [../data/modis/2002/h00v09.rgb]
> > > > > > Host: beagle
> > > > > > Directory:
> > > > > > modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l
> > > > > >
> > > > > > Caused by:
> > > > > > Could not submit job
> > > > > > Caused by:
> > > > > > Could not start coaster service
> > > > > > Caused by:
> > > > > > Task ended before registration was received.
> > > > > > Failed to download bootstrap jar from
> > > > > > http://midway001.rcc.uchicago.edu:50001
> > > > > > ---
> > > > > >
> > > > > > > Yet I've verified that midway login4 (which is the target
> > > > > > > system) can connect to this hostname and port (with nc -l
> > > > > > > and telnet).
> > > > > >
> > > > > > - Mike
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > >