[Swift-devel] Can't get auto-coasters to run from midway to beagle

Mihael Hategan hategan at mcs.anl.gov
Sat Mar 9 17:11:40 CST 2013


Got it. I'll look a bit later. Right now I'm working on Lorenzo's stuff.

Mihael
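
For readers of the archive: the quoted log below repeatedly shows java.util.ConcurrentModificationException raised from a LinkedList iterator. The following is a minimal, single-threaded sketch of how that exception can arise in general and of one common way to avoid it (iterating over a snapshot). It is not taken from the CoG/Karajan sources and is not a diagnosis of this failure; the class name, method names, and the handler-id list are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; not part of Swift, CoG, or Karajan.
class ComodificationDemo {

    // Iterates over a list while removing from that same list.
    // Returns true if the fail-fast iterator throws ConcurrentModificationException.
    static boolean unsafeNotifyThrows() {
        // Illustrative ids only (assumption: any three distinct values behave the same).
        List<Integer> handlers = new LinkedList<>(Arrays.asList(526, 534, 430));
        try {
            for (Integer id : handlers) {
                // Structural modification made outside the iterator.
                handlers.remove(id);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Iterates over a copy, so removals from the original list are safe.
    // Returns the number of handlers left after the loop.
    static int safeNotifyRemovesAll() {
        List<Integer> handlers = new LinkedList<>(Arrays.asList(526, 534, 430));
        for (Integer id : new ArrayList<>(handlers)) {
            handlers.remove(id);
        }
        return handlers.size();
    }

    public static void main(String[] args) {
        System.out.println("unsafe loop threw CME: " + unsafeNotifyThrows());
        System.out.println("handlers left after safe loop: " + safeNotifyRemovesAll());
    }
}
```

The snapshot approach trades an extra copy for safety. When several threads touch the list, as the differing thread entry points in the traces may suggest, synchronization or a concurrent collection would also be needed; that is beyond this sketch.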

On Sat, 2013-03-09 at 17:09 -0600, Michael Wilde wrote:
> See instead run028.  Errors below.
> 
> Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> 
> RunID: 20130309-2252-x37dmuy0
> Progress:  time: Sat, 09 Mar 2013 22:52:24 +0000
> Progress:  time: Sat, 09 Mar 2013 22:52:34 +0000  Selecting site:269  Submitting:47  Submitted:1
> Progress:  time: Sat, 09 Mar 2013 22:52:42 +0000  Selecting site:269  Stage in:1  Submitted:47
> Progress:  time: Sat, 09 Mar 2013 22:52:54 +0000  Selecting site:269  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:53:24 +0000  Selecting site:269  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:53:54 +0000  Selecting site:269  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:54:24 +0000  Selecting site:269  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:54:51 +0000  Selecting site:269  Stage in:47  Active:1
> Progress:  time: Sat, 09 Mar 2013 22:54:52 +0000  Selecting site:269  Stage in:42  Active:6
> Progress:  time: Sat, 09 Mar 2013 22:54:54 +0000  Selecting site:269  Stage in:24  Active:24
> Progress:  time: Sat, 09 Mar 2013 22:54:57 +0000  Selecting site:269  Active:47  Stage out:1
> Progress:  time: Sat, 09 Mar 2013 22:54:58 +0000  Selecting site:266  Stage in:2  Submitted:1  Active:44  Stage out:1  Finished successfully:3
> Progress:  time: Sat, 09 Mar 2013 22:55:00 +0000  Selecting site:261  Stage in:5  Submitting:2  Submitted:1  Active:37  Stage out:3  Finished successfully:8
> Progress:  time: Sat, 09 Mar 2013 22:55:01 +0000  Selecting site:254  Stage in:12  Submitting:3  Active:24  Stage out:8  Finished successfully:16
> Progress:  time: Sat, 09 Mar 2013 22:55:02 +0000  Selecting site:241  Stage in:23  Submitting:5  Active:15  Stage out:4  Finished successfully:29
> Progress:  time: Sat, 09 Mar 2013 22:55:03 +0000  Selecting site:234  Stage in:28  Submitting:7  Stage out:12  Finished successfully:36
> Progress:  time: Sat, 09 Mar 2013 22:55:04 +0000  Selecting site:221  Stage in:35  Submitting:12  Submitted:1  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:55:24 +0000  Selecting site:221  Stage in:48  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:55:54 +0000  Selecting site:221  Stage in:48  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:56:08 +0000  Selecting site:221  Stage in:47  Active:1  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:56:14 +0000  Selecting site:221  Stage in:47  Stage out:1  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:56:16 +0000  Selecting site:221  Stage in:47  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:19 +0000  Selecting site:220  Stage in:47  Submitted:1  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:24 +0000  Selecting site:220  Stage in:48  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:29 +0000  Selecting site:220  Stage in:47  Active:1  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:35 +0000  Selecting site:220  Stage in:47  Stage out:1  Finished successfully:49
> Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60231
> Meta context: service-60121
> Progress:  time: Sat, 09 Mar 2013 22:56:37 +0000  Selecting site:220  Stage in:47  Finished successfully:50
> Progress:  time: Sat, 09 Mar 2013 22:56:40 +0000  Selecting site:219  Stage in:47  Submitted:1  Finished successfully:50
> Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60507
> Meta context: service-60121
> Progress:  time: Sat, 09 Mar 2013 22:56:44 +0000  Selecting site:219  Stage in:47  Active:1  Finished successfully:50
> Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60742
> Meta context: service-60121
> Progress:  time: Sat, 09 Mar 2013 22:56:48 +0000  Selecting site:219  Stage in:46  Active:2  Finished successfully:50
> Execution failed:
> 	Exception in getlanduse:
>     Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb]
>     Host: beagle
>     Directory: modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l
> 
> Caused by:
> 	Shutting down worker
> 	getLandUse, modis02.swift, line 20
> Attempted to unregister unregistered handler with id 526
> Attempted to unregister unregistered handler with id 534
> Attempted to unregister unregistered handler with id 430
> Attempted to unregister unregistered handler with id 476
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 337
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267)
> 	at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
> 	at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
> 	at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57)
> 	at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 348
> Attempted to unregister unregistered handler with id 466
> Attempted to unregister unregistered handler with id 347
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 456
> Attempted to unregister unregistered handler with id 454
> Attempted to unregister unregistered handler with id 508
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 511
> Attempted to unregister unregistered handler with id 506
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 380
> Attempted to unregister unregistered handler with id 502
> Attempted to unregister unregistered handler with id 376
> Attempted to unregister unregistered handler with id 226
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 484
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-66-1-1-1362869544093)
> Task being removed twice?
> java.lang.Throwable
> 	at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.removeTask(AbstractGridNode.java:291)
> 	at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:263)
> 	at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:136)
> 	at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
> 	at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:665)
> 	at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:428)
> 	at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:426)
> 	at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
> 	at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
> 	at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.errorReceived(JobSubmissionTaskHandler.java:219)
> 	at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:191)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:227)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-244-1-1-1362869544091)
> Ex098
> java.lang.NullPointerException
> 	at org.globus.cog.karajan.arguments.NamedArgumentsImpl.merge(NamedArgumentsImpl.java:52)
> 	at org.globus.cog.karajan.workflow.nodes.SequentialChoice.commitBuffers(SequentialChoice.java:46)
> 	at org.globus.cog.karajan.workflow.nodes.SequentialChoice.completed(SequentialChoice.java:40)
> 	at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> 	at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> 	at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
> 	at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
> 	at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> 	at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> 	at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
> 	at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
> 	at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
> 	at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
> 	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
> 	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
> 	at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98)
> 	at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> Could not fail element
> Attempted to close nonexistent channel buffers
> 
> 	at org.globus.cog.karajan.arguments.ArgUtil.closeBuffers(ArgUtil.java:279)
> 	at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.closeBuffers(AbstractParallelIterator.java:107)
> 	at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.failed(AbstractParallelIterator.java:143)
> 	at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:89)
> 	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:151)
> 	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
> 	at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98)
> 	at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-84-1-1-1362869544098)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-64-1-1-1362869544095)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-45-1-1-1362869544101)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-79-1-1-1362869544099)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-68-1-1-1362869544108)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-63-1-1-1362869544103)
> Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-256-1-1-1362869544077)
> error null
> error null
> 
> real	4m27.856s
> user	2m45.576s
> sys	0m3.697s
> + mv /home/wilde/.swift/runs/current/run028.1362869541 /home/wilde/.swift/runs/completed
> midway001$ 
> 
> 
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Saturday, March 9, 2013 5:05:25 PM
> > Subject: Re: [Swift-devel] Can't get auto-coasters to run from midway to beagle
> > 
> > Mihael, now I think I have a coaster problem. Curiously, it always
> > seems to happen at about 5 mins into the run.
> > 
> > Logs for these runs are on midway in, e.g.,
> > /home/wilde/osgdemo/modis/svn/run027
> > 
> > The leading portion of the error from stdout/stderr is below.
> > 
> > - Mike
> > 
> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> > 
> > RunID: 20130309-2252-x37dmuy0
> > Progress:  time: Sat, 09 Mar 2013 22:52:24 +0000
> > Progress:  time: Sat, 09 Mar 2013 22:52:34 +0000  Selecting site:269  Submitting:47  Submitted:1
> > Progress:  time: Sat, 09 Mar 2013 22:52:42 +0000  Selecting site:269  Stage in:1  Submitted:47
> > Progress:  time: Sat, 09 Mar 2013 22:52:54 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:53:24 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:53:54 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:54:24 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:54:51 +0000  Selecting site:269  Stage in:47  Active:1
> > Progress:  time: Sat, 09 Mar 2013 22:54:52 +0000  Selecting site:269  Stage in:42  Active:6
> > Progress:  time: Sat, 09 Mar 2013 22:54:54 +0000  Selecting site:269  Stage in:24  Active:24
> > Progress:  time: Sat, 09 Mar 2013 22:54:57 +0000  Selecting site:269  Active:47  Stage out:1
> > Progress:  time: Sat, 09 Mar 2013 22:54:58 +0000  Selecting site:266  Stage in:2  Submitted:1  Active:44  Stage out:1  Finished successfully:3
> > Progress:  time: Sat, 09 Mar 2013 22:55:00 +0000  Selecting site:261  Stage in:5  Submitting:2  Submitted:1  Active:37  Stage out:3  Finished successfully:8
> > Progress:  time: Sat, 09 Mar 2013 22:55:01 +0000  Selecting site:254  Stage in:12  Submitting:3  Active:24  Stage out:8  Finished successfully:16
> > Progress:  time: Sat, 09 Mar 2013 22:55:02 +0000  Selecting site:241  Stage in:23  Submitting:5  Active:15  Stage out:4  Finished successfully:29
> > Progress:  time: Sat, 09 Mar 2013 22:55:03 +0000  Selecting site:234  Stage in:28  Submitting:7  Stage out:12  Finished successfully:36
> > Progress:  time: Sat, 09 Mar 2013 22:55:04 +0000  Selecting site:221  Stage in:35  Submitting:12  Submitted:1  Finished successfully:48
> > Progress:  time: Sat, 09 Mar 2013 22:55:24 +0000  Selecting site:221  Stage in:48  Finished successfully:48
> > Progress:  time: Sat, 09 Mar 2013 22:55:54 +0000  Selecting site:221  Stage in:48  Finished successfully:48
> > Progress:  time: Sat, 09 Mar 2013 22:56:08 +0000  Selecting site:221  Stage in:47  Active:1  Finished successfully:48
> > Progress:  time: Sat, 09 Mar 2013 22:56:14 +0000  Selecting site:221  Stage in:47  Stage out:1  Finished successfully:48
> > Progress:  time: Sat, 09 Mar 2013 22:56:16 +0000  Selecting site:221  Stage in:47  Finished successfully:49
> > Progress:  time: Sat, 09 Mar 2013 22:56:19 +0000  Selecting site:220  Stage in:47  Submitted:1  Finished successfully:49
> > Progress:  time: Sat, 09 Mar 2013 22:56:24 +0000  Selecting site:220  Stage in:48  Finished successfully:49
> > Progress:  time: Sat, 09 Mar 2013 22:56:29 +0000  Selecting site:220  Stage in:47  Active:1  Finished successfully:49
> > Progress:  time: Sat, 09 Mar 2013 22:56:35 +0000  Selecting site:220  Stage in:47  Stage out:1  Finished successfully:49
> > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> > Context: service-60231
> > Meta context: service-60121
> > Progress:  time: Sat, 09 Mar 2013 22:56:37 +0000  Selecting site:220
> >  Stage in:47  Finished successfully:50
> > Progress:  time: Sat, 09 Mar 2013 22:56:40 +0000  Selecting site:219
> >  Stage in:47  Submitted:1  Finished successfully:50
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> > -> BufferingChannel,
> > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> > -> BufferingChannel,
> > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> > Context: service-60507
> > Meta context: service-60121
> > Progress:  time: Sat, 09 Mar 2013 22:56:44 +0000  Selecting site:219
> >  Stage in:47  Active:1  Finished successfully:50
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> > -> BufferingChannel,
> > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> > -> BufferingChannel,
> > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> > Context: service-60742
> > Meta context: service-60121
> > Progress:  time: Sat, 09 Mar 2013 22:56:48 +0000  Selecting site:219
> >  Stage in:46  Active:2  Finished successfully:50
> > Execution failed:
> > 	Exception in getlanduse:
> >     Arguments:
> >     [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb]
> >     Host: beagle
> >     Directory:
> >     modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l
> > 
> > Caused by:
> > 	Shutting down worker
> > 	getLandUse, modis02.swift, line 20
> > Attempted to unregister unregistered handler with id 526
> > Attempted to unregister unregistered handler with id 534
> > Attempted to unregister unregistered handler with id 430
> > Attempted to unregister unregistered handler with id 476
> > Failed to abort transfer
> > java.util.ConcurrentModificationException
> > 	at
> > 	java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> > 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> > 	at
> > 	org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> > 	at
> > 	org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> > Attempted to unregister unregistered handler with id 337
> > Failed to abort transfer
> > java.util.ConcurrentModificationException
> > 	at
> > 	java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> > 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> > 	at
> > 	org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> > 	at
> > 	org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> > 	at
> > 	org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267)
> > 	at
> > 	org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
> > 	at
> > 	org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
> > 	at
> > 	org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57)
> > 	at
> > 	org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257)
> > 	at
> > 	java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > 	at
> > 	java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > 	at
> > 	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > 	at
> > 	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > 	at java.lang.Thread.run(Thread.java:722)
> > Failed to abort transfer
> > java.util.ConcurrentModificationException
> > 	at
> > 	java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> > 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> > 	at
> > 	org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> > 	at
> > 	org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> > 	at
> > 	org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226
> > 
> > ----- Original Message -----
> > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > Sent: Saturday, March 9, 2013 4:24:17 PM
> > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> > > 
> > > I forgot to paste the error, sorry. It's below now, for real.  When I
> > > dial down the throttle to 48 and only start 2 beagle nodes, I get
> > > further and the app calls make it to active state.  The 317 files
> > > being staged in here are 17MB each.
> > > 
> > > The swift progress output and error are below:
> > > 
> > > RunID: 20130309-2204-qu9ck076
> > > Progress:  time: Sat, 09 Mar 2013 22:04:34 +0000
> > > Progress:  time: Sat, 09 Mar 2013 22:04:45 +0000  Submitting:316
> > >  Submitted:1
> > > Progress:  time: Sat, 09 Mar 2013 22:04:51 +0000  Stage in:1
> > >  Submitted:316
> > > Progress:  time: Sat, 09 Mar 2013 22:04:52 +0000  Stage in:25
> > >  Submitted:292
> > > Progress:  time: Sat, 09 Mar 2013 22:04:53 +0000  Stage in:68
> > >  Submitted:249
> > > Progress:  time: Sat, 09 Mar 2013 22:04:55 +0000  Stage in:113
> > >  Submitted:204
> > > Progress:  time: Sat, 09 Mar 2013 22:04:56 +0000  Stage in:165
> > >  Submitted:152
> > > Progress:  time: Sat, 09 Mar 2013 22:04:58 +0000  Stage in:177
> > >  Submitted:140
> > > Progress:  time: Sat, 09 Mar 2013 22:05:00 +0000  Stage in:225
> > >  Submitted:92
> > > Progress:  time: Sat, 09 Mar 2013 22:05:04 +0000  Stage in:241
> > >  Submitted:76
> > > Progress:  time: Sat, 09 Mar 2013 22:05:05 +0000  Stage in:289
> > >  Submitted:28
> > > Progress:  time: Sat, 09 Mar 2013 22:05:09 +0000  Stage in:305
> > >  Submitted:12
> > > Progress:  time: Sat, 09 Mar 2013 22:05:34 +0000  Stage in:317
> > > Progress:  time: Sat, 09 Mar 2013 22:06:04 +0000  Stage in:317
> > > Progress:  time: Sat, 09 Mar 2013 22:06:34 +0000  Stage in:317
> > > Progress:  time: Sat, 09 Mar 2013 22:07:04 +0000  Stage in:317
> > > Progress:  time: Sat, 09 Mar 2013 22:07:34 +0000  Stage in:317
> > > Progress:  time: Sat, 09 Mar 2013 22:08:04 +0000  Stage in:317
> > > Progress:  time: Sat, 09 Mar 2013 22:08:34 +0000  Stage in:317
> > > Channels:
> > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > > ->
> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > > -> BufferingChannel,
> > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > > ->
> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > > -> BufferingChannel}
> > > Context: service-60822
> > > Meta context: service-60640
> > > Channels:
> > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > > ->
> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > > -> BufferingChannel,
> > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > > ->
> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > > -> BufferingChannel}
> > > Context: service-60116
> > > Meta context: service-60640
> > > Channels:
> > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > > ->
> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > > -> BufferingChannel,
> > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > > ->
> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > > -> BufferingChannel}
> > > Context: service-60598
> > > Meta context: service-60640
> > > Progress:  time: Sat, 09 Mar 2013 22:09:04 +0000  Stage in:317
> > > Progress:  time: Sat, 09 Mar 2013 22:09:08 +0000  Stage in:316
> > >  Active:1
> > > Execution failed:
> > > 	Exception in getlanduse:
> > >     Arguments:
> > >     [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb]
> > >     Host: beagle
> > >     Directory:
> > >     modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l
> > > 
> > > Caused by:
> > > 	Shutting down worker
> > > 	getLandUse, modis02.swift, line 20
> > > error null
> > > 
> > > real	4m36.777s
> > > user	2m55.240s
> > > sys	0m3.837s
> > > 
> > > 
> > > ---
> > > 
> > > With a throttle of 48 (.47) and 2 beagle nodes, I see:
> > > 
> > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> > > 
> > > RunID: 20130309-2214-1oi3rvea
> > > Progress:  time: Sat, 09 Mar 2013 22:14:06 +0000
> > > Progress:  time: Sat, 09 Mar 2013 22:14:17 +0000  Selecting site:269  Submitting:47  Submitted:1
> > > Progress:  time: Sat, 09 Mar 2013 22:14:22 +0000  Selecting site:269  Stage in:1  Submitted:47
> > > Progress:  time: Sat, 09 Mar 2013 22:14:28 +0000  Selecting site:269  Stage in:25  Submitted:23
> > > Progress:  time: Sat, 09 Mar 2013 22:14:36 +0000  Selecting site:269  Stage in:48
> > > Progress:  time: Sat, 09 Mar 2013 22:15:06 +0000  Selecting site:269  Stage in:48
> > > Progress:  time: Sat, 09 Mar 2013 22:15:36 +0000  Selecting site:269  Stage in:48
> > > Progress:  time: Sat, 09 Mar 2013 22:16:06 +0000  Selecting site:269  Stage in:48
> > > Progress:  time: Sat, 09 Mar 2013 22:16:26 +0000  Selecting site:269  Stage in:47  Active:1
> > > Progress:  time: Sat, 09 Mar 2013 22:16:27 +0000  Selecting site:269  Stage in:36  Active:12
> > > Progress:  time: Sat, 09 Mar 2013 22:16:29 +0000  Selecting site:269  Stage in:24  Active:24
> > > Progress:  time: Sat, 09 Mar 2013 22:16:34 +0000  Selecting site:269  Stage in:24  Active:23  Stage out:1
> > > Progress:  time: Sat, 09 Mar 2013 22:16:35 +0000  Selecting site:269  Stage in:14  Active:33  Stage out:1
> > > Execution failed:
> > > 	Exception in getlanduse:
> > >     Arguments:
> > >     [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > >     Host: beagle
> > >     Directory:
> > >     modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> > > 
> > > Caused by:
> > > 	Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed
> > > 	with an exit code of 1
> > > 	getLandUse, modis02.swift, line 20
> > > 
> > > real	2m31.463s
> > > user	1m33.238s
> > > sys	0m2.160s
> > > + mv /home/wilde/.swift/runs/current/run024.1362867244
> > > /home/wilde/.swift/runs/completed
> > > 
> > > This error is likely in the demo app code; just pasting here to show
> > > that with less concurrency it makes progress.
> > > 
> > > ----- Original Message -----
> > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > Sent: Saturday, March 9, 2013 4:11:24 PM
> > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> > > > 
> > > > Now I'm getting the error below (from running 317 simple MODIS apps
> > > > concurrently).  I'm going to dial down the throttle first to see if
> > > > the staging load is overwhelming either coasters or the
> > > > midway-beagle path.
> > > > 
> > > > - Mike
> > > > 
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > > Sent: Saturday, March 9, 2013 3:59:22 PM
> > > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> > > > > 
> > > > > I think we just got this working. Problems may have included the need
> > > > > to pre-create the work directory and to specify a dotted IP address
> > > > > on the external network for GLOBUS_HOSTNAME.  Will need to
> > > > > experiment.  So it's likely that proxy expiration time was not a
> > > > > problem (although it's confusing).
> > > > > 
> > > > > Will report back on this once the needed steps are clear.
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > - Mike
> > > > > 
> > > > > ----- Original Message -----
> > > > > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > > > Sent: Saturday, March 9, 2013 3:56:36 PM
> > > > > > Subject: Re: Cant get auto-coasters to run from midway to
> > > > > > beagle
> > > > > > 
> > > > > > Can you post .globus/coasters/coaster.log from beagle?
> > > > > > 
> > > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote:
> > > > > > > Mihael, can you advise on this problem?
> > > > > > > 
> > > > > > > David and I are trying to run automatic coaster jobs from midway
> > > > > > > login hosts and swift.rcc to beagle using ssh-cl:pbs.
> > > > > > > 
> > > > > > > My failed attempts are on midway under
> > > > > > > /home/wilde/osgdemo/modis/svn; see e.g. run020 (which has complete
> > > > > > > logs).
> > > > > > > 
> > > > > > > Quick question about the proxy files that get copied. Does this
> > > > > > > look OK?
> > > > > > > 
> > > > > > > 2013-03-09 21:24:46,895+0000 INFO  AutoCA Checking certificate /home/wilde/.globus/coasters/proxy.0.pem
> > > > > > > 2013-03-09 21:24:46,967+0000 INFO  AutoCA Using certificate /home/wilde/.globus/coasters/proxy.0.pem with expiration date Sat Mar 23 19:25:53 GMT 2013
> > > > > > > 
> > > > > > > The proxy expiration time listed above is two hours *earlier* than
> > > > > > > the current time (as seen in the message's UTC timestamp).  Is
> > > > > > > that correct, or a possible cause of this problem?
> > > > > > > 
> > > > > > > The main symptom seems to be this:
> > > > > > > 
> > > > > > > Execution failed:
> > > > > > > 	Exception in getlanduse:
> > > > > > >     Arguments: [../data/modis/2002/h00v09.rgb]
> > > > > > >     Host: beagle
> > > > > > >     Directory:
> > > > > > >     modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l
> > > > > > > 
> > > > > > > Caused by:
> > > > > > > 	Could not submit job
> > > > > > > Caused by:
> > > > > > > 	Could not start coaster service
> > > > > > > Caused by:
> > > > > > > 	Task ended before registration was received.
> > > > > > > Failed to download bootstrap jar from
> > > > > > > http://midway001.rcc.uchicago.edu:50001
> > > > > > > ---
> > > > > > > 
> > > > > > > Yet I've verified that midway login4 (which is the target system)
> > > > > > > can connect to this hostname and port (with nc -l and telnet).
> > > > > > > 
> > > > > > > - Mike
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > _______________________________________________
> > > > > Swift-devel mailing list
> > > > > Swift-devel at ci.uchicago.edu
> > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > > > 
> > 
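
For reference, the throttle figures quoted above ("a throttle of 48 (.47)") line up with the usual Swift convention that a site's jobThrottle value allows roughly jobThrottle * 100 + 1 concurrent jobs. The sketch below only does that arithmetic. The function names are made up for illustration, and the formula is an assumption based on that convention, not something stated in this thread.

```python
# Illustrative helpers only; the names are hypothetical, not part of Swift.
# Assumption: concurrent jobs = jobThrottle * 100 + 1.

def jobs_for_throttle(job_throttle):
    """Approximate number of concurrent jobs allowed by a jobThrottle value."""
    # round() guards against floating-point error in job_throttle * 100
    return int(round(job_throttle * 100)) + 1


def throttle_for_jobs(max_jobs):
    """jobThrottle value that would allow max_jobs concurrent jobs."""
    return (max_jobs - 1) / 100


if __name__ == "__main__":
    # The .47 setting mentioned in the thread
    print(jobs_for_throttle(0.47))
    # The value that would admit all 317 MODIS tasks at once
    print(throttle_for_jobs(317))
```

Under that assumption, the earlier 317-way run corresponds to a throttle of at least 3.16, versus .47 for the 48-job run that got as far as the Active state.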




