[Swift-devel] a note on stress testing

Michael Wilde wilde at mcs.anl.gov
Sat Mar 9 17:12:58 CST 2013


Yadu, below is exactly the kind of error I'm hoping we can catch in the test suite.

The one below occurs on remote submissions from midway to beagle that use coaster provider staging of 17 MB input files.

So detecting it may require combining site-config testing and stress testing in the same run.
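Here is a minimal sketch of the kind of harness this suggests. It assumes plain Java with no test framework. It starts many threads at the same moment, has each one run the same operation repeatedly, and collects every exception, so an intermittent race gets reported instead of lost. `StressHarness` and `runConcurrently` are hypothetical names and are not part of Swift or CoG. In a real test, the task would drive concurrent coaster file staging against the site configuration under test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stress harness (not part of Swift or CoG). Starts all workers at
// the same instant, runs the task repeatedly on each, and collects every
// Throwable so that intermittent concurrency failures are reported.
final class StressHarness {

    private StressHarness() {}

    static List<Throwable> runConcurrently(int threads, int iterations, Runnable task)
            throws InterruptedException {
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch start = new CountDownLatch(1);      // gate that releases all workers together
        CountDownLatch done = new CountDownLatch(threads); // counts finished workers
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                try {
                    start.await();
                    for (int i = 0; i < iterations; i++) {
                        task.run();
                    }
                } catch (Throwable e) {
                    failures.add(e); // a worker stops at its first failure
                } finally {
                    done.countDown();
                }
            }).start();
        }
        start.countDown(); // let every worker run at once to maximize overlap
        done.await();      // wait for all workers to finish or fail
        return new ArrayList<>(failures);
    }
}
```

As a usage example, `runConcurrently(48, 1, task)` would roughly mirror the 48 concurrent stage-ins visible in the progress log below. A non-empty result list means the task failed under contention.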

- Mike


----- Forwarded Message -----
From: "Michael Wilde" <wilde at mcs.anl.gov>
To: "Mihael Hategan" <hategan at mcs.anl.gov>
Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
Sent: Saturday, March 9, 2013 5:09:16 PM
Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle

See run028 instead. Errors below.

Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)

RunID: 20130309-2252-x37dmuy0
Progress:  time: Sat, 09 Mar 2013 22:52:24 +0000
Progress:  time: Sat, 09 Mar 2013 22:52:34 +0000  Selecting site:269  Submitting:47  Submitted:1
Progress:  time: Sat, 09 Mar 2013 22:52:42 +0000  Selecting site:269  Stage in:1  Submitted:47
Progress:  time: Sat, 09 Mar 2013 22:52:54 +0000  Selecting site:269  Stage in:48
Progress:  time: Sat, 09 Mar 2013 22:53:24 +0000  Selecting site:269  Stage in:48
Progress:  time: Sat, 09 Mar 2013 22:53:54 +0000  Selecting site:269  Stage in:48
Progress:  time: Sat, 09 Mar 2013 22:54:24 +0000  Selecting site:269  Stage in:48
Progress:  time: Sat, 09 Mar 2013 22:54:51 +0000  Selecting site:269  Stage in:47  Active:1
Progress:  time: Sat, 09 Mar 2013 22:54:52 +0000  Selecting site:269  Stage in:42  Active:6
Progress:  time: Sat, 09 Mar 2013 22:54:54 +0000  Selecting site:269  Stage in:24  Active:24
Progress:  time: Sat, 09 Mar 2013 22:54:57 +0000  Selecting site:269  Active:47  Stage out:1
Progress:  time: Sat, 09 Mar 2013 22:54:58 +0000  Selecting site:266  Stage in:2  Submitted:1  Active:44  Stage out:1  Finished successfully:3
Progress:  time: Sat, 09 Mar 2013 22:55:00 +0000  Selecting site:261  Stage in:5  Submitting:2  Submitted:1  Active:37  Stage out:3  Finished successfully:8
Progress:  time: Sat, 09 Mar 2013 22:55:01 +0000  Selecting site:254  Stage in:12  Submitting:3  Active:24  Stage out:8  Finished successfully:16
Progress:  time: Sat, 09 Mar 2013 22:55:02 +0000  Selecting site:241  Stage in:23  Submitting:5  Active:15  Stage out:4  Finished successfully:29
Progress:  time: Sat, 09 Mar 2013 22:55:03 +0000  Selecting site:234  Stage in:28  Submitting:7  Stage out:12  Finished successfully:36
Progress:  time: Sat, 09 Mar 2013 22:55:04 +0000  Selecting site:221  Stage in:35  Submitting:12  Submitted:1  Finished successfully:48
Progress:  time: Sat, 09 Mar 2013 22:55:24 +0000  Selecting site:221  Stage in:48  Finished successfully:48
Progress:  time: Sat, 09 Mar 2013 22:55:54 +0000  Selecting site:221  Stage in:48  Finished successfully:48
Progress:  time: Sat, 09 Mar 2013 22:56:08 +0000  Selecting site:221  Stage in:47  Active:1  Finished successfully:48
Progress:  time: Sat, 09 Mar 2013 22:56:14 +0000  Selecting site:221  Stage in:47  Stage out:1  Finished successfully:48
Progress:  time: Sat, 09 Mar 2013 22:56:16 +0000  Selecting site:221  Stage in:47  Finished successfully:49
Progress:  time: Sat, 09 Mar 2013 22:56:19 +0000  Selecting site:220  Stage in:47  Submitted:1  Finished successfully:49
Progress:  time: Sat, 09 Mar 2013 22:56:24 +0000  Selecting site:220  Stage in:48  Finished successfully:49
Progress:  time: Sat, 09 Mar 2013 22:56:29 +0000  Selecting site:220  Stage in:47  Active:1  Finished successfully:49
Progress:  time: Sat, 09 Mar 2013 22:56:35 +0000  Selecting site:220  Stage in:47  Stage out:1  Finished successfully:49
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
Context: service-60231
Meta context: service-60121
Progress:  time: Sat, 09 Mar 2013 22:56:37 +0000  Selecting site:220  Stage in:47  Finished successfully:50
Progress:  time: Sat, 09 Mar 2013 22:56:40 +0000  Selecting site:219  Stage in:47  Submitted:1  Finished successfully:50
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
Context: service-60507
Meta context: service-60121
Progress:  time: Sat, 09 Mar 2013 22:56:44 +0000  Selecting site:219  Stage in:47  Active:1  Finished successfully:50
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
Context: service-60742
Meta context: service-60121
Progress:  time: Sat, 09 Mar 2013 22:56:48 +0000  Selecting site:219  Stage in:46  Active:2  Finished successfully:50
Execution failed:
	Exception in getlanduse:
    Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb]
    Host: beagle
    Directory: modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l

Caused by:
	Shutting down worker
	getLandUse, modis02.swift, line 20
Attempted to unregister unregistered handler with id 526
Attempted to unregister unregistered handler with id 534
Attempted to unregister unregistered handler with id 430
Attempted to unregister unregistered handler with id 476
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 337
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267)
	at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
	at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
	at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57)
	at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 348
Attempted to unregister unregistered handler with id 466
Attempted to unregister unregistered handler with id 347
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 456
Attempted to unregister unregistered handler with id 454
Attempted to unregister unregistered handler with id 508
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 511
Attempted to unregister unregistered handler with id 506
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 380
Attempted to unregister unregistered handler with id 502
Attempted to unregister unregistered handler with id 376
Attempted to unregister unregistered handler with id 226
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Failed to abort transfer
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Attempted to unregister unregistered handler with id 484
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-66-1-1-1362869544093)
Task being removed twice?
java.lang.Throwable
	at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.removeTask(AbstractGridNode.java:291)
	at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:263)
	at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:136)
	at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
	at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:665)
	at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:428)
	at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:426)
	at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
	at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
	at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.errorReceived(JobSubmissionTaskHandler.java:219)
	at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:191)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:227)
	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-244-1-1-1362869544091)
Ex098
java.lang.NullPointerException
	at org.globus.cog.karajan.arguments.NamedArgumentsImpl.merge(NamedArgumentsImpl.java:52)
	at org.globus.cog.karajan.workflow.nodes.SequentialChoice.commitBuffers(SequentialChoice.java:46)
	at org.globus.cog.karajan.workflow.nodes.SequentialChoice.completed(SequentialChoice.java:40)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
	at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
	at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
	at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
	at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
	at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
	at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
	at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
	at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
	at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98)
	at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Could not fail element
Attempted to close nonexistent channel buffers

	at org.globus.cog.karajan.arguments.ArgUtil.closeBuffers(ArgUtil.java:279)
	at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.closeBuffers(AbstractParallelIterator.java:107)
	at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.failed(AbstractParallelIterator.java:143)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:89)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:151)
	at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
	at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98)
	at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-84-1-1-1362869544098)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-64-1-1-1362869544095)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-45-1-1-1362869544101)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-79-1-1-1362869544099)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-68-1-1-1362869544108)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-63-1-1-1362869544103)
Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-256-1-1-1362869544077)
error null
error null

real	4m27.856s
user	2m45.576s
sys	0m3.697s
+ mv /home/wilde/.swift/runs/current/run028.1362869541 /home/wilde/.swift/runs/completed
midway001$ 
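Every repeated `ConcurrentModificationException` trace fails in a `LinkedList` iterator inside `RequestHandler.send`. These traces are reached from two different threads: `Sender.run`, and a `ThreadPoolExecutor` worker running `JobSubmissionTaskHandler`. That pattern is consistent with a list being modified while `send` iterates over it, though it does not prove it. The sketch below is a hypothetical illustration, not CoG code; `OutgoingBuffer`, `sendUnsafe`, and `sendSnapshot` are invented names. It shows the fail-fast failure mode and one common mitigation: copy the list under a lock, then iterate the private snapshot.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration, not CoG code: a handler that keeps pending
// messages in a LinkedList and walks them in send(), loosely mirroring the
// LinkedList$ListItr frames under RequestHandler.send in the traces above.
final class OutgoingBuffer {
    private final List<byte[]> pending = new LinkedList<>();

    synchronized void add(byte[] data) {
        pending.add(data);
    }

    // Unsafe: walks the live list with a fail-fast iterator, so any add() made
    // during the loop (by a callback or another thread) can throw
    // ConcurrentModificationException on the next call to next().
    int sendUnsafe(Runnable onEach) {
        int bytes = 0;
        for (byte[] data : pending) {
            onEach.run(); // stands in for work that may enqueue more data
            bytes += data.length;
        }
        return bytes;
    }

    // Safer: copy the list while holding the lock, then walk the private copy.
    // Items added during the loop land in 'pending' and are not seen this pass.
    int sendSnapshot(Runnable onEach) {
        List<byte[]> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(pending);
        }
        int bytes = 0;
        for (byte[] data : snapshot) {
            onEach.run();
            bytes += data.length;
        }
        return bytes;
    }
}
```

The snapshot trades freshness for safety: items added during a pass are picked up on the next call, not the current one. `java.util.concurrent.CopyOnWriteArrayList` gives similar snapshot-iteration semantics without explicit locking, at the cost of copying the array on every write.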


----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Mihael Hategan" <hategan at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Saturday, March 9, 2013 5:05:25 PM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> 
> Mihael, now I think I have a coaster problem. Curiously, it always
> seems to happen about 5 minutes into the run.
> 
> Logs for these runs are on midway in, e.g.,
> /home/wilde/osgdemo/modis/svn/run027
> 
> The leading portion of the error output (stdout/stderr) is below.
> 
> - Mike
> 
> Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> 
> RunID: 20130309-2252-x37dmuy0
> Progress:  time: Sat, 09 Mar 2013 22:52:24 +0000
> Progress:  time: Sat, 09 Mar 2013 22:52:34 +0000  Selecting site:269
>  Submitting:47  Submitted:1
> Progress:  time: Sat, 09 Mar 2013 22:52:42 +0000  Selecting site:269
>  Stage in:1  Submitted:47
> Progress:  time: Sat, 09 Mar 2013 22:52:54 +0000  Selecting site:269
>  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:53:24 +0000  Selecting site:269
>  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:53:54 +0000  Selecting site:269
>  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:54:24 +0000  Selecting site:269
>  Stage in:48
> Progress:  time: Sat, 09 Mar 2013 22:54:51 +0000  Selecting site:269
>  Stage in:47  Active:1
> Progress:  time: Sat, 09 Mar 2013 22:54:52 +0000  Selecting site:269
>  Stage in:42  Active:6
> Progress:  time: Sat, 09 Mar 2013 22:54:54 +0000  Selecting site:269
>  Stage in:24  Active:24
> Progress:  time: Sat, 09 Mar 2013 22:54:57 +0000  Selecting site:269
>  Active:47  Stage out:1
> Progress:  time: Sat, 09 Mar 2013 22:54:58 +0000  Selecting site:266
>  Stage in:2  Submitted:1  Active:44  Stage out:1  Finished
> successfully:3
> Progress:  time: Sat, 09 Mar 2013 22:55:00 +0000  Selecting site:261
>  Stage in:5  Submitting:2  Submitted:1  Active:37  Stage out:3
>  Finished successfully:8
> Progress:  time: Sat, 09 Mar 2013 22:55:01 +0000  Selecting site:254
>  Stage in:12  Submitting:3  Active:24  Stage out:8  Finished
> successfully:16
> Progress:  time: Sat, 09 Mar 2013 22:55:02 +0000  Selecting site:241
>  Stage in:23  Submitting:5  Active:15  Stage out:4  Finished
> successfully:29
> Progress:  time: Sat, 09 Mar 2013 22:55:03 +0000  Selecting site:234
>  Stage in:28  Submitting:7  Stage out:12  Finished successfully:36
> Progress:  time: Sat, 09 Mar 2013 22:55:04 +0000  Selecting site:221
>  Stage in:35  Submitting:12  Submitted:1  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:55:24 +0000  Selecting site:221
>  Stage in:48  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:55:54 +0000  Selecting site:221
>  Stage in:48  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:56:08 +0000  Selecting site:221
>  Stage in:47  Active:1  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:56:14 +0000  Selecting site:221
>  Stage in:47  Stage out:1  Finished successfully:48
> Progress:  time: Sat, 09 Mar 2013 22:56:16 +0000  Selecting site:221
>  Stage in:47  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:19 +0000  Selecting site:220
>  Stage in:47  Submitted:1  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:24 +0000  Selecting site:220
>  Stage in:48  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:29 +0000  Selecting site:220
>  Stage in:47  Active:1  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 22:56:35 +0000  Selecting site:220
>  Stage in:47  Stage out:1  Finished successfully:49
> Channels:
> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60231
> Meta context: service-60121
> Progress:  time: Sat, 09 Mar 2013 22:56:37 +0000  Selecting site:220
>  Stage in:47  Finished successfully:50
> Progress:  time: Sat, 09 Mar 2013 22:56:40 +0000  Selecting site:219
>  Stage in:47  Submitted:1  Finished successfully:50
> Channels:
> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60507
> Meta context: service-60121
> Progress:  time: Sat, 09 Mar 2013 22:56:44 +0000  Selecting site:219  Stage in:47  Active:1  Finished successfully:50
> Channels:
> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121]
> -> BufferingChannel,
> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000]
> ->
> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]}
> Context: service-60742
> Meta context: service-60121
> Progress:  time: Sat, 09 Mar 2013 22:56:48 +0000  Selecting site:219  Stage in:46  Active:2  Finished successfully:50
> Execution failed:
> 	Exception in getlanduse:
>     Arguments:
>     [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb]
>     Host: beagle
>     Directory:
>     modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l
> 
> Caused by:
> 	Shutting down worker
> 	getLandUse, modis02.swift, line 20
> Attempted to unregister unregistered handler with id 526
> Attempted to unregister unregistered handler with id 534
> Attempted to unregister unregistered handler with id 430
> Attempted to unregister unregistered handler with id 476
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292)
> 	at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560)
> 	at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74)
> Attempted to unregister unregistered handler with id 337
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226)
> 	at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267)
> 	at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240)
> 	at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228)
> 	at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57)
> 	at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> Failed to abort transfer
> java.util.ConcurrentModificationException
> 	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
> 	at java.util.LinkedList$ListItr.next(LinkedList.java:886)
> 	at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139)
> 	at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195)
> 	at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> 	at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226
> 
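All three traces fail at the same spot: RequestHandler.send (RequestHandler.java:72) is walking a java.util.LinkedList with its iterator, the list is structurally modified before the iterator's next call to next(), and the fail-fast iterator throws ConcurrentModificationException. The traces come from two different threads (the channel Sender thread and a ThreadPoolExecutor worker), so unsynchronized access from more than one thread is one plausible cause, but that is a guess from the stacks alone. A minimal sketch of the failure mode, reproduced deterministically in a single thread; CmeSketch, Handler and the helper methods are hypothetical names, not CoG classes:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;

// Hypothetical names throughout; this is not the CoG RequestHandler code.
public class CmeSketch {
    public interface Handler {
        void onError(List<Handler> registry);
    }

    // A handler that unregisters itself when notified, i.e. it mutates
    // the very list that is being iterated.
    public static Handler selfRemoving() {
        return new Handler() {
            public void onError(List<Handler> registry) {
                synchronized (registry) {
                    registry.remove(this);
                }
            }
        };
    }

    // Unsafe: walks the live LinkedList. The first removal changes the
    // list's modification count, and the iterator's next call to next()
    // throws ConcurrentModificationException.
    public static boolean notifyLive(List<Handler> registry) {
        try {
            for (Handler h : registry) {
                h.onError(registry);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safer: copy the list under its lock, then notify from the copy,
    // so handlers may unregister themselves while being notified.
    public static int notifySnapshot(List<Handler> registry) {
        List<Handler> snapshot;
        synchronized (registry) {
            snapshot = new ArrayList<>(registry);
        }
        for (Handler h : snapshot) {
            h.onError(registry);
        }
        synchronized (registry) {
            return registry.size();
        }
    }

    // Builds a LinkedList registry holding n self-removing handlers.
    public static List<Handler> registryOf(int n) {
        List<Handler> registry = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            registry.add(selfRemoving());
        }
        return registry;
    }

    public static void main(String[] args) {
        System.out.println("live iteration threw CME: " + notifyLive(registryOf(3)));
        System.out.println("handlers left after snapshot: " + notifySnapshot(registryOf(3)));
    }
}
```

The sketch needs at least three handlers: with only two, LinkedList's iterator reports hasNext() == false right after the first removal, so the loop ends quietly and no exception is thrown. Iterating a snapshot (or using a concurrent collection such as java.util.concurrent.CopyOnWriteArrayList) avoids the exception in this toy case; whether either is the right fix for RequestHandler would need checking against the actual code.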
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Saturday, March 9, 2013 4:24:17 PM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > 	midway	to	beagle
> > 
> > I forgot to paste the error, sorry. It's below now, fer real. When I
> > dial down the throttle to 48 and only start 2 beagle nodes, I get
> > further and the app calls make it to active state. The 317 files
> > being staged in here are 17MB each.
> > 
> > The swift progress output and error are below:
> > 
> > RunID: 20130309-2204-qu9ck076
> > Progress:  time: Sat, 09 Mar 2013 22:04:34 +0000
> > Progress:  time: Sat, 09 Mar 2013 22:04:45 +0000  Submitting:316  Submitted:1
> > Progress:  time: Sat, 09 Mar 2013 22:04:51 +0000  Stage in:1  Submitted:316
> > Progress:  time: Sat, 09 Mar 2013 22:04:52 +0000  Stage in:25  Submitted:292
> > Progress:  time: Sat, 09 Mar 2013 22:04:53 +0000  Stage in:68  Submitted:249
> > Progress:  time: Sat, 09 Mar 2013 22:04:55 +0000  Stage in:113  Submitted:204
> > Progress:  time: Sat, 09 Mar 2013 22:04:56 +0000  Stage in:165  Submitted:152
> > Progress:  time: Sat, 09 Mar 2013 22:04:58 +0000  Stage in:177  Submitted:140
> > Progress:  time: Sat, 09 Mar 2013 22:05:00 +0000  Stage in:225  Submitted:92
> > Progress:  time: Sat, 09 Mar 2013 22:05:04 +0000  Stage in:241  Submitted:76
> > Progress:  time: Sat, 09 Mar 2013 22:05:05 +0000  Stage in:289  Submitted:28
> > Progress:  time: Sat, 09 Mar 2013 22:05:09 +0000  Stage in:305  Submitted:12
> > Progress:  time: Sat, 09 Mar 2013 22:05:34 +0000  Stage in:317
> > Progress:  time: Sat, 09 Mar 2013 22:06:04 +0000  Stage in:317
> > Progress:  time: Sat, 09 Mar 2013 22:06:34 +0000  Stage in:317
> > Progress:  time: Sat, 09 Mar 2013 22:07:04 +0000  Stage in:317
> > Progress:  time: Sat, 09 Mar 2013 22:07:34 +0000  Stage in:317
> > Progress:  time: Sat, 09 Mar 2013 22:08:04 +0000  Stage in:317
> > Progress:  time: Sat, 09 Mar 2013 22:08:34 +0000  Stage in:317
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > -> BufferingChannel,
> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > -> BufferingChannel}
> > Context: service-60822
> > Meta context: service-60640
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > -> BufferingChannel,
> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > -> BufferingChannel}
> > Context: service-60116
> > Meta context: service-60640
> > Channels:
> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640]
> > -> BufferingChannel,
> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000]
> > ->
> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000],
> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640]
> > -> BufferingChannel}
> > Context: service-60598
> > Meta context: service-60640
> > Progress:  time: Sat, 09 Mar 2013 22:09:04 +0000  Stage in:317
> > Progress:  time: Sat, 09 Mar 2013 22:09:08 +0000  Stage in:316  Active:1
> > Execution failed:
> > 	Exception in getlanduse:
> >     Arguments:
> >     [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb]
> >     Host: beagle
> >     Directory:
> >     modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l
> > 
> > Caused by:
> > 	Shutting down worker
> > 	getLandUse, modis02.swift, line 20
> > error null
> > 
> > real	4m36.777s
> > user	2m55.240s
> > sys	0m3.837s
> > 
> > 
> > ---
> > 
> > With a throttle of 48 (.47) and 2 beagle nodes, I see:
> > 
> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> > 
> > RunID: 20130309-2214-1oi3rvea
> > Progress:  time: Sat, 09 Mar 2013 22:14:06 +0000
> > Progress:  time: Sat, 09 Mar 2013 22:14:17 +0000  Selecting site:269  Submitting:47  Submitted:1
> > Progress:  time: Sat, 09 Mar 2013 22:14:22 +0000  Selecting site:269  Stage in:1  Submitted:47
> > Progress:  time: Sat, 09 Mar 2013 22:14:28 +0000  Selecting site:269  Stage in:25  Submitted:23
> > Progress:  time: Sat, 09 Mar 2013 22:14:36 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:15:06 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:15:36 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:16:06 +0000  Selecting site:269  Stage in:48
> > Progress:  time: Sat, 09 Mar 2013 22:16:26 +0000  Selecting site:269  Stage in:47  Active:1
> > Progress:  time: Sat, 09 Mar 2013 22:16:27 +0000  Selecting site:269  Stage in:36  Active:12
> > Progress:  time: Sat, 09 Mar 2013 22:16:29 +0000  Selecting site:269  Stage in:24  Active:24
> > Progress:  time: Sat, 09 Mar 2013 22:16:34 +0000  Selecting site:269  Stage in:24  Active:23  Stage out:1
> > Progress:  time: Sat, 09 Mar 2013 22:16:35 +0000  Selecting site:269  Stage in:14  Active:33  Stage out:1
> > Execution failed:
> > 	Exception in getlanduse:
> >     Arguments:
> >     [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> >     Host: beagle
> >     Directory:
> >     modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> > 
> > Caused by:
> > 	Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
> > 	getLandUse, modis02.swift, line 20
> > 
> > real	2m31.463s
> > user	1m33.238s
> > sys	0m2.160s
> > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed
> > 
> > This error is likely in the demo app code; just pasting here to
> > show that with less concurrency it makes progress.
> > 
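On the throttle numbers: pairing 48 concurrent jobs with a jobThrottle of .47 fits the rule of thumb jobs = jobThrottle * 100 + 1, and the progress lines in that run never show more than 48 jobs past "Selecting site" at once (e.g. Stage in:48, or Stage in:24 Active:23 Stage out:1). The formula here is inferred from that pairing, not checked against the Swift source, and ThrottleMath is a hypothetical helper:

```java
public class ThrottleMath {
    // Assumed rule, inferred from the ".47 -> 48" pairing in the thread:
    // concurrent jobs per site = jobThrottle * 100 + 1.
    public static long concurrentJobs(double jobThrottle) {
        // Math.round guards against binary floating-point error in jobThrottle * 100.
        return Math.round(jobThrottle * 100) + 1;
    }

    // Inverse of the same assumed rule: the jobThrottle for a target job count.
    public static double throttleFor(int jobs) {
        return (jobs - 1) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(concurrentJobs(0.47)); // 48
        System.out.println(throttleFor(48));      // 0.47
    }
}
```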
> > ----- Original Message -----
> > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > Sent: Saturday, March 9, 2013 4:11:24 PM
> > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > > midway	to	beagle
> > > 
> > > Now I'm getting the error below (from running 317 simple MODIS
> > > apps concurrently). I'm going to dial down the throttle first to
> > > see if the staging load is overwhelming either coasters or the
> > > midway-beagle path.
> > > 
> > > - Mike
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > Sent: Saturday, March 9, 2013 3:59:22 PM
> > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > > > midway to	beagle
> > > > 
> > > > I think we just got this working. Problems may have included the
> > > > need to pre-create the work directory and to specify a dotted IP
> > > > address on the external network for GLOBUS_HOSTNAME. Will need to
> > > > experiment. So the proxy expiration time was likely not a problem
> > > > (although it's confusing).
> > > > 
> > > > Will report back on this once the needed steps are clear.
> > > > 
> > > > Thanks,
> > > > 
> > > > - Mike
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > > > Sent: Saturday, March 9, 2013 3:56:36 PM
> > > > > Subject: Re: Cant get auto-coasters to run from midway to
> > > > > beagle
> > > > > 
> > > > > Can you post .globus/coasters/coaster.log from beagle?
> > > > > 
> > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote:
> > > > > > Mihael, can you advise on this problem?
> > > > > > 
> > > > > > David and I are trying to run automatic coaster jobs from midway
> > > > > > login hosts and swift.rcc to beagle using ssh-cl:pbs.
> > > > > > 
> > > > > > My failed attempts are on midway under
> > > > > > /home/wilde/osgdemo/modis/svn; see e.g. run020 (which has
> > > > > > complete logs).
> > > > > > 
> > > > > > Quick question about the proxy files that get copied. Does this
> > > > > > look OK?
> > > > > > 
> > > > > >   2013-03-09 21:24:46,895+0000 INFO  AutoCA Checking certificate /home/wilde/.globus/coasters/proxy.0.pem
> > > > > >   2013-03-09 21:24:46,967+0000 INFO  AutoCA Using certificate /home/wilde/.globus/coasters/proxy.0.pem with expiration date Sat Mar 23 19:25:53 GMT 2013
> > > > > > 
> > > > > > The proxy expiration time listed above is two hours *earlier*
> > > > > > than the current time (as seen in the message's UTC timestamp).
> > > > > > Is that correct, or a possible cause of this problem?
> > > > > > 
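On the proxy question: the two AutoCA timestamps are about two weeks apart, not two hours. The log line is from Mar 9 and the expiration date is Mar 23; only the time of day (19:25:53 vs 21:24:46) is about two hours earlier. A small java.time check of the remaining validity, with both instants copied from the log lines above (milliseconds dropped); ProxyExpiry is a hypothetical name:

```java
import java.time.Duration;
import java.time.Instant;

public class ProxyExpiry {
    // Remaining validity = expiration instant minus the instant the AutoCA line was logged.
    public static Duration remaining(Instant loggedAt, Instant expiresAt) {
        return Duration.between(loggedAt, expiresAt);
    }

    public static void main(String[] args) {
        Instant loggedAt = Instant.parse("2013-03-09T21:24:46Z");  // "2013-03-09 21:24:46,967+0000"
        Instant expiresAt = Instant.parse("2013-03-23T19:25:53Z"); // "Sat Mar 23 19:25:53 GMT 2013"
        Duration d = remaining(loggedAt, expiresAt);
        System.out.printf("%d days %d h %d min %d s%n",
                d.toDays(), d.toHours() % 24, d.toMinutes() % 60, d.getSeconds() % 60);
    }
}
```

This prints "13 days 22 h 1 min 7 s", so the proxy was still valid when the run started.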
> > > > > > The main symptom seems to be this:
> > > > > > 
> > > > > > Execution failed:
> > > > > > 	Exception in getlanduse:
> > > > > >     Arguments: [../data/modis/2002/h00v09.rgb]
> > > > > >     Host: beagle
> > > > > >     Directory:
> > > > > >     modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l
> > > > > > 
> > > > > > Caused by:
> > > > > > 	Could not submit job
> > > > > > Caused by:
> > > > > > 	Could not start coaster service
> > > > > > Caused by:
> > > > > > 	Task ended before registration was received.
> > > > > > Failed to download bootstrap jar from
> > > > > > http://midway001.rcc.uchicago.edu:50001
> > > > > > ---
> > > > > > 
> > > > > > Yet I've verified that midway login4 (which is the target
> > > > > > system) can connect to this hostname and port (with nc -l and
> > > > > > telnet).
> > > > > > 
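The nc/telnet check can also be scripted. A minimal sketch using a plain TCP connect with a timeout (PortProbe is a hypothetical name; the default host and port are taken from the "Failed to download bootstrap jar" message). It only shows that the port accepts connections, not that the HTTP server behind it actually serves the bootstrap jar, and it is only meaningful when run from the host that performs the download:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs,
    // roughly what a manual "telnet host port" check confirms.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "midway001.rcc.uchicago.edu";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50001;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```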
> > > > > > - Mike
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > > 