[Swift-devel] coaster-service error on Intrepid

Michael Wilde wilde at mcs.anl.gov
Wed Dec 1 11:15:52 CST 2010


Justin, I was experimenting on PADS with the persistent coaster service; that's where I tested Mihael's fix, which lets the service be used repeatedly and stay up for extended periods of time.

I started trying to move that to the BG/P only yesterday, I think for the same reason as you.

My script is in /home/wilde/swift/lab/pecos/start-coasters on Surveyor.

I'll stop by to see if we can get this working, as it will help us both on the CDM runs.

One thing to note: I run one artificial job to put the service into passive mode, which seems to be necessary for externally started workers to connect to it. Ideally we'll soon make this a command-line flag to the service.

- Mike


----- Original Message -----
> Hello all
> I'm getting started with the coaster-service on Intrepid. I start
> up the service and the first run completes. The second fails with the
> trace below. sites.xml is also included below. I'm looking into this
> but I thought I should post it...
> Justin
> 
> Intrepid: ~> coaster-service -p 2390 -nosec
> Started coaster service: http://140.221.82.115:2390
> original callback URI is http://10.40.5.144:32907
> callback URI has been overridden to http://172.17.5.144:32907
> Failed to send remote log message
> org.globus.cog.karajan.workflow.service.channels.ChannelException: Channel died and no contact available
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.connect(ChannelManager.java:235)
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:257)
>     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
>     at org.globus.cog.abstraction.coaster.rlog.RemoteLogger.log(RemoteLogger.java:31)
>     at org.globus.cog.abstraction.coaster.service.job.manager.Block.start(Block.java:87)
>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.addBlock(BlockQueueProcessor.java:213)
>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.allocateBlocks(BlockQueueProcessor.java:395)
>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:518)
>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:100)
> 
> <pool handle="coasters_alcfbgp">
>   <filesystem provider="local" />
>   <execution provider="coaster-persistent"
>              jobmanager="local:cobalt"
>              url="http://140.221.82.115:2390" />
>   <!-- <profile namespace="swift" key="stagingMethod">local</profile> -->
>   <profile namespace="globus" key="internalHostname">172.17.5.144</profile>
>   <profile namespace="globus" key="project">HTCScienceApps</profile>
>   <profile namespace="globus" key="queue">prod-devel</profile>
>   <profile namespace="globus" key="kernelprofile">zeptoos</profile>
>   <profile namespace="globus" key="alcfbgpnat">true</profile>
>   <profile namespace="karajan" key="jobthrottle">21</profile>
>   <profile namespace="karajan" key="initialScore">10000</profile>
>   <profile namespace="globus" key="workersPerNode">1</profile>
>   <profile namespace="globus" key="slots">1</profile>
>   <profile namespace="globus" key="maxTime">3300</profile>
>   <profile namespace="globus" key="nodeGranularity">64</profile>
>   <profile namespace="globus" key="maxNodes">64</profile>
>   <profile namespace="globus" key="hookClass">org.globus.swift.data.policy.AllocationHook</profile>
>   <!-- <scratch>/scratch</scratch> -->
>   <workdirectory>/home/wozniak/work</workdirectory>
> </pool>
> 
> 
> --
> Justin M Wozniak

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



