[Swift-user] Debugging Swift Coaster ServiceManager
TJ Lane
tjlane at stanford.edu
Sat Jun 15 17:15:04 CDT 2013
Swift Users,
Finally back to trying out Swift after a delay -- thanks for all your help
so far.
I've got a functional Swift script up and running, and am now trying to
configure my sites.xml to get it running on 4 remote clusters. I've gotten
it working on 2, so 2 more to go!
Let's focus on one first. This cluster runs PBS, and I'm trying to reach it
through coasters over SSH, using provider="coaster" with
jobmanager="ssh-cl:pbs". Unfortunately, Swift can't seem to start the coaster
service, and I haven't been able to figure out why. Maybe someone can help me
debug this, or at least point me to where I should start poking around!
Here's the site XML entry:
<pool handle="biox3">
<execution provider="coaster" jobmanager="ssh-cl:pbs" url="biox3.stanford.edu"/>
<profile namespace="globus" key="maxWalltime">00:30:00</profile>
<profile namespace="globus" key="lowOverAllocation">100</profile>
<profile namespace="globus" key="highOverAllocation">100</profile>
<profile namespace="globus" key="maxtime">3600</profile>
<profile namespace="globus" key="queue">batch</profile>
<profile namespace="globus" key="slots">10</profile>
<profile namespace="globus" key="maxnodes">1</profile>
<profile namespace="globus" key="nodeGranularity">1</profile>
<profile namespace="globus" key="jobsPerNode">1</profile>
<profile namespace="karajan" key="jobThrottle">1.0</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<!--
<profile namespace="env" key="SWIFT_GEN_SCRIPTS">1</profile>
-->
<workdirectory>/home/tjlane/swiftwork</workdirectory>
</pool>
And here's what gets printed when I try to run a very basic "hello
cluster" Swift script:
tjlane at vspm42 ~/swift_hello
$ swift -sites.file ~/opt/swift-0.94/etc/sites.xml \
    -tc.file ~/opt/swift-0.94/etc/tc.data \
    -config swift.properties uname.swift
Swift started
Swift 0.94 swift-r6492 cog-r3658
RunID: 20130615-1512-h2fskgme
Progress: time: Sat, 15 Jun 2013 15:12:32 -0700
Progress: time: Sat, 15 Jun 2013 15:12:34 -0700 Submitted:1
Execution failed:
Exception in uname:
Arguments: [-a]
Host: biox3
Directory: uname-20130615-1512-h2fskgme/jobs/a/uname-aan4rzal
Caused by:
Could not submit job
Caused by:
Could not start coaster service
Caused by:
Task ended before registration was received.
Failed to start coaster service
java.lang.NullPointerException
at java.net.URI.compareTo(libgcj.so.10)
at java.net.URI.compareTo(libgcj.so.10)
at java.util.TreeMap.compare(libgcj.so.10)
at java.util.TreeMap.put(libgcj.so.10)
at java.util.TreeSet.addAll(libgcj.so.10)
at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
java.lang.NullPointerException
at java.net.URI.compareTo(libgcj.so.10)
at java.net.URI.compareTo(libgcj.so.10)
at java.util.TreeMap.compare(libgcj.so.10)
at java.util.TreeMap.put(libgcj.so.10)
at java.util.TreeSet.addAll(libgcj.so.10)
at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
uname, uname.swift, line 12
Finally, here's part of what gets dumped to my log file:
<snip>
2013-06-15 14:54:22,350-0700 INFO BootstrapService [/171.67.106.68:39309] GET /coaster-bootstrap.jar HTTP/1.0
2013-06-15 14:54:22,713-0700 INFO ServiceManager Service task Task(type=JOB_SUBMISSION, identity=urn:cog-1371333260175) terminated. Removing service.
2013-06-15 14:54:22,713-0700 INFO ServiceManager Service does not appear to be registered with this manager
2013-06-15 14:54:22,713-0700 INFO ServiceManager Coaster service ended. Reason: null
stdout:
stderr: Failed to start coaster service
java.lang.NullPointerException
at java.net.URI.compareTo(libgcj.so.10)
at java.net.URI.compareTo(libgcj.so.10)
at java.util.TreeMap.compare(libgcj.so.10)
at java.util.TreeMap.put(libgcj.so.10)
at java.util.TreeSet.addAll(libgcj.so.10)
at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
java.lang.NullPointerException
at java.net.URI.compareTo(libgcj.so.10)
at java.net.URI.compareTo(libgcj.so.10)
at java.util.TreeMap.compare(libgcj.so.10)
at java.util.TreeMap.put(libgcj.so.10)
at java.util.TreeSet.addAll(libgcj.so.10)
at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
2013-06-15 14:54:22,714-0700 INFO NotificationManager biox3.stanford.edu
2013-06-15 14:54:22,771-0700 INFO RuntimeStats$ProgressTicker Submitted:1
2013-06-15 14:54:22,775-0700 DEBUG swift APPLICATION_EXCEPTION jobid=uname-d77eqzal - Application exception: Caused by: Could not submit job
Caused by: Could not start coaster service
Caused by: Task ended before registration was received.
Failed to start coaster service
java.lang.NullPointerException
at java.net.URI.compareTo(libgcj.so.10)
at java.net.URI.compareTo(libgcj.so.10)
at java.util.TreeMap.compare(libgcj.so.10)
at java.util.TreeMap.put(libgcj.so.10)
at java.util.TreeSet.addAll(libgcj.so.10)
at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
java.lang.NullPointerException
at java.net.URI.compareTo(libgcj.so.10)
at java.net.URI.compareTo(libgcj.so.10)
at java.util.TreeMap.compare(libgcj.so.10)
at java.util.TreeMap.put(libgcj.so.10)
at java.util.TreeSet.addAll(libgcj.so.10)
at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
<snip>
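One detail in the traces: the java.* frames resolve to libgcj.so.10, which is
the runtime library of GCJ (the GNU Compiler for Java), so the coaster service
on the cluster appears to be launched by GCJ's java rather than a Sun/Oracle or
OpenJDK JVM. Whether that is what triggers the NullPointerException is an
assumption on my part; the log does not confirm it. A minimal sketch for
checking which Java the remote shell picks up follows. The classify_java helper
is hypothetical (not part of Swift), and the marker strings it looks for are
what GCJ's launcher is expected to print in its version banner.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): classify the text that
# `java -version` prints, looking for the markers GCJ's launcher (gij) emits.
classify_java() {
    case "$1" in
        *libgcj*|*gij*) echo "gcj" ;;
        *)              echo "other" ;;
    esac
}

# Example use against the cluster from the site entry. This assumes the
# non-interactive ssh shell sees the same PATH the coaster bootstrap does,
# which may not hold on every system:
#   classify_java "$(ssh biox3.stanford.edu 'java -version 2>&1')"
```

If this prints "gcj", the next thing to try would probably be putting a
Sun/Oracle or OpenJDK java first on the remote PATH and rerunning the script.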
Any help or advice on how to resolve this issue would be much appreciated!
Thanks,
TJ