[Swift-user] Debugging Swift Coaster ServiceManager

TJ Lane tjlane at stanford.edu
Sat Jun 15 17:15:04 CDT 2013


Swift Users,

Finally back to trying out swift after a delay -- thanks for all your help
so far.

I've got a functional swift script up and running, and am now trying to
configure my sites.xml to get it running on 4 remote clusters. I've gotten
it working on 2, so 2 more to go!

Let's focus on one first. This cluster runs PBS, and I'm trying to
access it using coasters, via jobmanager="ssh-cl:pbs". Unfortunately,
swift can't start the coaster service, and I haven't been able to figure
out why. Maybe someone can help me debug this, or at least tell me where
to start poking around!

Here's the site xml entry:

  <pool handle="biox3">

    <execution provider="coaster" jobmanager="ssh-cl:pbs" url="biox3.stanford.edu"/>

    <profile namespace="globus" key="maxWalltime">00:30:00</profile>

    <profile namespace="globus" key="lowOverAllocation">100</profile>
    <profile namespace="globus" key="highOverAllocation">100</profile>
    <profile namespace="globus" key="maxtime">3600</profile>

    <profile namespace="globus" key="queue">batch</profile>
    <profile namespace="globus" key="slots">10</profile>
    <profile namespace="globus" key="maxnodes">1</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>

    <profile namespace="globus" key="jobsPerNode">1</profile>

    <profile namespace="karajan" key="jobThrottle">1.0</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>

    <!--
    <profile namespace="env" key="SWIFT_GEN_SCRIPTS">1</profile>
    -->

    <workdirectory>/home/tjlane/swiftwork</workdirectory>

  </pool>
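
A quick way to sanity-check an entry like this before launching a run is to parse it and print the values it actually contains. The sketch below is only an illustration: it uses Python's standard xml.etree.ElementTree on a trimmed copy of the pool entry above, and the helper name summarize_pool is made up for this example (it is not part of Swift). It strips whitespace from the url attribute, because XML parsers turn a line break inside an attribute value into a space, which is easy to miss by eye.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the pool entry from this message (a few profiles omitted).
POOL_XML = """
<pool handle="biox3">
  <execution provider="coaster" jobmanager="ssh-cl:pbs" url="biox3.stanford.edu"/>
  <profile namespace="globus" key="maxWalltime">00:30:00</profile>
  <profile namespace="globus" key="queue">batch</profile>
  <profile namespace="globus" key="slots">10</profile>
  <profile namespace="karajan" key="jobThrottle">1.0</profile>
  <workdirectory>/home/tjlane/swiftwork</workdirectory>
</pool>
"""


def summarize_pool(xml_text):
    """Hypothetical helper: return the key settings of one <pool> entry."""
    pool = ET.fromstring(xml_text)
    execution = pool.find("execution")
    # Index profiles by (namespace, key) so duplicates are easy to spot.
    profiles = {
        (p.get("namespace"), p.get("key")): (p.text or "").strip()
        for p in pool.findall("profile")
    }
    return {
        "handle": pool.get("handle"),
        "provider": execution.get("provider"),
        "jobmanager": execution.get("jobmanager"),
        # A wrapped attribute value arrives with a leading space; strip it.
        "url": execution.get("url", "").strip(),
        "workdirectory": (pool.findtext("workdirectory") or "").strip(),
        "profiles": profiles,
    }


if __name__ == "__main__":
    for name, value in summarize_pool(POOL_XML).items():
        print(name, "=", value)
```

This only checks that the entry is well-formed and shows what it contains; it does not validate the settings against what Swift or the coaster service expects.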

and here's what gets printed when I try to run a very basic "hello
cluster" swift script:

tjlane@vspm42 ~/swift_hello
$ swift -sites.file ~/opt/swift-0.94/etc/sites.xml \
    -tc.file ~/opt/swift-0.94/etc/tc.data \
    -config swift.properties uname.swift
Swift started
Swift 0.94 swift-r6492 cog-r3658

RunID: 20130615-1512-h2fskgme
Progress:  time: Sat, 15 Jun 2013 15:12:32 -0700
Progress:  time: Sat, 15 Jun 2013 15:12:34 -0700  Submitted:1
Execution failed:
    Exception in uname:
    Arguments: [-a]
    Host: biox3
    Directory: uname-20130615-1512-h2fskgme/jobs/a/uname-aan4rzal

Caused by:
    Could not submit job
Caused by:
    Could not start coaster service
Caused by:
    Task ended before registration was received.

Failed to start coaster service
java.lang.NullPointerException
   at java.net.URI.compareTo(libgcj.so.10)
   at java.net.URI.compareTo(libgcj.so.10)
   at java.util.TreeMap.compare(libgcj.so.10)
   at java.util.TreeMap.put(libgcj.so.10)
   at java.util.TreeSet.addAll(libgcj.so.10)
   at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
   at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
   at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
   at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
java.lang.NullPointerException
   at java.net.URI.compareTo(libgcj.so.10)
   at java.net.URI.compareTo(libgcj.so.10)
   at java.util.TreeMap.compare(libgcj.so.10)
   at java.util.TreeMap.put(libgcj.so.10)
   at java.util.TreeSet.addAll(libgcj.so.10)
   at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
   at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
   at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
   at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)


    uname, uname.swift, line 12
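
One detail in the trace above: the java.* frames are attributed to libgcj.so.10, which is the GNU GCJ runtime library rather than a Sun/Oracle or OpenJDK JVM, so it may be worth checking which java the remote host picks up. The sketch below is a small hypothetical helper (not part of Swift; the names frame_sources and uses_gcj are invented for this example) that counts stack frames by the source shown in parentheses, which makes it easy to see where the standard-library frames came from.

```python
import re
from collections import Counter

# Matches "at some.Class.method(Source)"; \s+ also tolerates a line break
# between "at" and the method name, as mail wrapping can introduce.
FRAME_RE = re.compile(r"\bat\s+([\w.$<>]+)\(([^)]*)\)")

# Shortened sample taken from the trace in this message.
SAMPLE = """java.lang.NullPointerException
   at java.net.URI.compareTo(libgcj.so.10)
   at java.util.TreeMap.compare(libgcj.so.10)
   at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
   at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
"""


def frame_sources(trace_text):
    """Count frames per source, e.g. 'libgcj.so.10' or 'Settings.java'."""
    counts = Counter()
    for _method, source in FRAME_RE.findall(trace_text):
        # Drop a trailing ":403"-style line number if present.
        counts[source.split(":")[0]] += 1
    return counts


def uses_gcj(trace_text):
    """True if any frame is attributed to a libgcj shared library."""
    return any(src.startswith("libgcj") for src in frame_sources(trace_text))


if __name__ == "__main__":
    for source, count in frame_sources(SAMPLE).most_common():
        print(count, source)
    print("GCJ frames present:", uses_gcj(SAMPLE))
```

This only reports what the trace itself says; whether the GCJ runtime is actually the cause of the NullPointerException is an open question here, not something the script can confirm.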

Finally, here's part of what gets dumped to my log file:

<snip>
2013-06-15 14:54:22,350-0700 INFO  BootstrapService [/171.67.106.68:39309] GET /coaster-bootstrap.jar HTTP/1.0
2013-06-15 14:54:22,713-0700 INFO  ServiceManager Service task Task(type=JOB_SUBMISSION, identity=urn:cog-1371333260175) terminated. Removing service.
2013-06-15 14:54:22,713-0700 INFO  ServiceManager Service does not appear to be registered with this manager
2013-06-15 14:54:22,713-0700 INFO  ServiceManager Coaster service ended. Reason: null
        stdout:
        stderr: Failed to start coaster service
java.lang.NullPointerException
   at java.net.URI.compareTo(libgcj.so.10)
   at java.net.URI.compareTo(libgcj.so.10)
   at java.util.TreeMap.compare(libgcj.so.10)
   at java.util.TreeMap.put(libgcj.so.10)
   at java.util.TreeSet.addAll(libgcj.so.10)
   at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
   at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
   at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
   at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
java.lang.NullPointerException
   at java.net.URI.compareTo(libgcj.so.10)
   at java.net.URI.compareTo(libgcj.so.10)
   at java.util.TreeMap.compare(libgcj.so.10)
   at java.util.TreeMap.put(libgcj.so.10)
   at java.util.TreeSet.addAll(libgcj.so.10)
   at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
   at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
   at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
   at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)


2013-06-15 14:54:22,714-0700 INFO  NotificationManager biox3.stanford.edu
2013-06-15 14:54:22,771-0700 INFO  RuntimeStats$ProgressTicker   Submitted:1
2013-06-15 14:54:22,775-0700 DEBUG swift APPLICATION_EXCEPTION jobid=uname-d77eqzal - Application exception: Caused by: Could not submit job
Caused by: Could not start coaster service
Caused by: Task ended before registration was received.

Failed to start coaster service
java.lang.NullPointerException
   at java.net.URI.compareTo(libgcj.so.10)
   at java.net.URI.compareTo(libgcj.so.10)
   at java.util.TreeMap.compare(libgcj.so.10)
   at java.util.TreeMap.put(libgcj.so.10)
   at java.util.TreeSet.addAll(libgcj.so.10)
   at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
   at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
   at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
   at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
java.lang.NullPointerException
   at java.net.URI.compareTo(libgcj.so.10)
   at java.net.URI.compareTo(libgcj.so.10)
   at java.util.TreeMap.compare(libgcj.so.10)
   at java.util.TreeMap.put(libgcj.so.10)
   at java.util.TreeSet.addAll(libgcj.so.10)
   at org.globus.cog.abstraction.coaster.service.job.manager.Settings.setCallbackURIs(Settings.java:403)
   at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.<init>(JobQueue.java:41)
   at org.globus.cog.abstraction.coaster.service.CoasterService.start(CoasterService.java:148)
   at org.globus.cog.abstraction.coaster.service.CoasterService.main(CoasterService.java:382)
<snip>


Any help or advice on how to resolve this issue would be much appreciated!

Thanks,

TJ