[Swift-user] Fwd: problem with running coaster

Justin bbt justinbbt at gmail.com
Mon Sep 22 12:22:12 CDT 2014


---------- Forwarded message ----------
From: Justin bbt <justinbbt at gmail.com>
Date: Fri, Sep 19, 2014 at 9:48 PM
Subject: problem with running coaster
To: Swift User <swift-user at ci.uchicago.edu>
Cc: Yadu Nand <yadudoc1729 at gmail.com>


Hi all,

I have a problem running Swift on Microsoft Azure. I have two nodes in the
cloud. I installed Swift on one of them, and I can run part01-03 locally
without problems. I am trying to use coasters to run part04-06 on the other
node. Here is the output I get for "swift p4.swift":


Swift 0.95 RC6 swift-r7900 cog-r3908
RunID: run005
Warning: The @ syntax for function invocation is deprecated
Progress: Sat, 20 Sep 2014 01:36:02+0000
Exception in thread "Scheduler" java.lang.NullPointerException
at org.globus.cog.abstraction.impl.common.task.TaskImpl.hashCode(TaskImpl.java:364)
at java.util.HashMap.hash(HashMap.java:338)
at java.util.HashMap.get(HashMap.java:556)
at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.failTask(VDSAdaptiveScheduler.java:400)
at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:266)
Progress: Sat, 20 Sep 2014 01:36:03+0000  Selecting site:10
No events in 1s.
Finding dependency loops...

Waiting threads:
Thread: R-6, waiting on sims (declared on line 21)
swift:execute, p4, line 96
analyze, p4, line 211

Thread: R-5-2-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-7-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-8-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-9-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-3x2, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-0-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-1-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-6-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5-4-3, waiting on simout (declared on line 24)
assignment, p4, line 28

Thread: R-5x2-3, waiting on simout (declared on line 24)
assignment, p4, line 28

----
No dependency loops found.

The following threads are independently hung:
Thread: R-6, waiting on sims (declared on line 21)
swift:execute, p4, line 96
analyze, p4, line 211

----

Irrecoverable error found. Exiting.





This is the content of my coaster-service.conf:

export WORKER_LOCATION=.
export WORKER_HOSTS="191.238.1.187"
export WORKER_MODE=ssh
export WORKER_USERNAME=azureuser
export IPADDR=191.238.1.33
export WORKER_LOG_DIR=.
export WORK=/home/ubuntu/work
export JOBSPERNODE=1
export JOBTHROTTLE=10
export SSH_TUNNELING=yes


And here is the content of start-coaster-service.log:


Running /home/azureuser/swift-0.95-RC6/bin/coaster-service -nosec -portfile /tmp/tmp.8WkG0mKKzo -localportfile /tmp/tmp.hZvsxQd5J1 -passive
Switching log to: cps-2014-09-20_01-21-41.log
2014-09-20 01:21:41,305+0000 WARN  CoasterPersistentService Switching log to: cps-2014-09-20_01-21-41.log
Local contacts: [http://100.74.60.3:55071]
2014-09-20 01:21:41,338+0000 INFO  Settings Local contacts: [http://100.74.60.3:55071]
Starting... id=0920-2101410
2014-09-20 01:21:41,350+0000 INFO  BlockQueueProcessor Starting... id=0920-2101410
Started local service: http://100.74.60.3:55071
2014-09-20 01:21:41,350+0000 INFO  CoasterService Started local service: http://100.74.60.3:55071
Started coaster service: http://100.74.60.3:42972
2014-09-20 01:21:41,351+0000 INFO  CoasterService Started coaster service: http://100.74.60.3:42972
Started coaster service: http://100.74.60.3:42972
Worker connection URL: http://100.74.60.3:55071
Running ssh -N -T -R *:55071:localhost:55071 azureuser@191.238.1.187
Running ssh azureuser@191.238.1.187 mkdir -p . && mkdir -p .
Running scp /home/azureuser/swift-0.95-RC6/bin/worker.pl azureuser@191.238.1.187:.
Running ssh azureuser@191.238.1.187 WORKER_LOGGING_LEVEL= nohup ./worker.pl http://191.238.1.33:55071 191.238.1.187 . &> /dev/null &
HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768
2014-09-20 01:21:51,356+0000 INFO  CoasterService HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768
HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768
2014-09-20 01:22:01,364+0000 INFO  CoasterService HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768


I had this problem before when connecting from my laptop to a remote server,
and Yadu Nand Babuji told me it was because I didn't have a public IP.
Now I have the same problem again, even though my cloud nodes do have public
IPs. However, as can be seen in the log, the coaster service uses the private
IP (100.74.60.3) when it sets up the connection.

Is this really a problem with the public IP?
Does anybody know how to solve it?
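
To make the address mismatch easier to see, here is a minimal sketch that
checks which of the addresses above are publicly routable. It assumes
Python 3 is available. The values are copied by hand from the config and log
in this message, and the helper names (advertised_hosts, is_public) are made
up for illustration. It only classifies the addresses and does not test
whether the worker can actually reach the service.

```python
import ipaddress
import re

# Values copied by hand from this message (not read from the real files).
IPADDR = "191.238.1.33"  # IPADDR in coaster-service.conf
LOG_LINES = [
    "Local contacts: [http://100.74.60.3:55071]",
    "Started local service: http://100.74.60.3:55071",
    "Started coaster service: http://100.74.60.3:42972",
]


def advertised_hosts(lines):
    """Return the distinct IPv4 hosts found in http:// URLs, in order seen."""
    hosts = []
    for line in lines:
        for host in re.findall(r"http://(\d+\.\d+\.\d+\.\d+):\d+", line):
            if host not in hosts:
                hosts.append(host)
    return hosts


def is_public(addr):
    """True only if the address is globally routable."""
    return ipaddress.ip_address(addr).is_global


for host in advertised_hosts(LOG_LINES):
    scope = "public" if is_public(host) else "not public"
    match = "same as IPADDR" if host == IPADDR else "differs from IPADDR"
    print(host, scope, match)

print(IPADDR, "public" if is_public(IPADDR) else "not public")
```

The check uses is_global rather than is_private because 100.74.60.3 falls in
the shared address space 100.64.0.0/10, which the ipaddress module reports as
neither private nor global.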