[Swift-user] running swift k on multiple instances on EC2, using coaster-service

David Kelly davidk at ci.uchicago.edu
Wed Oct 24 00:47:01 CDT 2012


Hello Iman,

Could you please explain how you are trying to set up coasters? Are you manually running worker.pl on each node and pointing it to the address/port, or using a script like start-coaster-service? Is coaster-service running on the same machine as swift? Are you setting up the instances with any firewall rules in place? 
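
If a firewall or EC2 security group is in the way, a quick TCP check from each worker node may help narrow things down. The following is only a minimal sketch in Python and is not part of Swift. port_reachable is a hypothetical helper name, and the address and ports in the comment are simply the ones printed in your output below. Which port the workers actually need depends on how the service was started.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Illustrative helper only; it tests plain TCP reachability and knows
    nothing about the coaster protocol itself.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# Example, run on a worker node (address and ports assumed from the
# "Find:" and "Callback URI" lines in the quoted output):
#   port_reachable("10.244.4.101", 1213)
#   port_reachable("10.244.4.101", 1212)
```

A False result from a worker would point toward a firewall or security group rule rather than a Swift configuration problem, though a True result alone does not prove the workers are registering correctly.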

Could you please send the sites.xml you're using and the log file from hello.swift (or a link to it, if it's too large to attach)? Thanks!

Regards,
David

----- Original Message -----
> From: "Iman Sadooghi" <isadoogh at iit.edu>
> To: swift-user at ci.uchicago.edu
> Sent: Tuesday, October 23, 2012 5:46:57 PM
> Subject: [Swift-user] running swift k on multiple instances on EC2, using coaster-service
> Hi everyone
> 
> 
> I am trying to run a Montage application workflow with Swift on
> multiple Amazon EC2 instances.
> So far I have been able to set up a cluster and a PVFS file system shared
> among the nodes (mounted through FUSE, so my Swift work directory has a
> POSIX interface).
> I have tried running the simple hello.swift example on multiple nodes
> with coasters. The working directory is the shared folder
> (backed by PVFS).
> When I run the script using my own tc.data and sites.xml, this
> happens:
> 
> 
> 
> (my command) ubuntu@ip-10-244-4-101:~/coaster$ swift -tc.file tc.data
> -sites.file sites.xml ~/swift-0.93/examples/swift/tutorial/hello.swift
> (results:)
> Swift 0.93 swift-r5483 cog-r3339
> 
> 
> RunID: 20121023-2200-4d3knr72
> Progress: time: Tue, 23 Oct 2012 22:00:50 +0000
> Find: http://10.244.4.101:1213
> Find: keepalive(120), reconnect - http://10.244.4.101:1213
> Passive queue processor initialized. Callback URI is
> http://10.244.4.101:1212
> Progress: time: Tue, 23 Oct 2012 22:01:20 +0000 Submitted:1
> Progress: time: Tue, 23 Oct 2012 22:01:50 +0000 Submitted:1
> Progress: time: Tue, 23 Oct 2012 22:02:20 +0000 Submitted:1
> Progress: time: Tue, 23 Oct 2012 22:02:50 +0000 Submitted:1
> Progress: time: Tue, 23 Oct 2012 22:03:20 +0000 Submitted:1
> Progress: time: Tue, 23 Oct 2012 22:03:50 +0000 Submitted:1
> Progress: time: Tue, 23 Oct 2012 22:04:20 +0000 Submitted:1
> 
> 
> It keeps doing this forever, meaning that there is no answer from
> the worker nodes.
> When I checked the worker nodes, the working files had been created in the
> shared folder, and the list of running processes shows a
> Java application running, but nothing happens.
> I have also attached the log file from my hello.swift run in case
> you need to take a look at it.
> Should I consider using PBS or Condor instead? I have no idea how
> they work, though.
> 
> 
> I would appreciate it if anyone could help me with this. Thank you so much.
> 
> Best, --
> Iman Sadooghi
> Illinois Institute of Technology (IIT)
> Data-Intensive Distributed Systems Laboratory
> 
> 


