[Swift-user] running jobs on cluster or cloud

Justin bbt justinbbt at gmail.com
Mon Sep 1 16:20:57 CDT 2014


I solved the permission problem by running ssh-add to add my key to the ssh
agent's list of keys. (This step is required when the local system is Linux;
I am using Ubuntu.)

(more here
https://help.github.com/articles/error-agent-admitted-failure-to-sign)
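
For reference, a minimal sketch of checking the agent before starting the
service, assuming a POSIX shell and the exit codes OpenSSH documents for
"ssh-add -l" (0: keys are listed, 1: the agent holds no keys, 2: no agent
could be contacted). The function name agent_status_message is hypothetical,
and the key path in the comments is only an example:

```shell
# Map the exit status of `ssh-add -l` to a hint about what to do next.
agent_status_message() {
    case "$1" in
        0) echo "agent has at least one key loaded" ;;
        1) echo "agent is running but holds no keys; run ssh-add" ;;
        2) echo "no agent reachable; start one with ssh-agent" ;;
        *) echo "unexpected ssh-add exit status: $1" ;;
    esac
}

# Typical use on the submit host (key path is an example, not from this thread):
#   ssh-add -l >/dev/null 2>&1; agent_status_message "$?"
#   ssh-add ~/.ssh/id_rsa      # prompts once for the passphrase
```

Once the key is loaded, later ssh connections made from the same session can
use it without prompting again.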

Now start-coaster-service connects to the cluster without asking for a
password, but it does not terminate. This is the output:

Service address: localhost
Starting coaster-service
Service port: 35925
Local port: 40681
Generating sites.xml
Starting worker on W.X.Y.Z
WORKER_LOGGING_LEVEL=DEBUG: Command not found.
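
One possible reading of that last line, which is an assumption and not
confirmed anywhere in this thread: the message has the form that csh and tcsh
print for an unknown command, and those shells do not accept a VAR=value
prefix in front of a command. A minimal sketch of how to recognise a
csh-family login shell and of a prefix form that works in any shell
(is_csh_family is a hypothetical helper; user and host in the comments are
placeholders):

```shell
# Return success if the given shell path names a csh-family shell.
is_csh_family() {
    case "${1##*/}" in
        csh|tcsh) return 0 ;;
        *) return 1 ;;
    esac
}

# csh/tcsh reject "VAR=value command"; env(1) sets the variable in any shell.
env WORKER_LOGGING_LEVEL=DEBUG sh -c 'echo "level=$WORKER_LOGGING_LEVEL"'

# To see the login shell on the worker (placeholders, not from this thread):
#   ssh user@host 'echo $SHELL'
```

If the remote account does use csh or tcsh, that would explain why the worker
never starts even though the ssh login itself now succeeds.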



If I just use my sites.xml

<pool handle="persistent-coasters">
    <execution provider="coaster-persistent"
               url="http://urladdress"
               jobmanager="local:local"/>
    <profile namespace="globus" key="workerManager">passive</profile>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <profile key="jobThrottle" namespace="karajan">10</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <filesystem provider="local" url="none" />
    <workdirectory>.</workdirectory>
  </pool>


it fails with the following error:


Execution failed:
Exception in simulate:
    Arguments: []
    Host: persistent-coasters
    Directory: p1-20140901-1648-r8mdqbse/jobs/z/simulate-zcyxesvl

Caused by:
Could not submit job
Caused by:
Failed to create socket
Caused by:
Connection refused
simulation, p1.swift, line 9
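
"Connection refused" generally means that nothing is listening at the host
and port the client tried, so one quick check is whether the url in sites.xml
points at a live coaster service. The ports printed by start-coaster-service
differ between the two runs in this thread, so a url carried over from an
earlier sites.xml may no longer match a running service. A minimal sketch,
assuming a POSIX shell and a netcat that supports -z; url_host_port and
port_is_open are hypothetical helper names, and the URL is assumed to include
an explicit port:

```shell
# Split a URL such as http://localhost:37584/ into "host port".
url_host_port() {
    rest=${1#*://}       # drop the scheme
    rest=${rest%%/*}     # drop any path
    echo "${rest%%:*} ${rest##*:}"
}

# Succeed if something accepts TCP connections on host $1, port $2.
port_is_open() {
    nc -z "$1" "$2" >/dev/null 2>&1
}

# Example with the URL from the generated sites.xml quoted below:
url_host_port "http://localhost:37584"
#   port_is_open localhost 37584 || echo "nothing is listening there"
```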






On Sat, Aug 30, 2014 at 1:28 AM, Justin bbt <justinbbt at gmail.com> wrote:

> For cluster:
>
> When I run start-coaster-service, I get the following output: it asks for
> a password and then reports that permission is denied
>
> Start-coaster-service...
> Configuration: /home/lenovo/swift-cloud-tutorial/scs/coaster-service.conf
> Service address: localhost
> Starting coaster-service
> Service port: 52809
> Local port: 58460
> Generating sites.xml
> username at ipadress's password:
> username at ipadress's password:
> Starting worker on username@
> lenovo at lenovo-laptop:~/swift-cloud-tutorial/scs$username at ipadress's
> password:
> Permission denied, please try again.
> username at ipadress's password:
> Permission denied, please try again.
> username at ipadress's password:
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>
> This happens even though I created my keys with ssh-keygen. (The only
> change I made was to create RSA keys rather than DSA keys, because my
> cluster did not accept DSA.) I can connect over regular ssh with the RSA
> key and my passphrase.
>
> The output of my sites.xml  from this partial running of
> start-coaster-service is
>
>  <pool handle="persistent-coasters">
>     <execution provider="coaster-persistent"
>                url="http://localhost:37584"
>                jobmanager="local:local"/>
>     <profile namespace="globus" key="workerManager">passive</profile>
>     <profile namespace="globus" key="jobsPerNode">1</profile>
>     <profile key="jobThrottle" namespace="karajan">10</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <filesystem provider="local" url="none" />
>     <workdirectory>.</workdirectory>
>   </pool>
>
> Using this XML, I just get a job submission every 30 seconds and no
> finished jobs.
>
>
> BTW, my cluster has one public IP, and each compute node has a
> local/private IP.
> In
>  export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
> I currently set only the public IP address, and I still cannot get even
> this one node to work. How should I set the other IPs?
> Does it mean that I have to install swift on the cluster?
>
>
> I will look at the new  release of swift for AWS.
>
>
> Thanks,
> J.
>
>
>
>
>
> On Fri, Aug 29, 2014 at 11:43 AM, Yadu Nand <yadudoc1729 at gmail.com> wrote:
>
>> Hi Justin,
>>
>> Did you do the following steps:
>> export WORKER_LOCATION="/home/ubuntu"
>> export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
>> export WORKER_USERNAME=ubuntu
>>
>> and then run "source setup.sh" ?
>> When you source the setup.sh script, you should have gotten a sites.xml
>> and a start-coaster-service.log in your scs folder; could you send us those?
>> The setup script should start a persistent coaster service and connect to
>> the nodes on amazon, start workers, and generate a sites.xml file
>> that would let your swift scripts run across the amazon nodes. You
>> shouldn't have to make changes to the sites.xml.
>>
>>  Alternatively, you could try using the beta release of swift, Swift 0.95
>> RC6 with the new cloud mechanism:
>> https://github.com/swift-lang/swift-on-cloud/tree/master/aws
>>
>> That will set you up with a headnode on AWS with a few worker nodes that
>> you define, with everything setup to run swift.
>>
>>
>> Thanks,
>> Yadu
>>>>
>>
>>  On Thu, Aug 28, 2014 at 6:57 PM, Justin bbt <justinbbt at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> Hi all,
>>>>
>>>>  I was able to run swift successfully on my local system.
>>>> Next, I want to use swift to run some jobs on a cluster.
>>>>
>>>> I followed this tutorial. (I am using just a simple cluster, and I could
>>>> not even run the job on one remote node of the cluster.)
>>>> http://swift-lang.org/tutorials/cloud/tutorial.html
>>>>
>>>> But I get this when I run swift p1.swift or other swift scripts:
>>>>
>>>> Swift 0.94.1 swift-r7114 cog-r3803
>>>>
>>>> RunID: 20140828-1758-ea4phzag
>>>> Progress:  time: Thu, 28 Aug 2014 17:58:15 -0400
>>>> Progress:  time: Thu, 28 Aug 2014 17:58:24 -0400  Submitted:1
>>>> Execution failed:
>>>> Exception in simulate:
>>>>     Arguments: []
>>>>     Host: remotehost2
>>>>     Directory: p1-20140828-1758-ea4phzag/jobs/7/simulate-7k2fxlvl
>>>>
>>>> Caused by:
>>>> Job failed with an exit code of 127
>>>> simulation, p1.swift, line 9
>>>>
>>>>
>>>> --- this is my site.xml file setting
>>>>
>>>>    <pool handle="remotehost2">
>>>>       <execution provider="ssh" jobmanager="ssh:local"
>>>> url="myclusterurl"/>
>>>>       <filesystem provider="ssh" url="myclusterurl"/>
>>>>       <profile namespace="karajan" key="jobThrottle">0</profile>
>>>>       <profile namespace="karajan" key="initialScore">10000</profile>
>>>>       <workdirectory>/path/to/remote/workdirectory</workdirectory>
>>>>    </pool>
>>>>
>>>> --- if I use this one
>>>> <pool handle="persistent-coasters">
>>>>     <execution provider="coaster-persistent"
>>>>                url="myclusterurl"
>>>>                jobmanager="local:local"/>
>>>>     <profile namespace="globus" key="workerManager">passive</profile>
>>>>     <profile namespace="globus" key="jobsPerNode">1</profile>
>>>>     <profile key="jobThrottle" namespace="karajan">10</profile>
>>>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>>>     <filesystem provider="local" url="none" />
>>>>     <workdirectory>.l</workdirectory>
>>>>   </pool>
>>>> --- then it loops back to my localhost and just keeps resubmitting the jobs
>>>>
>>>> 1. Is this a correct setting?
>>>> 2. Should I use coasters? I could not understand the description of the
>>>> coaster concepts and the required settings in the user guide and
>>>> documentation. Is there a better tutorial that describes coasters?
>>>> 3. I plan to use swift on the cloud (Microsoft Azure) later. What
>>>> settings are required for that, in site.xml and in any other files?
>>>>
>>>>
>>>> Thanks in Advance.
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>>
>>
>>
>>
>> --
>> Yadu Nand B
>>
>>
>