[Swift-user] running jobs on cluster or cloud

Yadu Nand yadudoc1729 at gmail.com
Tue Sep 2 12:13:45 CDT 2014


Hi Justin,

The sites.xml generated by start-coaster-service has to be used by swift,
so that swift can connect to the coaster service.
I cannot tell whether you are using the generated sites.xml or some other
sites.xml file that you wrote. In the case where the
coaster-service output says "Service port: 35925", the url in the sites.xml
should be http://localhost:35925.
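
As a rough illustration of that check, here is a small Python sketch. It is
not part of Swift, and the function names are hypothetical. It reads the
"Service port" line from the coaster-service output and compares it with the
url of the first <execution> element in a sites.xml pool. It assumes the
sites.xml uses no XML namespace and that the log lines are not prefixed with
quote markers.

```python
import re
import xml.etree.ElementTree as ET

# Sample values copied from the messages quoted in this thread.
SERVICE_OUTPUT = """Service address: localhost
Starting coaster-service
Service port: 35925
Local port: 40681
Generating sites.xml
"""

SITES_POOL = """<pool handle="persistent-coasters">
  <execution provider="coaster-persistent"
             url="http://localhost:37584"
             jobmanager="local:local"/>
  <workdirectory>.</workdirectory>
</pool>"""


def service_port(service_output):
    """Return the port from a 'Service port: N' line, or None if absent."""
    match = re.search(r"^Service port:\s*(\d+)\s*$", service_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def execution_url(sites_xml_text):
    """Return the url attribute of the first <execution> element, or None."""
    root = ET.fromstring(sites_xml_text)
    # Works whether <pool> is the root element or nested under another root.
    execution = root.find(".//execution")
    return execution.get("url") if execution is not None else None


def url_matches_service(service_output, sites_xml_text):
    """True if the sites.xml url points at the reported service port on localhost."""
    port = service_port(service_output)
    url = execution_url(sites_xml_text)
    if port is None or url is None:
        return False
    return url.rstrip("/") == f"http://localhost:{port}"


if __name__ == "__main__":
    # Port 35925 in the output versus 37584 in the pool: prints False
    print(url_matches_service(SERVICE_OUTPUT, SITES_POOL))
```

With the sample values above the check fails, because the pool points at port
37584 while the service reported 35925.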

Could you send me the sites.xml file that was generated by
start-coaster-service, the start-coaster-service.log,
and the logs from swift when you attempted to run a swift script?

Thanks,
Yadu






On Mon, Sep 1, 2014 at 4:20 PM, Justin bbt <justinbbt at gmail.com> wrote:

> I solved the permission problem by using the ssh-add command to add the key
> to the list of keys. (This step is required if the local system is Linux;
> I am using Ubuntu.)
>
> (more here
> https://help.github.com/articles/error-agent-admitted-failure-to-sign)
>
> Now start-coaster-service connects to the cluster without a password, but it
> does not terminate. This is the output:
>
> Service address: localhost
> Starting coaster-service
> Service port: 35925
> Local port: 40681
> Generating sites.xml
> Starting worker on W.X.Y.Z
> WORKER_LOGGING_LEVEL=DEBUG: Command not found.
>
>
>
> If I just use my sites.xml
>
> <pool handle="persistent-coasters">
>      <execution provider="coaster-persistent"
>                url="http://urladdress"
>                jobmanager="local:local"/>
>     <profile namespace="globus" key="workerManager">passive</profile>
>     <profile namespace="globus" key="jobsPerNode">1</profile>
>     <profile key="jobThrottle" namespace="karajan">10</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <filesystem provider="local" url="none" />
>     <workdirectory>.</workdirectory>
>   </pool>
>
>
>   it fails with the following error
>
>
> Execution failed:
> Exception in simulate:
>     Arguments: []
>     Host: persistent-coasters
>     Directory: p1-20140901-1648-r8mdqbse/jobs/z/simulate-zcyxesvl
>
> Caused by:
> Could not submit job
> Caused by:
> Failed to create socket
> Caused by:
> Connection refused
> simulation, p1.swift, line 9
>
>
>
>
>
>
> On Sat, Aug 30, 2014 at 1:28 AM, Justin bbt <justinbbt at gmail.com> wrote:
>
>> For cluster:
>>
>> When I run start-coaster-service, I get the following output: it asks
>> for a password and then says that permission is denied.
>>
>> Start-coaster-service...
>> Configuration: /home/lenovo/swift-cloud-tutorial/scs/coaster-service.conf
>> Service address: localhost
>> Starting coaster-service
>> Service port: 52809
>> Local port: 58460
>> Generating sites.xml
>> username at ipadress's password:
>> username at ipadress's password:
>> Starting worker on username@
>> lenovo at lenovo-laptop:~/swift-cloud-tutorial/scs$username at ipadress's
>> password:
>> Permission denied, please try again.
>> username at ipadress's password:
>> Permission denied, please try again.
>> username at ipadress's password:
>> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>>
>> This happens even though I have created my keys with ssh-keygen. (The only
>> change I made was to create RSA keys rather than DSA keys, because my
>> cluster did not accept DSA.) I can connect over regular ssh with the RSA
>> key and my passphrase.
>>
>> The sites.xml produced by this partial run of
>> start-coaster-service is:
>>
>>  <pool handle="persistent-coasters">
>>     <execution provider="coaster-persistent"
>>                url="http://localhost:37584"
>>                jobmanager="local:local"/>
>>     <profile namespace="globus" key="workerManager">passive</profile>
>>     <profile namespace="globus" key="jobsPerNode">1</profile>
>>     <profile key="jobThrottle" namespace="karajan">10</profile>
>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>     <filesystem provider="local" url="none" />
>>     <workdirectory>.</workdirectory>
>>   </pool>
>>
>> Using this XML, I just get a sequence of job submissions every 30
>> seconds, with no finished jobs.
>>
>>
>> BTW, I have a public IP for my cluster, and each compute node has a
>> local/private IP.
>> In
>>  export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
>> I currently set just the public IP address, and I am still not
>> successful even with this one node. How should I set the
>> other IPs? Does it mean that I have to install swift on the cluster?
>>
>>
>> I will look at the new  release of swift for AWS.
>>
>>
>> Thanks,
>> J.
>>
>>
>>
>>
>>
>> On Fri, Aug 29, 2014 at 11:43 AM, Yadu Nand <yadudoc1729 at gmail.com>
>> wrote:
>>
>>> Hi Justin,
>>>
>>> Did you do the following steps:
>>> export WORKER_LOCATION="/home/ubuntu"
>>> export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
>>> export WORKER_USERNAME=ubuntu
>>>
>>> and then run "source setup.sh"?
>>> When you sourced the setup.sh script you should have gotten a sites.xml and
>>> a start-coaster-service.log in your scs folder. Could you send us those?
>>> The setup script should start a persistent coaster service, connect
>>> to the nodes on amazon, start workers, and generate a sites.xml file
>>> that lets your swift scripts run across the amazon nodes. You
>>> shouldn't have to make changes to the sites.xml.
>>>
>>>  Alternatively, you could try using the beta release of swift, Swift
>>> 0.95 RC6 with the new cloud mechanism:
>>> https://github.com/swift-lang/swift-on-cloud/tree/master/aws
>>>
>>> That will set you up with a headnode on AWS and a few worker nodes that
>>> you define, with everything set up to run swift.
>>>
>>>
>>> Thanks,
>>> Yadu
>>>
>>>  On Thu, Aug 28, 2014 at 6:57 PM, Justin bbt <justinbbt at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>>
>>>> Hi all,
>>>>>
>>>>> I was able to run swift successfully on my local system.
>>>>> Next, I want to use swift to run some jobs on a cluster.
>>>>>
>>>>> I followed this tutorial. (I am using just a simple cluster, and I could
>>>>> not even run the job on one remote node of the cluster.)
>>>>> http://swift-lang.org/tutorials/cloud/tutorial.html
>>>>>
>>>>> But I get this when I run swift p1.swift or any other swift script:
>>>>>
>>>>> Swift 0.94.1 swift-r7114 cog-r3803
>>>>>
>>>>> RunID: 20140828-1758-ea4phzag
>>>>> Progress:  time: Thu, 28 Aug 2014 17:58:15 -0400
>>>>> Progress:  time: Thu, 28 Aug 2014 17:58:24 -0400  Submitted:1
>>>>> Execution failed:
>>>>> Exception in simulate:
>>>>>     Arguments: []
>>>>>     Host: remotehost2
>>>>>     Directory: p1-20140828-1758-ea4phzag/jobs/7/simulate-7k2fxlvl
>>>>>
>>>>> Caused by:
>>>>> Job failed with an exit code of 127
>>>>> simulation, p1.swift, line 9
>>>>>
>>>>>
>>>>> --- this is my sites.xml file setting
>>>>>
>>>>>    <pool handle="remotehost2">
>>>>>       <execution provider="ssh" jobmanager="ssh:local"
>>>>> url="myclusterurl"/>
>>>>>       <filesystem provider="ssh" url="myclusterurl"/>
>>>>>       <profile namespace="karajan" key="jobThrottle">0</profile>
>>>>>       <profile namespace="karajan" key="initialScore">10000</profile>
>>>>>       <workdirectory>/path/to/remote/workdirectory</workdirectory>
>>>>>    </pool>
>>>>>
>>>>> --- if I use this one
>>>>> <pool handle="persistent-coasters">
>>>>>     <execution provider="coaster-persistent"
>>>>>                url="myclusterurl"
>>>>>                jobmanager="local:local"/>
>>>>>     <profile namespace="globus" key="workerManager">passive</profile>
>>>>>     <profile namespace="globus" key="jobsPerNode">1</profile>
>>>>>     <profile key="jobThrottle" namespace="karajan">10</profile>
>>>>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>>>>     <filesystem provider="local" url="none" />
>>>>>     <workdirectory>.l</workdirectory>
>>>>>   </pool>
>>>>> --- then it loops to my localhost and just keeps resubmitting the jobs
>>>>>
>>>>> 1. Is this a correct setting?
>>>>> 2. Should I use coasters? I could not understand the description of
>>>>> coasters and the required settings in the user guide and
>>>>> documentation. Is there a better tutorial that describes
>>>>> coasters?
>>>>> 3. I plan to use swift on the cloud (Microsoft Azure) later. What
>>>>> settings are required for that, in sites.xml and any other files?
>>>>>
>>>>>
>>>>> Thanks in Advance.
>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-user mailing list
>>>> Swift-user at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>>>
>>>
>>>
>>>
>>> --
>>> Yadu Nand B
>>>
>>>
>>
>


-- 
Yadu Nand B