[Swift-user] running jobs on cluster or cloud
Justin bbt
justinbbt at gmail.com
Tue Sep 2 13:26:28 CDT 2014
I re-ran my coaster service to make sure I am sending you an up-to-date log. The log is
attached. The sites.xml is now:
<config>
<pool handle="persistent-coasters">
<execution provider="coaster-persistent"
url="http://localhost:53346"
jobmanager="local:local"/>
<profile namespace="globus" key="workerManager">passive</profile>
<profile namespace="globus" key="jobsPerNode">1</profile>
<profile key="jobThrottle" namespace="karajan">10</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<filesystem provider="local" url="none" />
<workdirectory>/scratchspace</workdirectory>
</pool>
</config>
My first question is: why does start-coaster-service not terminate?
Anyhow, if I use this sites.xml, the swift output is:
lenovo@lenovo-laptop:~/swift-cloud-tutorial/part01$ swift p1.swift
Swift 0.94.1 swift-r7114 cog-r3803
RunID: 20140902-1406-he5yo1s3
Progress: time: Tue, 02 Sep 2014 14:06:50 -0400
Progress: time: Tue, 02 Sep 2014 14:07:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:07:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:08:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:08:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:09:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:09:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:10:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:10:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:11:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:11:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:12:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:12:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:13:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:13:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:14:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:14:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:15:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:15:50 -0400 Submitted:1
I am also attaching the log file for that.
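
For anyone following along, here is a minimal sketch of one way to check whether anything is accepting connections at the service URL named in sites.xml. The helper names coaster_url and port_open are hypothetical (they are not part of Swift), the script assumes bash because it uses the /dev/tcp redirection, and it assumes a sites.xml with a single pool in the current directory.

```shell
#!/usr/bin/env bash
# Sketch only: coaster_url and port_open are hypothetical helpers, not part of Swift.

# Print the url="..." attribute of the <execution> element in a sites.xml
# (assumes a single pool; lines are joined because the attribute may sit on its own line).
coaster_url() {
    tr '\n' ' ' < "$1" | sed -n 's/.*<execution[^>]*url="\([^"]*\)".*/\1/p'
}

# Succeed if a TCP connection to host $1, port $2 can be opened.
# Relies on bash's /dev/tcp redirection; the subshell closes the descriptor on exit.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if [ -f sites.xml ]; then
    url=$(coaster_url sites.xml)
    hostport=${url#*://}      # drop the scheme, e.g. "localhost:53346"
    host=${hostport%%:*}
    port=${hostport##*:}
    if port_open "$host" "$port"; then
        echo "something is listening on $host:$port"
    else
        echo "nothing is listening on $host:$port"
    fi
fi
```

An open port only shows that some process is listening there; it says nothing about whether a worker has connected to the service.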
On Mon, Sep 1, 2014 at 5:20 PM, Justin bbt <justinbbt at gmail.com> wrote:
> I solved the permission problem by using the ssh-add command to add the key to
> the agent's list of keys. (This step is required if the local system is Linux;
> I am using Ubuntu.)
>
> (more here
> https://help.github.com/articles/error-agent-admitted-failure-to-sign)
>
> Now start-coaster-service connects to the cluster without a password, but it
> does not terminate. This is the output:
>
> Service address: localhost
> Starting coaster-service
> Service port: 35925
> Local port: 40681
> Generating sites.xml
> Starting worker on W.X.Y.Z
> WORKER_LOGGING_LEVEL=DEBUG: Command not found.
>
>
>
> If I just use my sites.xml
>
> <pool handle="persistent-coasters">
> <execution provider="coaster-persistent"
> url="http://urladdress"
> jobmanager="local:local"/>
> <profile namespace="globus" key="workerManager">passive</profile>
> <profile namespace="globus" key="jobsPerNode">1</profile>
> <profile key="jobThrottle" namespace="karajan">10</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <filesystem provider="local" url="none" />
> <workdirectory>.</workdirectory>
> </pool>
>
>
> it fails with the following error
>
>
> Execution failed:
> Exception in simulate:
> Arguments: []
> Host: persistent-coasters
> Directory: p1-20140901-1648-r8mdqbse/jobs/z/simulate-zcyxesvl
>
> Caused by:
> Could not submit job
> Caused by:
> Failed to create socket
> Caused by:
> Connection refused
> simulation, p1.swift, line 9
>
>
>
>
>
>
> On Sat, Aug 30, 2014 at 1:28 AM, Justin bbt <justinbbt at gmail.com> wrote:
>
>> For cluster:
>>
>> When I run start-coaster-service, I get the following: it asks
>> for a password and then says permission is denied.
>>
>> Start-coaster-service...
>> Configuration: /home/lenovo/swift-cloud-tutorial/scs/coaster-service.conf
>> Service address: localhost
>> Starting coaster-service
>> Service port: 52809
>> Local port: 58460
>> Generating sites.xml
>> username at ipadress's password:
>> username at ipadress's password:
>> Starting worker on username@
>> lenovo at lenovo-laptop:~/swift-cloud-tutorial/scs$username at ipadress's
>> password:
>> Permission denied, please try again.
>> username at ipadress's password:
>> Permission denied, please try again.
>> username at ipadress's password:
>> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>>
>> This happens even though I have created my keys with ssh-keygen. (The only change
>> I made was to create RSA keys rather than DSA keys, because my cluster did
>> not accept DSA.) I can connect over regular ssh with the RSA key and my
>> passphrase.
>>
>> The output of my sites.xml from this partial running of
>> start-coaster-service is
>>
>> <pool handle="persistent-coasters">
>> <execution provider="coaster-persistent"
>> url="http://localhost:37584"
>> jobmanager="local:local"/>
>> <profile namespace="globus" key="workerManager">passive</profile>
>> <profile namespace="globus" key="jobsPerNode">1</profile>
>> <profile key="jobThrottle" namespace="karajan">10</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
>> <filesystem provider="local" url="none" />
>> <workdirectory>.</workdirectory>
>> </pool>
>>
>> Using this XML, I just get a job-submission progress line every 30
>> seconds, with no finished jobs.
>>
>>
>> BTW, I have a public IP for my cluster, and each compute node has a
>> local/private IP.
>> In
>> export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
>> I currently set just the public IP address, and I am still not
>> successful even with this one node. How should I set the
>> other IPs? Does it mean that I have to install swift on the cluster?
>>
>>
>> I will look at the new release of swift for AWS.
>>
>>
>> Thanks,
>> J.
>>
>>
>>
>>
>>
>> On Fri, Aug 29, 2014 at 11:43 AM, Yadu Nand <yadudoc1729 at gmail.com>
>> wrote:
>>
>>> Hi Justin,
>>>
>>> Did you do the following steps:
>>> export WORKER_LOCATION="/home/ubuntu"
>>> export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
>>> export WORKER_USERNAME=ubuntu
>>>
>>> and then run "source setup.sh" ?
>>> When you source the setup.sh script you should have gotten a sites.xml and
>>> a start-coaster-service.log in your scs folder; could you send us those?
>>> The setup script should start a persistent coaster service and connect
>>> to the nodes on amazon, start workers, and generate a sites.xml file
>>> that would let your swift scripts run across the amazon nodes. You
>>> shouldn't have to make changes to the sites.xml.
>>>
>>> Alternatively, you could try using the beta release of swift, Swift
>>> 0.95 RC6 with the new cloud mechanism:
>>> https://github.com/swift-lang/swift-on-cloud/tree/master/aws
>>>
>>> That will set you up with a headnode on AWS with a few worker nodes that
>>> you define, with everything setup to run swift.
>>>
>>>
>>> Thanks,
>>> Yadu
>>>
>>>
>>>
>>> On Thu, Aug 28, 2014 at 6:57 PM, Justin bbt <justinbbt at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>>
>>>> Hi all,
>>>>>
>>>>> I could successfully run swift on my local system.
>>>>> Next, I want to use the swift to run some jobs on a cluster.
>>>>>
>>>>> I followed this tutorial. (I am using just a simple cluster, and I could
>>>>> not even run the job on one remote node of the cluster.)
>>>>> http://swift-lang.org/tutorials/cloud/tutorial.html
>>>>>
>>>>> But I get this when I run swift p1.swift or any other swift script:
>>>>>
>>>>> Swift 0.94.1 swift-r7114 cog-r3803
>>>>>
>>>>> RunID: 20140828-1758-ea4phzag
>>>>> Progress: time: Thu, 28 Aug 2014 17:58:15 -0400
>>>>> Progress: time: Thu, 28 Aug 2014 17:58:24 -0400 Submitted:1
>>>>> Execution failed:
>>>>> Exception in simulate:
>>>>> Arguments: []
>>>>> Host: remotehost2
>>>>> Directory: p1-20140828-1758-ea4phzag/jobs/7/simulate-7k2fxlvl
>>>>>
>>>>> Caused by:
>>>>> Job failed with an exit code of 127
>>>>> simulation, p1.swift, line 9
>>>>>
>>>>>
>>>>> --- this is my sites.xml file setting
>>>>>
>>>>> <pool handle="remotehost2">
>>>>> <execution provider="ssh" jobmanager="ssh:local"
>>>>> url="myclusterurl"/>
>>>>> <filesystem provider="ssh" url="myclusterurl"/>
>>>>> <profile namespace="karajan" key="jobThrottle">0</profile>
>>>>> <profile namespace="karajan" key="initialScore">10000</profile>
>>>>> <workdirectory>/path/to/remote/workdirectory</workdirectory>
>>>>> </pool>
>>>>>
>>>>> --- if I use this one
>>>>> <pool handle="persistent-coasters">
>>>>> <execution provider="coaster-persistent"
>>>>> url="myclusterurl"
>>>>> jobmanager="local:local"/>
>>>>> <profile namespace="globus" key="workerManager">passive</profile>
>>>>> <profile namespace="globus" key="jobsPerNode">1</profile>
>>>>> <profile key="jobThrottle" namespace="karajan">10</profile>
>>>>> <profile namespace="karajan" key="initialScore">10000</profile>
>>>>> <filesystem provider="local" url="none" />
>>>>> <workdirectory>.l</workdirectory>
>>>>> </pool>
>>>>> --- then it loops to my localhost and just keeps submitting the jobs
>>>>>
>>>>> 1. Is this a correct setting?
>>>>> 2. Should I use coasters? I could not understand the descriptions in the
>>>>> user guide and documentation of the coaster concept and the
>>>>> required settings. Is there a better tutorial that describes
>>>>> coasters?
>>>>> 3. I plan to use swift later on the cloud (Microsoft Azure). What
>>>>> settings are required for that, in sites.xml and in any other file?
>>>>>
>>>>>
>>>>> Thanks in Advance.
>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-user mailing list
>>>> Swift-user at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>>>
>>>
>>>
>>>
>>> --
>>> Yadu Nand B
>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jobspernode-20140902-1401-nux8gdl0.log
Type: text/x-log
Size: 14003 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20140902/eeb0f3fb/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p1-20140902-1406-he5yo1s3.log
Type: text/x-log
Size: 10845 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20140902/eeb0f3fb/attachment-0001.bin>