[Swift-user] running jobs on cluster or cloud

Justin bbt justinbbt at gmail.com
Tue Sep 2 13:26:28 CDT 2014


I re-ran my coaster service to make sure I am sending you an up-to-date log;
the log is attached. The sites.xml now looks like this:

<config>
  <pool handle="persistent-coasters">
    <execution provider="coaster-persistent"
               url="http://localhost:53346"
               jobmanager="local:local"/>
    <profile namespace="globus" key="workerManager">passive</profile>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <profile key="jobThrottle" namespace="karajan">10</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <filesystem provider="local" url="none" />
    <workdirectory>/scratchspace</workdirectory>
  </pool>
</config>
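
A minimal sketch, assuming Python 3 is available on the laptop, for checking whether the url in a sites.xml like the one above points at a port that accepts TCP connections. The function names coaster_endpoint and port_is_open are hypothetical helpers for illustration and are not part of Swift. The XML is trimmed to the execution element only.

```python
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Trimmed copy of the sites.xml above; only the execution element matters here.
SITES_XML = """<config>
  <pool handle="persistent-coasters">
    <execution provider="coaster-persistent"
               url="http://localhost:53346"
               jobmanager="local:local"/>
  </pool>
</config>"""


def coaster_endpoint(xml_text):
    """Return (host, port) taken from the first pool's execution url."""
    root = ET.fromstring(xml_text)
    execution = root.find("./pool/execution")
    if execution is None:
        raise ValueError("no <execution> element found under <pool>")
    parsed = urlparse(execution.get("url"))
    return parsed.hostname, parsed.port


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name-resolution errors.
        return False
```

Calling port_is_open(*coaster_endpoint(SITES_XML)) returns False when nothing is listening on that port. A True result only shows that some process is listening there; it does not show that any worker has connected to the service.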


My first question: why does start-coaster-service not terminate?

Anyhow, if I use this sites.xml, the swift output is:

lenovo at lenovo-laptop:~/swift-cloud-tutorial/part01$ swift p1.swift
Swift 0.94.1 swift-r7114 cog-r3803

RunID: 20140902-1406-he5yo1s3
Progress:  time: Tue, 02 Sep 2014 14:06:50 -0400
Progress:  time: Tue, 02 Sep 2014 14:07:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:07:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:08:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:08:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:09:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:09:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:10:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:10:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:11:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:11:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:12:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:12:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:13:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:13:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:14:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:14:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:15:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:15:50 -0400  Submitted:1


I am also attaching the log file for that run.
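
A small sketch, again assuming Python 3, that measures how long a run has sat in the same state by reading the Progress lines printed by swift. The function name stalled_seconds and the regular expression are hypothetical and written only against the output format shown above. SAMPLE is an abbreviated copy of that output.

```python
import re
from email.utils import parsedate_to_datetime

# Matches lines such as:
#   Progress:  time: Tue, 02 Sep 2014 14:07:20 -0400  Submitted:1
PROGRESS = re.compile(
    r"^Progress:\s+time:\s+"
    r"(\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4})\s*(.*)$"
)

# Abbreviated copy of the output above (first, second, third, and last lines).
SAMPLE = """\
Progress:  time: Tue, 02 Sep 2014 14:06:50 -0400
Progress:  time: Tue, 02 Sep 2014 14:07:20 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:07:50 -0400  Submitted:1
Progress:  time: Tue, 02 Sep 2014 14:15:50 -0400  Submitted:1
"""


def stalled_seconds(log_text):
    """Seconds between the first and last line of the final unchanged state."""
    entries = []
    for line in log_text.splitlines():
        match = PROGRESS.match(line.strip())
        if match:
            when = parsedate_to_datetime(match.group(1))
            entries.append((when, match.group(2).strip()))
    if not entries:
        return 0.0
    last_time, last_state = entries[-1]
    first_time = last_time
    # Walk backwards until the state differs from the final one.
    for when, state in reversed(entries):
        if state != last_state:
            break
        first_time = when
    return (last_time - first_time).total_seconds()
```

For SAMPLE, stalled_seconds returns 510.0: the run stayed at Submitted:1 from 14:07:20 to 14:15:50, which is 8 minutes 30 seconds, without any job becoming active or finishing.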




On Mon, Sep 1, 2014 at 5:20 PM, Justin bbt <justinbbt at gmail.com> wrote:

> I solved the permission problem by using the ssh-add command to add the key
> to the agent's list of keys. (This step is required if the local system is
> Linux; I am using Ubuntu.)
>
> (more here
> https://help.github.com/articles/error-agent-admitted-failure-to-sign)
>
> Now start-coaster-service connects to the cluster without a password, but it
> does not terminate. This is the output:
>
> Service address: localhost
> Starting coaster-service
> Service port: 35925
> Local port: 40681
> Generating sites.xml
> Starting worker on W.X.Y.Z
> WORKER_LOGGING_LEVEL=DEBUG: Command not found.
>
>
>
> If I just use my sites.xml
>
> <pool handle="persistent-coasters">
>     <execution provider="coaster-persistent"
>                url="http://urladdress"
>                jobmanager="local:local"/>
>     <profile namespace="globus" key="workerManager">passive</profile>
>     <profile namespace="globus" key="jobsPerNode">1</profile>
>     <profile key="jobThrottle" namespace="karajan">10</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <filesystem provider="local" url="none" />
>     <workdirectory>.</workdirectory>
>   </pool>
>
>
> it fails with the following error:
>
>
> Execution failed:
> Exception in simulate:
>     Arguments: []
>     Host: persistent-coasters
>     Directory: p1-20140901-1648-r8mdqbse/jobs/z/simulate-zcyxesvl
>
> Caused by:
> Could not submit job
> Caused by:
> Failed to create socket
> Caused by:
> Connection refused
> simulation, p1.swift, line 9
>
>
>
>
>
>
> On Sat, Aug 30, 2014 at 1:28 AM, Justin bbt <justinbbt at gmail.com> wrote:
>
>> For cluster:
>>
>> When I run start-coaster-service, I get the following output, in which
>> it asks for a password and then says permission is denied:
>>
>> Start-coaster-service...
>> Configuration: /home/lenovo/swift-cloud-tutorial/scs/coaster-service.conf
>> Service address: localhost
>> Starting coaster-service
>> Service port: 52809
>> Local port: 58460
>> Generating sites.xml
>> username at ipadress's password:
>> username at ipadress's password:
>> Starting worker on username@
>> lenovo at lenovo-laptop:~/swift-cloud-tutorial/scs$username at ipadress's
>> password:
>> Permission denied, please try again.
>> username at ipadress's password:
>> Permission denied, please try again.
>> username at ipadress's password:
>> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>>
>> This happens even though I have created my keys with ssh-keygen. (The only
>> change I made was to create RSA keys rather than DSA keys, because my
>> cluster did not accept DSA.) I can connect over regular ssh with the RSA key
>> and my passphrase.
>>
>> The sites.xml generated by this partial run of
>> start-coaster-service is:
>>
>>  <pool handle="persistent-coasters">
>>     <execution provider="coaster-persistent"
>>                url="http://localhost:37584"
>>                jobmanager="local:local"/>
>>     <profile namespace="globus" key="workerManager">passive</profile>
>>     <profile namespace="globus" key="jobsPerNode">1</profile>
>>     <profile key="jobThrottle" namespace="karajan">10</profile>
>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>     <filesystem provider="local" url="none" />
>>     <workdirectory>.</workdirectory>
>>   </pool>
>>
>> Using this XML, I just get a job-submission progress line every 30
>> seconds and no finished jobs.
>>
>>
>> BTW, my cluster has a public IP, and each compute node has a
>> local/private IP.
>> In
>>  export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
>> I currently set only the public IP address, and I am still not successful
>> even with this one node. How should I set the other IPs? Does it mean that
>> I have to install swift on the cluster?
>>
>>
>> I will look at the new release of swift for AWS.
>>
>>
>> Thanks,
>> J.
>>
>>
>>
>>
>>
>> On Fri, Aug 29, 2014 at 11:43 AM, Yadu Nand <yadudoc1729 at gmail.com>
>> wrote:
>>
>>> Hi Justin,
>>>
>>> Did you do the following steps:
>>> export WORKER_LOCATION="/home/ubuntu"
>>> export WORKER_HOSTS="<IP of machine 1> <IP of machine 2>"
>>> export WORKER_USERNAME=ubuntu
>>>
>>> and then run "source setup.sh"?
>>> When you sourced the setup.sh script, you should have gotten a sites.xml and
>>> a start-coaster-service.log in your scs folder. Could you send us those?
>>> The setup script should start a persistent coaster service and connect
>>> to the nodes on amazon, start workers, and generate a sites.xml file
>>> that would let your swift scripts run across the amazon nodes. You
>>> shouldn't have to make changes to the sites.xml.
>>>
>>>  Alternatively, you could try using the beta release of swift, Swift
>>> 0.95 RC6 with the new cloud mechanism:
>>> https://github.com/swift-lang/swift-on-cloud/tree/master/aws
>>>
>>> That will set you up with a headnode on AWS and a few worker nodes that
>>> you define, with everything set up to run swift.
>>>
>>>
>>> Thanks,
>>> Yadu
>>>
>>>
>>>  On Thu, Aug 28, 2014 at 6:57 PM, Justin bbt <justinbbt at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>>
>>>> Hi all,
>>>>>
>>>>> I was able to run swift successfully on my local system.
>>>>> Next, I want to use swift to run some jobs on a cluster.
>>>>>
>>>>> I followed this tutorial. (I am using just a simple cluster, and I could
>>>>> not even run the job on one remote node of the cluster.)
>>>>> http://swift-lang.org/tutorials/cloud/tutorial.html
>>>>>
>>>>> But I get this when I run swift p1.swift or other swift scripts:
>>>>>
>>>>> Swift 0.94.1 swift-r7114 cog-r3803
>>>>>
>>>>> RunID: 20140828-1758-ea4phzag
>>>>> Progress:  time: Thu, 28 Aug 2014 17:58:15 -0400
>>>>> Progress:  time: Thu, 28 Aug 2014 17:58:24 -0400  Submitted:1
>>>>> Execution failed:
>>>>> Exception in simulate:
>>>>>     Arguments: []
>>>>>     Host: remotehost2
>>>>>     Directory: p1-20140828-1758-ea4phzag/jobs/7/simulate-7k2fxlvl
>>>>>
>>>>> Caused by:
>>>>> Job failed with an exit code of 127
>>>>> simulation, p1.swift, line 9
>>>>>
>>>>>
>>>>> --- this is my sites.xml file setting
>>>>>
>>>>>    <pool handle="remotehost2">
>>>>>       <execution provider="ssh" jobmanager="ssh:local"
>>>>> url="myclusterurl"/>
>>>>>       <filesystem provider="ssh" url="myclusterurl"/>
>>>>>       <profile namespace="karajan" key="jobThrottle">0</profile>
>>>>>       <profile namespace="karajan" key="initialScore">10000</profile>
>>>>>       <workdirectory>/path/to/remote/workdirectory</workdirectory>
>>>>>    </pool>
>>>>>
>>>>> --- if I use this one
>>>>> <pool handle="persistent-coasters">
>>>>>     <execution provider="coaster-persistent"
>>>>>                url="myclusterurl"
>>>>>                jobmanager="local:local"/>
>>>>>     <profile namespace="globus" key="workerManager">passive</profile>
>>>>>     <profile namespace="globus" key="jobsPerNode">1</profile>
>>>>>     <profile key="jobThrottle" namespace="karajan">10</profile>
>>>>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>>>>     <filesystem provider="local" url="none" />
>>>>>     <workdirectory>.l</workdirectory>
>>>>>   </pool>
>>>>> --- then it loops to my localhost and just repeats submitting the jobs
>>>>>
>>>>> 1. Is this a correct setting?
>>>>> 2. Should I use coasters? I could not understand the description in the
>>>>> user guides and documentation of the coaster concept and the
>>>>> required settings. Is there a better tutorial that describes
>>>>> coasters?
>>>>> 3. I plan to use swift later on the cloud (Microsoft Azure). What
>>>>> settings are required for that, in sites.xml and any other files?
>>>>>
>>>>>
>>>>> Thanks in Advance.
>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-user mailing list
>>>> Swift-user at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>>>
>>>
>>>
>>>
>>> --
>>> Yadu Nand B
>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jobspernode-20140902-1401-nux8gdl0.log
Type: text/x-log
Size: 14003 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20140902/eeb0f3fb/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p1-20140902-1406-he5yo1s3.log
Type: text/x-log
Size: 10845 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20140902/eeb0f3fb/attachment-0001.bin>