[Swift-user] Remote SGE cluster

Igor Russo igor.souza.russo at gmail.com
Mon May 4 16:27:56 CDT 2015


Hi Yadu,

Yes, I can ssh from my laptop to the cluster directly.

The coaster-bootstrap-*.log files are created on the remote system.

I've attached the log file.

Thanks,
Igor

2015-05-04 16:57 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:

>  Hi Igor,
>
> Are you able to ssh from your machine to legion directly without entering
> a password?
> Could you please send us a tarball of the runNNN directories for a failing
> run?
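A minimal bash sketch for producing such a tarball, assuming the runNNN directories sit in the directory Swift was launched from (as with the run001 attachment at the end of this thread). `latest_run_dir` and `archive_run` are hypothetical helper names, not part of Swift:

```bash
# Hypothetical helper: print the newest runNNN directory in the current
# directory (prints an empty line if there is none).
latest_run_dir() {
    local d latest=""
    for d in run[0-9][0-9][0-9]*/; do
        [ -d "$d" ] || continue   # an unmatched glob stays literal; skip it
        latest="${d%/}"           # glob results come back sorted; keep the last
    done
    printf '%s\n' "$latest"
}

# Hypothetical helper: tar+gzip a run directory (default: the newest one)
# into <run>.tar.gz and print the archive name.
archive_run() {
    local run
    run="${1:-$(latest_run_dir)}"
    run="${run%/}"
    if [ -z "$run" ] || [ ! -d "$run" ]; then
        echo "archive_run: no run directory found" >&2
        return 1
    fi
    tar czf "${run}.tar.gz" "$run" || return 1
    printf '%s\n' "${run}.tar.gz"
}
```

For example, `archive_run` packs the newest run, and `archive_run run001` packs a specific one. Zero-padded names like run001 and run002 sort in run order, which is why taking the last glob match is enough here.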
>
> I've put the following settings in ~/.ssh/config on my laptop and set up
> ssh keys on both socrates and legion. This lets me run
> "ssh legion.rc.ucl.ac.uk" and connect.
>
> Host legion.rc.ucl.ac.uk
>     User YOUR_USERNAME
>     Hostname legion.rc.ucl.ac.uk
>     ProxyCommand ssh socrates -W %h:%p
>
> Host socrates
>     Hostname socrates.ucl.ac.uk
>     User YOUR_USERNAME
>     ForwardAgent yes
>
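One way to confirm that this setup really works without a password prompt is to run ssh in batch mode, where OpenSSH fails instead of prompting. A minimal bash sketch; the helper name `check_passwordless_ssh` is hypothetical, and the host names to try are the ones from the config above:

```bash
# Hypothetical helper: succeed only if ssh can log in to the given host
# without any interactive prompt. BatchMode=yes disables password prompts
# and host-key confirmation, so ssh fails instead of asking; any ProxyCommand
# set in ~/.ssh/config is still used.
check_passwordless_ssh() {
    local host="$1"
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "ok: $host accepts non-interactive ssh"
    else
        echo "fail: $host needs a password, has an unknown host key, or is unreachable" >&2
        return 1
    fi
}
```

For example, `check_passwordless_ssh socrates && check_passwordless_ssh legion.rc.ucl.ac.uk`. Because batch mode also refuses unknown host keys, connect interactively once to accept each host key before relying on this check.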
> Thanks,
> Yadu
>
>
>
> On 05/04/2015 07:51 AM, Igor Russo wrote:
>
> Hi Yadu,
>
>  Thanks again.
>
> I tried your suggestion. Now I'm not getting the previous error, but the
> jobs aren't being submitted:
>
> RunID: run001
> Progress: Mon, 04 May 2015 09:32:54-0300
> Progress: Mon, 04 May 2015 09:32:55-0300  Submitting:1
> Progress: Mon, 04 May 2015 09:33:25-0300  Submitting:1
> Progress: Mon, 04 May 2015 09:33:55-0300  Submitting:1
> Progress: Mon, 04 May 2015 09:34:25-0300  Submitting:1
> Progress: Mon, 04 May 2015 09:34:55-0300  Submitting:1
> Progress: Mon, 04 May 2015 09:35:25-0300  Submitting:1
> Progress: Mon, 04 May 2015 09:35:55-0300  Submitting:1
> Progress: Mon, 04 May 2015 09:36:25-0300  Submitting:1
>
> In the log file, I noticed the following errors:
>
> 2015-05-04 09:24:06,223-0300 INFO  ServiceManager Service does not appear to be registered with this manager
> 2015-05-04 09:24:06,223-0300 INFO  ServiceManager Coaster service ended. Reason: null
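To spot these messages quickly across several files, a small grep wrapper can help. This is a hedged sketch: `coaster_failures` is a hypothetical name, the message strings are copied from the errors quoted in this thread, and the log paths are passed in as arguments because where Swift writes its logs depends on the setup:

```bash
# Hypothetical helper: print file:line:text for every line in the given log
# files that contains one of the coaster failure messages seen in this thread.
# Returns 0 if any match was found, 1 if none, 2 on usage error.
coaster_failures() {
    if [ "$#" -eq 0 ]; then
        echo "usage: coaster_failures LOGFILE..." >&2
        return 2
    fi
    grep -H -n -F \
        -e 'Failed to download bootstrap jar' \
        -e 'Task ended before registration was received' \
        -e 'Coaster service ended' \
        -e 'Service does not appear to be registered' \
        -- "$@"
}
```

Fixed-string matching (`-F`) is used so that the dots and other characters in the messages are not treated as regular-expression syntax.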
>
>  Thanks,
>  Igor
>
>
> 2015-05-01 17:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
>
>>  Hi Igor,
>>
>> The remote connection system requires that the local machine you run
>> the Swift client on has a public IP address. It looks like Swift was not
>> able to guess it and set it to http://igor-ubuntu:51251, which the remote
>> nodes could not reach.
>>
>> Could you retry running part04 after setting the variables below? Please
>> make sure they are set in your environment whenever you run Swift
>> against remote systems:
>> export GLOBUS_HOSTNAME=<PUBLIC_IP_OF_YOUR_MACHINE>
>> export GLOBUS_TCP_PORT_RANGE=50000,51000
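Before rerunning, a quick sanity check of these two variables may save a round trip. This bash sketch is a hypothetical helper (`check_swift_remote_env` is not something Swift provides): it only checks that GLOBUS_HOSTNAME is set, warns when it looks like a bare short name such as igor-ubuntu, and checks that GLOBUS_TCP_PORT_RANGE has the low,high form shown above:

```bash
# Hypothetical helper: heuristic checks on the environment suggested above.
# Returns 0 if both variables look usable, 1 otherwise.
check_swift_remote_env() {
    local status=0 range low high
    if [ -z "${GLOBUS_HOSTNAME:-}" ]; then
        echo "GLOBUS_HOSTNAME is not set" >&2
        status=1
    elif [[ "$GLOBUS_HOSTNAME" != *.* && "$GLOBUS_HOSTNAME" != *:* ]]; then
        # A bare short name (no dots, not IPv6) is often unresolvable from the cluster.
        echo "warning: GLOBUS_HOSTNAME='$GLOBUS_HOSTNAME' looks like a short hostname" >&2
    fi
    range="${GLOBUS_TCP_PORT_RANGE:-}"
    if [[ "$range" =~ ^([0-9]+),([0-9]+)$ ]]; then
        # Force base 10 so values with leading zeros are not read as octal.
        low=$((10#${BASH_REMATCH[1]}))
        high=$((10#${BASH_REMATCH[2]}))
        if (( low > high || high > 65535 )); then
            echo "GLOBUS_TCP_PORT_RANGE is not a valid low,high range: '$range'" >&2
            status=1
        fi
    else
        echo "GLOBUS_TCP_PORT_RANGE should look like low,high (e.g. 50000,51000)" >&2
        status=1
    fi
    return "$status"
}
```

Export both variables first, then run swift only if `check_swift_remote_env` succeeds. The check cannot tell whether the address is actually reachable from the cluster; it only catches the obvious misconfigurations.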
>>
>>  Thanks,
>> Yadu
>>
>>
>> On 05/01/2015 02:29 PM, Igor Russo wrote:
>>
>>  Hi Yadu,
>>
>>  Thank you very much!
>>
>>  I changed the config file with the data from my cluster.
>>
>> When running part 4 of the swift-tutorial, I'm getting the following
>> error:
>> "Failed to download bootstrap jar from ..."
>>
>>
>>
>> --------------------------------------------------------------------------------
>>
>> RunID: run031
>> Progress: Fri, 01 May 2015 15:40:42-0300
>> Progress: Fri, 01 May 2015 15:40:43-0300  Submitting:1
>>
>>  Execution failed:
>> Exception in sort:
>>     Arguments: [-n, unsorted.txt]
>>     Host: mmc
>>     Directory: p4-run031/jobs/s/sort-go28d68m
>>  exception @ swift-int-staging.k, line: 165
>> Caused by:
>>  exception @ swift-int-staging.k, line: 160
>> Caused by: null
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
>> Failed to download bootstrap jar from http://igor-ubuntu:51251
>>
>>  k:assign @ swift.k, line: 174
>> Caused by: Exception in sort:
>>     Arguments: [-n, unsorted.txt]
>>     Host: mmc
>>     Directory: p4-run031/jobs/s/sort-go28d68m
>>  exception @ swift-int-staging.k, line: 165
>> Caused by:
>>  exception @ swift-int-staging.k, line: 160
>> Caused by: null
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
>> Failed to download bootstrap jar from http://igor-ubuntu:51251
>>
>>
>> --------------------------------------------------------------------------------
>>
>>  Thanks,
>> Igor
>>
>> 2015-05-01 13:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
>>
>>>  Hi Igor,
>>>
>>> Swift does support SGE clusters, and you can refer to the swift-tutorial
>>> for sample code and configurations from this link:
>>> https://github.com/swift-lang/swift-tutorial
>>>
>>> Here's a sample config from our test suite for Godzilla, an SGE cluster
>>> at UChicago:
>>>
>>> https://github.com/swift-lang/swift-k/blob/master/tests/sites/godzilla/swift.conf
>>>
>>> You could adapt this config and add it to the swift.conf file in the
>>> swift-tutorial to run Swift on any machine and execute on a remote SGE
>>> cluster.
>>>
>>> SGE is a widely used resource manager, and most sites set it up
>>> differently. If you run into issues with the default Swift package and
>>> can help us figure out the specifics of your cluster, we will help you
>>> adapt the Swift SGE provider to support it.
>>>
>>> Thanks,
>>> Yadu
>>>
>>>
>>>
>>> On 04/28/2015 05:09 PM, Igor Russo wrote:
>>>
>>>  Hi All,
>>>
>>> Is it possible to use Swift with a remote SGE/OGE cluster?
>>>
>>>  Regards,
>>> Igor
>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run001.tar.gz
Type: application/x-gzip
Size: 5680 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20150504/0c66386a/attachment.bin>

