[Swift-user] Remote SGE cluster
Yadu Nand Babuji
yadunand at uchicago.edu
Mon May 4 14:57:49 CDT 2015
Hi Igor,

Are you able to ssh from your machine to legion directly without
entering a password?

Could you please send us a tarball of the runNNN directories for a
failing run?

I've put the following settings in ~/.ssh/config on my laptop and
set up ssh keys on both socrates and legion. This allows me to run
"ssh legion.rc.ucl.ac.uk" and connect.

Host legion.rc.ucl.ac.uk
    User YOUR_USERNAME
    Hostname legion.rc.ucl.ac.uk
    ProxyCommand ssh socrates -W %h:%p

Host socrates
    Hostname socrates.ucl.ac.uk
    User YOUR_USERNAME
    ForwardAgent yes
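
One way to check whether key-based login works end to end is to run ssh
in batch mode, which fails instead of prompting for a password. The
snippet below is only a minimal sketch: the function name
check_passwordless is made up for illustration, and it assumes the host
names match the entries in ~/.ssh/config.

```shell
#!/bin/sh
# Check whether key-based (password-free) ssh works for each host given
# on the command line. check_passwordless is a hypothetical helper name.

check_passwordless() {
    host="$1"
    # BatchMode=yes makes ssh fail rather than prompt for a password;
    # ConnectTimeout keeps an unreachable host from hanging the check.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true 2>/dev/null; then
        echo "ok: $host"
    else
        echo "FAILED: $host (password prompt, unreachable host, or bad key)"
        return 1
    fi
}

# Test every host passed as an argument; does nothing without arguments.
for h in "$@"; do
    check_passwordless "$h"
done
```

Saved under a name of your choice (check_ssh.sh here is just an
example), it could be run as "sh check_ssh.sh socrates legion.rc.ucl.ac.uk".
If socrates passes but legion fails, the ProxyCommand hop or the key on
legion is the likely place to look.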
Thanks,
Yadu
On 05/04/2015 07:51 AM, Igor Russo wrote:
> Hi Yadu,
>
> Thanks again.
>
> I tried your suggestion. Now I'm no longer getting the previous error,
> but the jobs aren't being submitted:
>
> RunID: run001
> Progress: Seg, 04 Mai 2015 09:32:54-0300
> Progress: Seg, 04 Mai 2015 09:32:55-0300 Submitting:1
> Progress: Seg, 04 Mai 2015 09:33:25-0300 Submitting:1
> Progress: Seg, 04 Mai 2015 09:33:55-0300 Submitting:1
> Progress: Seg, 04 Mai 2015 09:34:25-0300 Submitting:1
> Progress: Seg, 04 Mai 2015 09:34:55-0300 Submitting:1
> Progress: Seg, 04 Mai 2015 09:35:25-0300 Submitting:1
> Progress: Seg, 04 Mai 2015 09:35:55-0300 Submitting:1
> Progress: Seg, 04 Mai 2015 09:36:25-0300 Submitting:1
>
> In the log file, I notice the following errors:
>
> 2015-05-04 09:24:06,223-0300 INFO ServiceManager Service does not
> appear to be registered with this manager
> 2015-05-04 09:24:06,223-0300 INFO ServiceManager Coaster service
> ended. Reason: null
>
> Thanks,
> Igor
>
>
> 2015-05-01 17:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
>
> Hi Igor,
>
> The remote connection system requires that the local machine you
> run the Swift client on has a public IP address. It looks like Swift
> was not able to guess it and set it to http://igor-ubuntu:51251
> instead.
>
> Could you set the variables below and then retry part04? Please
> make sure your environment has them set whenever you run Swift
> against remote systems:
> export GLOBUS_HOSTNAME=<PUBLIC_IP_OF_YOUR_MACHINE>
> export GLOBUS_TCP_PORT_RANGE=50000,51000
>
> Thanks,
> Yadu
>
>
> On 05/01/2015 02:29 PM, Igor Russo wrote:
>> Hi Yadu,
>>
>> Thank you very much!
>>
>> I changed the config file with the data from my cluster.
>>
>> When executing the 4th part of the swift-tutorial, I'm getting the
>> following error:
>> "Failed to download bootstrap jar from ..."
>>
>>
>> --------------------------------------------------------------------------------
>>
>> RunID: run031
>> Progress: Sex, 01 Mai 2015 15:40:42-0300
>> Progress: Sex, 01 Mai 2015 15:40:43-0300 Submitting:1
>>
>> Execution failed:
>> Exception in sort:
>> Arguments: [-n, unsorted.txt]
>> Host: mmc
>> Directory: p4-run031/jobs/s/sort-go28d68m
>> exception @ swift-int-staging.k, line: 165
>> Caused by:
>> exception @ swift-int-staging.k, line: 160
>> Caused by: null
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> Could not submit job
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> Could not start coaster service
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> Task ended before registration was received.
>> Failed to download bootstrap jar from http://igor-ubuntu:51251
>>
>> k:assign @ swift.k, line: 174
>> Caused by: Exception in sort:
>> Arguments: [-n, unsorted.txt]
>> Host: mmc
>> Directory: p4-run031/jobs/s/sort-go28d68m
>> exception @ swift-int-staging.k, line: 165
>> Caused by:
>> exception @ swift-int-staging.k, line: 160
>> Caused by: null
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> Could not submit job
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> Could not start coaster service
>> Caused by:
>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> Task ended before registration was received.
>> Failed to download bootstrap jar from http://igor-ubuntu:51251
>>
>> --------------------------------------------------------------------------------
>>
>> Thanks,
>> Igor
>>
>> 2015-05-01 13:47 GMT-03:00 Yadu Nand Babuji
>> <yadunand at uchicago.edu>:
>>
>> Hi Igor,
>>
>> Swift does support SGE clusters, and you can refer to the
>> swift-tutorial
>> for sample code and configurations from this link:
>> https://github.com/swift-lang/swift-tutorial
>>
>> Here's a sample config from our test-suite for Godzilla, an
>> SGE cluster at UChicago:
>> https://github.com/swift-lang/swift-k/blob/master/tests/sites/godzilla/swift.conf
>> You could modify and add this config to the swift.conf file
>> in the swift-tutorial to run
>> Swift on any machine and execute on a remote SGE cluster.
>>
>> SGE is a widely used resource manager and most sites have
>> differences in
>> their setups that make each site unique. If you run into
>> issues with the default
>> swift package, and could provide help in figuring out
>> specifics of your cluster, we
>> will help you adapt the Swift SGE provider to support your
>> cluster.
>>
>> Thanks,
>> Yadu
>>
>>
>>
>> On 04/28/2015 05:09 PM, Igor Russo wrote:
>>> Hi All,
>>>
>>> Is it possible to use Swift with a remote SGE/OGE cluster?
>>>
>>> Regards,
>>> Igor
>>>
>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user