[Swift-user] Remote SGE cluster

Yadu Nand Babuji yadunand at uchicago.edu
Mon May 4 14:57:49 CDT 2015


Hi Igor,

Are you able to ssh from your machine to legion directly, without
entering a password?
Could you please send us a tarball of the runNNN directories for a
failing run?
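
In case it helps, here is a minimal sketch of one way to bundle the run
directories. The function name pack_runs and the default archive name
swift-runs.tar.gz are made up for illustration, and it assumes the run
directories sit in the current directory and are named like run001.

```shell
# Hypothetical helper (not part of Swift): bundle every runNNN directory
# found in the current directory into one compressed tarball.
pack_runs() {
    out="${1:-swift-runs.tar.gz}"   # archive name is an assumption
    found=0
    # The glob stays literal when nothing matches, so test each entry.
    for d in run[0-9]*; do
        if [ -d "$d" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no runNNN directories found in $(pwd)" >&2
        return 1
    fi
    # -c create, -z gzip, -f write to the named archive file
    tar -czf "$out" run[0-9]*
}

# Usage, from the directory you ran swift in:
#   pack_runs failing-run.tar.gz
```

If only one run failed, passing just that directory to tar directly
(for example "tar -czf run001.tar.gz run001") keeps the attachment small.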

I've put the following settings in ~/.ssh/config on my laptop and
set up SSH keys on both socrates and legion. This lets me run
"ssh legion.rc.ucl.ac.uk" and connect through socrates in one step.

Host legion.rc.ucl.ac.uk
     User YOUR_USERNAME
     Hostname legion.rc.ucl.ac.uk
     ProxyCommand ssh socrates -W %h:%p

Host socrates
     Hostname socrates.ucl.ac.uk
     User YOUR_USERNAME
     ForwardAgent yes
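
To confirm that the key setup really works without a prompt, something
along these lines may be useful. It is only a sketch: the function name
check_passwordless_ssh and the SSH override variable are hypothetical,
while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
# Hypothetical helper: succeed only if login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
# SSH can be set to another client binary; it defaults to plain "ssh".
check_passwordless_ssh() {
    host="$1"
    if "${SSH:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "OK: passwordless login to $host works"
    else
        echo "FAILED: $host needs a password or is unreachable" >&2
        return 1
    fi
}

# Usage: test the gateway first, then the cluster behind it.
#   check_passwordless_ssh socrates && \
#       check_passwordless_ssh legion.rc.ucl.ac.uk
```

If the first check passes and the second fails, the problem is more
likely the key on legion or the ProxyCommand line than the gateway.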

Thanks,
Yadu


On 05/04/2015 07:51 AM, Igor Russo wrote:
> Hi Yadu,
>
> Thanks again.
>
> I tried your suggestion. Now I'm not getting the previous error, but
> the jobs aren't being submitted:
>
> RunID: run001
> Progress: Seg, 04 Mai 2015 09:32:54-0300
> Progress: Seg, 04 Mai 2015 09:32:55-0300  Submitting:1
> Progress: Seg, 04 Mai 2015 09:33:25-0300  Submitting:1
> Progress: Seg, 04 Mai 2015 09:33:55-0300  Submitting:1
> Progress: Seg, 04 Mai 2015 09:34:25-0300  Submitting:1
> Progress: Seg, 04 Mai 2015 09:34:55-0300  Submitting:1
> Progress: Seg, 04 Mai 2015 09:35:25-0300  Submitting:1
> Progress: Seg, 04 Mai 2015 09:35:55-0300  Submitting:1
> Progress: Seg, 04 Mai 2015 09:36:25-0300  Submitting:1
>
> In the log file, I notice the following errors:
>
> 2015-05-04 09:24:06,223-0300 INFO  ServiceManager Service does not 
> appear to be registered with this manager
> 2015-05-04 09:24:06,223-0300 INFO  ServiceManager Coaster service 
> ended. Reason: null
>
> Thanks,
> Igor
>
>
> 2015-05-01 17:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
>
>     Hi Igor,
>
>     The remote connection system requires that the local machine you
>     run the Swift client on has a public IP address. It looks like
>     Swift was not able to guess it and set it to
>     http://igor-ubuntu:51251
>
>     Could you retry running part04 after doing the next step, and
>     please make sure your environment has these variables set
>     whenever you run Swift against remote systems:
>     export GLOBUS_HOSTNAME=<PUBLIC_IP_OF_YOUR_MACHINE>
>     export GLOBUS_TCP_PORT_RANGE=50000,51000
>
>     Thanks,
>     Yadu
>
>
>     On 05/01/2015 02:29 PM, Igor Russo wrote:
>>     Hi Yadu,
>>
>>     Thank you very much!
>>
>>     I changed the config file with the data from my cluster.
>>
>>     When executing part 4 of the swift-tutorial, I'm getting the
>>     following error:
>>     "Failed to download bootstrap jar from ..."
>>
>>
>>     --------------------------------------------------------------------------------
>>
>>     RunID: run031
>>     Progress: Sex, 01 Mai 2015 15:40:42-0300
>>     Progress: Sex, 01 Mai 2015 15:40:43-0300  Submitting:1
>>
>>     Execution failed:
>>     Exception in sort:
>>         Arguments: [-n, unsorted.txt]
>>         Host: mmc
>>         Directory: p4-run031/jobs/s/sort-go28d68m
>>     exception @ swift-int-staging.k, line: 165
>>     Caused by:
>>     exception @ swift-int-staging.k, line: 160
>>     Caused by: null
>>     Caused by:
>>     org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>     Could not submit job
>>     Caused by:
>>     org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>     Could not start coaster service
>>     Caused by:
>>     org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>     Task ended before registration was received.
>>     Failed to download bootstrap jar from http://igor-ubuntu:51251
>>
>>     k:assign @ swift.k, line: 174
>>     Caused by: Exception in sort:
>>         Arguments: [-n, unsorted.txt]
>>         Host: mmc
>>         Directory: p4-run031/jobs/s/sort-go28d68m
>>     exception @ swift-int-staging.k, line: 165
>>     Caused by:
>>     exception @ swift-int-staging.k, line: 160
>>     Caused by: null
>>     Caused by:
>>     org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>     Could not submit job
>>     Caused by:
>>     org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>     Could not start coaster service
>>     Caused by:
>>     org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>     Task ended before registration was received.
>>     Failed to download bootstrap jar from http://igor-ubuntu:51251
>>
>>     --------------------------------------------------------------------------------
>>
>>     Thanks,
>>     Igor
>>
>>     2015-05-01 13:47 GMT-03:00 Yadu Nand Babuji
>>     <yadunand at uchicago.edu>:
>>
>>         Hi Igor,
>>
>>         Swift does support SGE clusters, and you can refer to the
>>         swift-tutorial
>>         for sample code and configurations from this link:
>>         https://github.com/swift-lang/swift-tutorial
>>
>>         Here's a sample config from our test-suite for Godzilla, an
>>         SGE cluster at UChicago:
>>         https://github.com/swift-lang/swift-k/blob/master/tests/sites/godzilla/swift.conf
>>         You could modify and add this config to the swift.conf file
>>         in the swift-tutorial to run
>>         Swift on any machine and execute on a remote SGE cluster.
>>
>>         SGE is a widely used resource manager and most sites have
>>         differences in
>>         their setups that make each site unique. If you run into
>>         issues with the default
>>         swift package, and could provide help in figuring out
>>         specifics of your cluster, we
>>         will help you adapt the Swift SGE provider to support your
>>         cluster.
>>
>>         Thanks,
>>         Yadu
>>
>>
>>
>>         On 04/28/2015 05:09 PM, Igor Russo wrote:
>>>         Hi All,
>>>
>>>         Is it possible to use Swift with a remote SGE/OGE cluster?
>>>
>>>         Regards,
>>>         Igor
>>>
>>>
>>>         _______________________________________________
>>>         Swift-user mailing list
>>>         Swift-user at ci.uchicago.edu
>>>         https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
