[Swift-user] Remote SGE cluster

Mihael Hategan hategan at mcs.anl.gov
Mon May 4 16:52:46 CDT 2015


Hi,

In most cases (globus, coasters), the service side (legion in this case)
needs the ability to connect back to the client (your home connection).

Correct me if I'm wrong, but you are on a DSL line, behind a router with
NAT. If so, you must configure the router to forward some incoming
connections to the machine from which you are running swift.
Typically this is done by forwarding a certain port range on the router
(Yadu suggested GLOBUS_TCP_PORT_RANGE=50000,51000, so that same port
range should be forwarded on the router).
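
A rough sketch of the client-side half of this, in case it helps. It
assumes that the address legion should connect back to is the public
address of your router; 203.0.113.10 is only a documentation placeholder,
and check_port_range is a hypothetical helper of mine, not something that
ships with Swift. The two variable names are the ones Yadu gave.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): checks that a "min,max" port
# range is well formed and prints the range to forward on the router.
check_port_range() {
    range="$1"
    min="${range%%,*}"   # text before the first comma
    max="${range##*,}"   # text after the last comma
    for bound in "$min" "$max"; do
        case "$bound" in
            ''|*[!0-9]*) echo "bad range: $range" >&2; return 1 ;;
        esac
    done
    if [ "$min" -gt "$max" ] || [ "$max" -gt 65535 ]; then
        echo "bad range: $range" >&2
        return 1
    fi
    echo "forward TCP ports $min-$max"
}

# Placeholder address (assumption): replace with your router's public IP.
export GLOBUS_HOSTNAME=203.0.113.10
export GLOBUS_TCP_PORT_RANGE=50000,51000

check_port_range "$GLOBUS_TCP_PORT_RANGE"
```

The printed range is the one to enter in the router's port forwarding
page, pointing at the machine that runs swift.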

The gist of it is that swift starts a simple shell script on legion that
downloads a small java app from the client side and launches it. Said
shell script logs things into ~/coaster-bootstrap-xxx.log files. The
contents of the bootstrap logs are probably very useful here.

If all of that goes well, the aforementioned small java app downloads
the full coaster service from the client and starts it. Once started,
the coaster service connects back to Swift. The last two parts log their
doings in ~/.globus/coasters/*.log. Those can be useful, too, if they
exist.
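
Something along these lines, run on legion, would pull up the most recent
log of each kind. It is only a sketch: newest_log and show_latest are
hypothetical helper names, and the only things taken from the above are
the two log locations.

```shell
#!/bin/sh
# Hypothetical helper: prints the newest "<prefix>*.log" file in a directory,
# or nothing if there is no such file.
newest_log() {
    dir="$1"
    prefix="$2"
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/"$prefix"*.log 2>/dev/null | head -n 1
}

# Hypothetical helper: shows the last 50 lines of the newest matching log.
show_latest() {
    latest=$(newest_log "$1" "$2")
    if [ -n "$latest" ]; then
        echo "== $latest"
        tail -n 50 "$latest"
    else
        echo "no $2*.log files in $1"
    fi
}

# Bootstrap script logs, then the logs of the java app and coaster service.
show_latest "$HOME" coaster-bootstrap-
show_latest "$HOME/.globus/coasters" ""
```

If the second directory is empty or missing, the bootstrap most likely
never got as far as starting the java app.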

Mihael

On Mon, 2015-05-04 at 18:27 -0300, Igor Russo wrote:
> Hi Yadu,
> 
> Yes, I can ssh from my laptop to the cluster directly.
> 
> The coaster-bootstrap-*.log files are created on the remote system.
> 
> I've attached the log file.
> 
> Thanks,
> Igor
> 
> 2015-05-04 16:57 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
> 
> >  Hi Igor,
> >
> > Are you able to ssh from your machine to legion directly without entering
> > passwords?
> > Could you please send us a tarball of the runNNN directories for a failing
> > run?
> >
> > I've put the following settings in my ~/.ssh/config on my laptop and set up
> > ssh keys on both socrates and legion. This allows me to use
> > "ssh legion.rc.ucl.ac.uk" and connect.
> >
> > Host legion.rc.ucl.ac.uk
> >     User YOUR_USERNAME
> >     Hostname legion.rc.ucl.ac.uk
> >     ProxyCommand ssh socrates -W %h:%p
> >
> > Host socrates
> >     Hostname socrates.ucl.ac.uk
> >     User YOUR_USERNAME
> >     ForwardAgent yes
> >
> > Thanks,
> > Yadu
> >
> >
> >
> > On 05/04/2015 07:51 AM, Igor Russo wrote:
> >
> > Hi Yadu,
> >
> >  Thanks again.
> >
> >  I tried your suggestion. Now I'm not getting the previous error, but the
> > jobs aren't being submitted:
> >
> >  RunID: run001
> > Progress: Seg, 04 Mai 2015 09:32:54-0300
> > Progress: Seg, 04 Mai 2015 09:32:55-0300  Submitting:1
> > Progress: Seg, 04 Mai 2015 09:33:25-0300  Submitting:1
> > Progress: Seg, 04 Mai 2015 09:33:55-0300  Submitting:1
> > Progress: Seg, 04 Mai 2015 09:34:25-0300  Submitting:1
> > Progress: Seg, 04 Mai 2015 09:34:55-0300  Submitting:1
> > Progress: Seg, 04 Mai 2015 09:35:25-0300  Submitting:1
> > Progress: Seg, 04 Mai 2015 09:35:55-0300  Submitting:1
> > Progress: Seg, 04 Mai 2015 09:36:25-0300  Submitting:1
> >
> >  In the log file, I notice the following errors:
> >
> >  2015-05-04 09:24:06,223-0300 INFO  ServiceManager Service does not
> > appear to be registered with this manager
> > 2015-05-04 09:24:06,223-0300 INFO  ServiceManager Coaster service ended.
> > Reason: null
> >
> >  Thanks,
> >  Igor
> >
> >
> > 2015-05-01 17:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
> >
> >>  Hi Igor,
> >>
> >>  The remote connection system requires that the local machine you run
> >> the swift client on has
> >> a public ip address. It looks like swift was not able to guess it and set
> >> it to http://igor-ubuntu:51251
> >>
> >>  Could you retry running part04 after doing the next step, and please
> >> make sure your environment has
> >> these variables set whenever you run swift against remote systems:
> >>  export GLOBUS_HOSTNAME=<PUBLIC_IP_OF_YOUR_MACHINE>
> >> export GLOBUS_TCP_PORT_RANGE=50000,51000
> >>
> >>  Thanks,
> >> Yadu
> >>
> >>
> >> On 05/01/2015 02:29 PM, Igor Russo wrote:
> >>
> >>  Hi Yadu,
> >>
> >>  Thank you very much!
> >>
> >>  I changed the config file with the data from my cluster.
> >>
> >>  When executing the 4th part of Swift-tutorial, I'm getting the
> >> following error:
> >> "Failed to download bootstrap jar from ..."
> >>
> >>
> >>
> >> --------------------------------------------------------------------------------
> >>
> >>  RunID: run031
> >>  Progress: Sex, 01 Mai 2015 15:40:42-0300
> >> Progress: Sex, 01 Mai 2015 15:40:43-0300  Submitting:1
> >>
> >>  Execution failed:
> >> Exception in sort:
> >>     Arguments: [-n, unsorted.txt]
> >>     Host: mmc
> >>     Directory: p4-run031/jobs/s/sort-go28d68m
> >>  exception @ swift-int-staging.k, line: 165
> >> Caused by:
> >>  exception @ swift-int-staging.k, line: 160
> >> Caused by: null
> >> Caused by:
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could
> >> not submit job
> >> Caused by:
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could
> >> not start coaster service
> >> Caused by:
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task
> >> ended before registration was received.
> >> Failed to download bootstrap jar from http://igor-ubuntu:51251
> >>
> >>  k:assign @ swift.k, line: 174
> >> Caused by: Exception in sort:
> >>     Arguments: [-n, unsorted.txt]
> >>     Host: mmc
> >>     Directory: p4-run031/jobs/s/sort-go28d68m
> >>  exception @ swift-int-staging.k, line: 165
> >> Caused by:
> >>  exception @ swift-int-staging.k, line: 160
> >> Caused by: null
> >> Caused by:
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could
> >> not submit job
> >> Caused by:
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could
> >> not start coaster service
> >> Caused by:
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task
> >> ended before registration was received.
> >> Failed to download bootstrap jar from http://igor-ubuntu:51251
> >>
> >>
> >> --------------------------------------------------------------------------------
> >>
> >>  Thanks,
> >> Igor
> >>
> >> 2015-05-01 13:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
> >>
> >>>  Hi Igor,
> >>>
> >>> Swift does support SGE clusters, and you can refer to the swift-tutorial
> >>> for sample code and configurations from this link:
> >>> https://github.com/swift-lang/swift-tutorial
> >>>
> >>> Here's a sample config from our test-suite for Godzilla, an SGE cluster
> >>> at UChicago:
> >>>
> >>> https://github.com/swift-lang/swift-k/blob/master/tests/sites/godzilla/swift.conf
> >>> You could modify and add this config to the swift.conf file in the
> >>> swift-tutorial to run
> >>> Swift on any machine and execute on a remote SGE cluster.
> >>>
> >>> SGE is a widely used resource manager and most sites have differences in
> >>> their setups that make each site unique. If you run into issues with the
> >>> default
> >>> swift package, and could provide help in figuring out specifics of your
> >>> cluster, we
> >>> will help you adapt the Swift SGE provider to support your cluster.
> >>>
> >>> Thanks,
> >>> Yadu
> >>>
> >>>
> >>>
> >>> On 04/28/2015 05:09 PM, Igor Russo wrote:
> >>>
> >>>  Hi All,
> >>>
> >>>  Is it possible to use Swift with a remote SGE/OGE cluster?
> >>>
> >>>  Regards,
> >>> Igor
> >>>
> >>>
> >>>  _______________________________________________
> >>> Swift-user mailing list
> >>> Swift-user at ci.uchicago.edu
> >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
