[Swift-user] Remote SGE cluster

Igor Russo igor.souza.russo at gmail.com
Tue May 5 09:01:50 CDT 2015


Hi Mihael,

Sorry to bother you again.

You were right: after configuring the port forwarding, the script is able to
connect.
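
For reference, a minimal sketch of how the forwarded range can be checked
against the ports that show up in the logs. It assumes the "min,max" form of
GLOBUS_TCP_PORT_RANGE suggested earlier in this thread (50000,51000). The
function name port_in_range is hypothetical and not part of Swift.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): succeed if a port lies inside a
# "min,max" range written the way GLOBUS_TCP_PORT_RANGE is in this thread.
port_in_range() {
    port=$1
    range=$2
    min=${range%%,*}   # text before the first comma
    max=${range##*,}   # text after the last comma
    # Return 2 for empty or non-numeric input instead of guessing.
    [ -n "$port" ] && [ -n "$min" ] && [ -n "$max" ] || return 2
    case "$port$min$max" in
        *[!0-9]*) return 2 ;;
    esac
    # Return 0 when min <= port <= max, 1 otherwise.
    [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}
```

Usage, with a port taken from the bootstrap log:

    port_in_range 50006 "50000,51000" && echo "inside the forwarded range"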

But I still get the error "Checksum does not match".

Here is the content of the ~/coaster-bootstrap-xxx.log file:

using plain mode
BS: http://189.12.232.9:50006
which: no gmd5sum in
(/opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bio/ncbi/bin:/opt/bio/mpiblast/bin:/opt/bio/EMBOSS/bin:/opt/bio/clustalw/bin:/opt/bio/tcoffee/bin:/opt/bio/hmmer/bin:/opt/bio/phylip/exe:/opt/bio/mrbayes:/opt/bio/fasta:/opt/bio/glimmer/bin:/opt/bio/glimmer/scripts:/opt/bio/gromacs/bin:/opt/bio/gmap/bin:/opt/bio/tigr/bin:/opt/bio/autodocksuite/bin:/opt/bio/wgs/bin:/opt/eclipse:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/maven/bin:/opt/pdsh/bin:/opt/rocks/bin:/opt/rocks/sbin:/opt/condor/bin:/opt/condor/sbin:/opt/gridengine/bin/linux-x64)
Expected checksum: 9b7bd5a96a2912cf8d06d1a2fd891620
Computed checksum: 9b7bd5a96a2912cf8d06d1a2fd891620
JAVA=/usr/java/latest/bin/java
plain /usr/java/latest/bin/java -Djava=/usr/java/latest/bin/java -Xmx64M
-DGLOBUS_TCP_PORT_RANGE=
-DX509_USER_PROXY=/home/igor/.globus/sshproxy-1344874142-1432003400
-DX509_CERT_DIR=/home/igor/.globus/sshCAcert-1344874142-1432003400.pem
-DGLOBUS_HOSTNAME=cluster.mmc.ufjf.br -Duser.home=/home/igor -jar
/tmp/bootstrap.xTzo3v http://189.12.232.9:50006 https://189.12.232.9:50005
11100954039
Failed to download cog-provider-coaster-0.3.jar:
java.lang.RuntimeException: Checksum does not match.
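
For reference, a minimal sketch of checking a jar's checksum by hand. It
assumes the checksums are MD5 hex digests, which the 32-character values and
the gmd5sum lookup in the log above suggest but do not prove. The helper names
md5_of and checksum_matches are hypothetical and not part of Swift.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Swift) for checking a jar by hand.

# Print the MD5 hex digest of a file, using whichever tool is installed.
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1
    elif command -v gmd5sum >/dev/null 2>&1; then
        gmd5sum "$1" | cut -d ' ' -f 1
    elif command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"          # BSD/macOS variant
    else
        echo "no md5 tool found" >&2
        return 127
    fi
}

# Succeed only if the file's digest equals the expected hex string.
checksum_matches() {
    actual=$(md5_of "$1") || return 2   # 2: no digest tool available
    [ "$actual" = "$2" ]
}
```

Usage, with a placeholder path and digest:

    checksum_matches /path/to/cog-provider-coaster-0.3.jar EXPECTED_MD5 \
        && echo "checksum OK" || echo "checksum differs"

Comparing the digest of the client-side copy with the digest of whatever
arrives on the cluster might help tell whether the file changes in transit.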


Thanks,
Igor

2015-05-04 18:52 GMT-03:00 Mihael Hategan <hategan at mcs.anl.gov>:

>
> Hi,
>
> In most cases (globus, coasters), the service side (legion in this case)
> needs the ability to connect back to the client (your home connection).
>
> Correct me if I'm wrong, but you are on a DSL line, behind a router with
> NAT. If so, you must configure the router to forward some incoming
> connections to the actual machine from which you are running swift.
> Typically this is done by configuring a certain port range forwarding on
> the router (Yadu suggested GLOBUS_TCP_PORT_RANGE=50000,51000, so that
> port range should be matched on the router).
>
> The gist of it is that swift starts a simple shell script on legion that
> downloads a small java app from the client side and launches it. Said
> shell script logs things into ~/coaster-bootstrap-xxx.log files. The
> contents of the bootstrap logs are probably very useful here.
>
> If all of that goes well, the aforementioned small java app downloads
> the full coaster service from the client and starts it. Once started,
> the coaster service connects back to Swift. The last two parts log their
> doings in ~/.globus/coasters/*.log. Those can be useful, too, if they
> exist.
>
> Mihael
>
> On Mon, 2015-05-04 at 18:27 -0300, Igor Russo wrote:
> > Hi Yadu,
> >
> > Yes, I can ssh from my laptop to the cluster directly.
> >
> > The coaster-bootstrap-*.log files are created in the remote system.
> >
> > I'm sending the log file attached.
> >
> > Thanks,
> > Igor
> >
> > 2015-05-04 16:57 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
> >
> > >  Hi Igor,
> > >
> > > Are you able to ssh from your machine to legion directly without
> > > entering passwords?
> > > Could you please send us a tarball of the runNNN directories for a
> > > failing run?
> > >
> > > I've put the following settings in my ~/.ssh/config on my laptop and
> > > set up ssh keys on both socrates and legion. This allows me to use
> > > "ssh legion.rc.ucl.ac.uk" and connect.
> > >
> > > Host legion.rc.ucl.ac.uk
> > >     User YOUR_USERNAME
> > >     Hostname legion.rc.ucl.ac.uk
> > >     ProxyCommand ssh socrates -W %h:%p
> > >
> > > Host socrates
> > >     Hostname socrates.ucl.ac.uk
> > >     User YOUR_USERNAME
> > >     ForwardAgent yes
> > >
> > > Thanks,
> > > Yadu
> > >
> > >
> > >
> > > On 05/04/2015 07:51 AM, Igor Russo wrote:
> > >
> > > Hi Yadu,
> > >
> > >  Thanks again.
> > >
> > > I tried your suggestion. Now I'm not getting the previous error, but
> > > the jobs aren't being submitted:
> > >
> > >  RunID: run001
> > > Progress: Seg, 04 Mai 2015 09:32:54-0300
> > > Progress: Seg, 04 Mai 2015 09:32:55-0300  Submitting:1
> > > Progress: Seg, 04 Mai 2015 09:33:25-0300  Submitting:1
> > > Progress: Seg, 04 Mai 2015 09:33:55-0300  Submitting:1
> > > Progress: Seg, 04 Mai 2015 09:34:25-0300  Submitting:1
> > > Progress: Seg, 04 Mai 2015 09:34:55-0300  Submitting:1
> > > Progress: Seg, 04 Mai 2015 09:35:25-0300  Submitting:1
> > > Progress: Seg, 04 Mai 2015 09:35:55-0300  Submitting:1
> > > Progress: Seg, 04 Mai 2015 09:36:25-0300  Submitting:1
> > >
> > > In the log file, I notice the following errors:
> > >
> > > 2015-05-04 09:24:06,223-0300 INFO  ServiceManager Service does not appear to be registered with this manager
> > > 2015-05-04 09:24:06,223-0300 INFO  ServiceManager Coaster service ended. Reason: null
> > >
> > >  Thanks,
> > >  Igor
> > >
> > >
> > > 2015-05-01 17:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
> > >
> > >>  Hi Igor,
> > >>
> > >> The remote connection system requires that the local machine you run
> > >> the swift client on has a public IP address. It looks like swift was
> > >> not able to guess it and set it to http://igor-ubuntu:51251
> > >>
> > >> Could you retry running part04 after doing the next step, and please
> > >> make sure your environment has these variables set whenever you run
> > >> swift against remote systems:
> > >> export GLOBUS_HOSTNAME=<PUBLIC_IP_OF_YOUR_MACHINE>
> > >> export GLOBUS_TCP_PORT_RANGE=50000,51000
> > >>
> > >>  Thanks,
> > >> Yadu
> > >>
> > >>
> > >> On 05/01/2015 02:29 PM, Igor Russo wrote:
> > >>
> > >>  Hi Yadu,
> > >>
> > >>  Thank you very much!
> > >>
> > >>  I changed the config file with the data from my cluster.
> > >>
> > >> When executing the 4th part of the swift-tutorial, I'm getting the
> > >> following error:
> > >> "Failed to download bootstrap jar from ..."
> > >>
> > >>
> > >>
> > >>
> > >> ------------------------------------------------------------------------
> > >>
> > >>  RunID: run031
> > >>  Progress: Sex, 01 Mai 2015 15:40:42-0300
> > >> Progress: Sex, 01 Mai 2015 15:40:43-0300  Submitting:1
> > >>
> > >>  Execution failed:
> > >> Exception in sort:
> > >>     Arguments: [-n, unsorted.txt]
> > >>     Host: mmc
> > >>     Directory: p4-run031/jobs/s/sort-go28d68m
> > >>  exception @ swift-int-staging.k, line: 165
> > >> Caused by:
> > >>  exception @ swift-int-staging.k, line: 160
> > >> Caused by: null
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
> > >> Failed to download bootstrap jar from http://igor-ubuntu:51251
> > >>
> > >>  k:assign @ swift.k, line: 174
> > >> Caused by: Exception in sort:
> > >>     Arguments: [-n, unsorted.txt]
> > >>     Host: mmc
> > >>     Directory: p4-run031/jobs/s/sort-go28d68m
> > >>  exception @ swift-int-staging.k, line: 165
> > >> Caused by:
> > >>  exception @ swift-int-staging.k, line: 160
> > >> Caused by: null
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
> > >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received.
> > >> Failed to download bootstrap jar from http://igor-ubuntu:51251
> > >>
> > >>
> > >>
> > >> ------------------------------------------------------------------------
> > >>
> > >>  Thanks,
> > >> Igor
> > >>
> > >> 2015-05-01 13:47 GMT-03:00 Yadu Nand Babuji <yadunand at uchicago.edu>:
> > >>
> > >>>  Hi Igor,
> > >>>
> > >>> Swift does support SGE clusters, and you can refer to the
> > >>> swift-tutorial for sample code and configurations from this link:
> > >>> https://github.com/swift-lang/swift-tutorial
> > >>>
> > >>> Here's a sample config from our test-suite for Godzilla, an SGE
> > >>> cluster at UChicago:
> > >>> https://github.com/swift-lang/swift-k/blob/master/tests/sites/godzilla/swift.conf
> > >>>
> > >>> You could modify and add this config to the swift.conf file in the
> > >>> swift-tutorial to run Swift on any machine and execute on a remote
> > >>> SGE cluster.
> > >>>
> > >>> SGE is a widely used resource manager and most sites have differences
> > >>> in their setups that make each site unique. If you run into issues
> > >>> with the default swift package, and could provide help in figuring
> > >>> out specifics of your cluster, we will help you adapt the Swift SGE
> > >>> provider to support your cluster.
> > >>>
> > >>> Thanks,
> > >>> Yadu
> > >>>
> > >>>
> > >>>
> > >>> On 04/28/2015 05:09 PM, Igor Russo wrote:
> > >>>
> > >>>  Hi All,
> > >>>
> > >>> Is it possible to use Swift with a remote SGE/OGE cluster?
> > >>>
> > >>>  Regards,
> > >>> Igor
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Swift-user mailing list
> > >>> Swift-user at ci.uchicago.edu
> > >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user