[Swift-user] Coaster Service Startup Time on OSDC

David Kelly davidkelly at uchicago.edu
Mon Feb 17 19:21:06 CST 2014


Sounds good, Matthew. Let me know how that works for you. I filed a ticket
with OSDC support about the SSH issue, so hopefully they can offer some
help there.
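
For the ticket, Mike's suggestion in the quoted thread below (timestamping the ssh -vvv output to see where the delay falls) could be done along these lines. This is only a minimal sketch: the helper name run_with_timestamps is made up for illustration and is not part of Swift or OSDC tooling.

```python
import subprocess
import sys
import time


def run_with_timestamps(cmd, out=sys.stdout):
    """Run cmd and prefix each output line with seconds elapsed since start.

    Hypothetical helper for illustration only. Returns (returncode, lines).
    """
    start = time.monotonic()
    # ssh -vvv writes its debug output to stderr, so merge stderr into stdout.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered, so lines are stamped as they arrive
    )
    lines = []
    for line in proc.stdout:
        stamped = f"[{time.monotonic() - start:8.3f}s] {line.rstrip()}"
        lines.append(stamped)
        print(stamped, file=out)
    proc.wait()
    return proc.returncode, lines
```

Calling run_with_timestamps(["ssh", "-vvv", "root@172.16.1.236", "ls"]) from the head node would then show a large jump in the elapsed time next to whichever step of the connection is stalling.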

I added an entry to our site guide documentation about how to run on OSDC
in cluster mode. It is at
http://swiftlang.org/guides/release-0.94/siteguide/siteguide.html#_open_science_data_grid.
The only potential issue is that it seems to work only with the standard
Ubuntu images.
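
For anyone staying with persistent coasters rather than cluster mode, the serial startup described in the quoted thread below (three SSH/scp steps per node, one node after another) can be illustrated, and run concurrently, roughly as follows. This is a sketch under stated assumptions, not what start-coaster-service actually does: the working directory, the way worker.pl is launched, and the function names are all hypothetical, and the real worker arguments are omitted.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def node_commands(host, workdir="/tmp/swift-worker", script="worker.pl"):
    """Return the three per-node steps described in the thread.

    workdir and the launch line are hypothetical placeholders; the real
    worker.pl arguments (service URL and so on) are intentionally left out.
    """
    return [
        ["ssh", host, "mkdir", "-p", workdir],           # 1. create directory
        ["scp", script, f"{host}:{workdir}/"],           # 2. copy worker.pl
        ["ssh", host, f"nohup perl {workdir}/{script} >/dev/null 2>&1 &"],  # 3. launch
    ]


def start_node(host, run=subprocess.run):
    """Run the three steps for one host in order; stop at the first failure."""
    for cmd in node_commands(host):
        if run(cmd).returncode != 0:
            return host, False
    return host, True


def start_all(hosts, max_parallel=10, run=subprocess.run):
    """Start several nodes at once instead of one after another."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(lambda h: start_node(h, run), hosts))
```

With 60 nodes at about 30 seconds each, the serial approach takes about 30 minutes. Running 10 nodes at a time would bring that to roughly 3 minutes, assuming the delay is per connection and the head node is not itself the bottleneck.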


On Mon, Feb 17, 2014 at 3:34 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:

> Hi David,
>
>
>
> Indeed I am running start-coaster-service from the head node.
>
>
>
> The first recommendation is a good one, I will try this out and let you
> know how it works.
>
>
>
> I am also very interested in the cluster launch/PBS scheduler approach,
> although I have never used it before. An example OSDC/PBS config would be
> really helpful.
>
>
>
> Thanks,
>
> Matthew
>
> MATTHEW SHAXTED
> SKIDMORE, OWINGS & MERRILL LLP
> 224 South Michigan Ave.
> Chicago, IL 60604
> TEL: 312.360.4368
> FAX: 312.360.4545
> matthew.shaxted at som.com
> WWW.SOM.COM <http://www.som.com/>
>
> The information contained in this communication may be confidential, is
> intended only for the use of the recipient(s) named above, and may be
> legally privileged. If the reader of this message is not the intended
> recipient, you are hereby notified that any dissemination, distribution, or
> copying of this communication, or any of its contents, is strictly
> prohibited and may be unlawful. If you have received this communication in
> error, please return it to the sender immediately and delete the original
> message and any copy of it from your computer system. If you have any
> questions concerning this message, please contact the sender.
>
> *From:* Wilde, Michael J. [mailto:wilde at anl.gov]
> *Sent:* Monday, February 17, 2014 2:36 PM
> *To:* David Kelly; Matthew Shaxted
> *Cc:* swift-user at ci.uchicago.edu
> *Subject:* RE: [Swift-user] Coaster Service Startup Time on OSDC
>
>
>
> Good find, David. Did you file a ticket on the slowness with OSDC Support?
>
> When you run ssh -vvv, does the timing of log output suggest where the
> problem is?
>
> Can you run it under a tool such as script (typescript) or a "screen" log
> that will timestamp the records, and send those to OSDC Support?
>
>
>
> - Mike
>
> --
>
> Michael Wilde
>
> Mathematics and Computer Science          Computation Institute
>
> Argonne National Laboratory                    The University of Chicago
>
>
> ------------------------------
>
> *From:* swift-user-bounces at ci.uchicago.edu [
> swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [
> davidkelly at uchicago.edu]
> *Sent:* Monday, February 17, 2014 2:14 PM
> *To:* Matthew Shaxted
> *Cc:* swift-user at ci.uchicago.edu
> *Subject:* Re: [Swift-user] Coaster Service Startup Time on OSDC
>
> Hi Matthew,
>
>
>
> I set up a test on OSDC with 10 nodes. I did notice something strange
> there. When I try to SSH from the Sullivan head node to one of my CentOS
> instances, it takes much longer than it should:
>
>
>
> dkelly@kg14-compute-1:~$ time ssh root@172.16.1.236 ls
> Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts.
> anaconda-ks.cfg
> install.log
> install.log.syslog
>
>
> real 0m25.197s
>
>
>
> Are you running start-coaster-service from the Sullivan head node? For
> each node in your node list, there are three SSH commands (one to create a
> directory structure, one to scp the worker.pl script there, and one to
> launch worker.pl). This is done serially in the 0.94 branch. I can see
> how this would take very long. I will make some changes to speed up
> start-coaster-service, but in the meantime, here are a few suggestions:
>
>
>
> 1. The SSH slowness seems to only be from the Sullivan head node to the
> VMs. SSH connections from one VM to another VM are pretty quick. Are you
> able to run Swift and start-coaster-service on a VM?
>
>
>
> 2. Is a persistent coasters setup needed here? OSDC has an option to
> launch instances as a cluster, which makes available a PBS scheduler. You
> could set this up and avoid the need to start and manage workers yourself.
>
>
>
> Let me know what you think. I have some example OSDC/PBS configs if you
> decide to go that route.
>
>
>
> Thanks,
>
> David
>
>
>
> On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted <Matthew.Shaxted at som.com>
> wrote:
>
> Sure thing David,
>
>
>
> I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt
> file (see attached) - all nodes are running CentOS.
>
>
>
> The conf file I am using is also attached. I'm using swift-0.94.1.
>
>
>
> Many thanks,
>
> Matthew
>
>
> *From:* David Kelly [mailto:davidkelly at uchicago.edu]
> *Sent:* Thursday, February 13, 2014 12:53 PM
> *To:* Matthew Shaxted
> *Cc:* swift-user at ci.uchicago.edu
> *Subject:* Re: [Swift-user] Coaster Service Startup Time on OSDC
>
>
>
> Hi Matthew,
>
>
>
> Could you please explain a little more about how you're starting the
> coaster-service and workers? Are you using the start-coaster-service
> script? If you are, could you please send the coaster-service.conf file
> you're using? Which version of Swift is this?
>
>
>
> There may be some things we can do to speed up the process - just need to
> get a better understanding of how things are set up and where the delays
> are coming from. Thanks!
>
>
>
> Regards,
>
> David
>
>
>
> On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted <Matthew.Shaxted at som.com>
> wrote:
>
> Dear Swift User Group:
>
>
>
> I am working with a 60-node cluster running on OSDC now, and when I try to
> start the Swift coaster-service for these nodes, it takes about 30 seconds
> (or more) per node to successfully start.
>
>
>
> This is an issue for me as it limits how often I want to shut down the
> coaster-service - for this 60 node cluster it could take up to 30 min to
> start up again.
>
>
>
> Is this coaster startup behavior normal? Is there anything I can do to
> make the coaster-service start faster?
>
>
>
> Thanks,
> Matthew
>
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
>
>
>
>