[Swift-user] Coaster Service Startup Time on OSDC

David Kelly davidkelly at uchicago.edu
Mon Feb 17 20:30:10 CST 2014


Hi Matthew,

That sounds good. I prefer the scheduler option too when I run there. Let
me know if there's anything that needs clarification when you get a chance
to test it.

There is a way to change the sites.xml file that gets generated by
start-coaster-service. You can add additional pools to the
etc/sites/persistent-coasters file that is contained in the Swift
installation directory.
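
As a rough illustration of that edit, here is a minimal shell sketch that appends an extra pool to the template. It assumes the template is plain XML ending in a closing </config> tag, which is not confirmed above. The add_pool function, the snippet file, and the SWIFT_HOME variable are hypothetical names used only for this example; back up the template before trying anything like it.

```shell
#!/bin/sh
# Sketch only: insert a <pool> snippet before the closing </config> tag of a
# sites template. Assumes the template is plain XML ending in </config>.
add_pool() {
    template="$1"   # e.g. "$SWIFT_HOME/etc/sites/persistent-coasters" (SWIFT_HOME is hypothetical)
    snippet="$2"    # file holding one complete <pool>...</pool> element
    handle="$3"     # pool handle, used to avoid inserting the same pool twice

    # Skip the insert if a pool with this handle is already present.
    if grep -q "handle=\"$handle\"" "$template"; then
        return 0
    fi

    tmp="$template.tmp.$$"
    awk -v snippet="$snippet" '
        /<\/config>/ && !done {
            # Copy the snippet in just before the first closing tag.
            while ((getline line < snippet) > 0) print line
            close(snippet)
            done = 1
        }
        { print }
    ' "$template" > "$tmp" && mv "$tmp" "$template"
}

# Usage (paths are placeholders):
#   add_pool "$SWIFT_HOME/etc/sites/persistent-coasters" localhost-pool.xml localhost
```

With the pool in the template, the generated sites.xml should carry it on every restart, so no manual edit is needed afterwards.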



On Mon, Feb 17, 2014 at 7:54 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:

> Thanks David,
>
>
>
> I'm going to test the pbs sites.xml approach and see how it works. Indeed
> the time to start coasters is really quite irritating, especially since my
> coasters are now not staying in a persistent state. I have a 12 hr job
> running on all my OSDC cores now (actually just requested more) so I will
> test it after this finishes.
>
>
>
> Now that I'm looking at my below sites.xml file though, it seems only my
> localhost pool is not 'coaster-persistent' so this is most likely what is
> causing the issue.
>
>
>
> On a slightly different sites.xml note, my start-coaster-service
> is rewriting the sites.xml file each time. Is there a simple way to prevent
> this from happening or, if it does need to happen, can
> I at least add my localhost pool to the file automatically? My current
> OSDC sites file looks like the one below, and I have been manually adding the
> localhost pool every time I restart the coasters:
>
>
>
> <config>
>   <pool handle="persistent-coasters">
>     <execution provider="coaster-persistent"
>                url="http://localhost:42860"
>                jobmanager="local:local"/>
>     <profile namespace="globus" key="workerManager">passive</profile>
>     <profile namespace="globus" key="jobsPerNode">1</profile>
>     <profile key="jobThrottle" namespace="karajan">1000</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <filesystem provider="local" url="none" />
>     <workdirectory>/tmp/mshaxted/swiftwork</workdirectory>
>   </pool>
>
>   <pool handle="localhost">
>     <execution provider="coaster" jobmanager="local:local" url="http://localhost"/>
>     <profile key="jobThrottle" namespace="karajan">.23</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <workdirectory>/tmp/mshaxted/swiftwork</workdirectory>
>   </pool>
> </config>
>
>
>
>
>
>
>
> MATTHEW SHAXTED
>
>
>
> SKIDMORE, OWINGS & MERRILL LLP
>
> 224 South Michigan Ave.
>
> Chicago, IL 60604
>
> TEL: 312.360.4368
>
> FAX: 312.360.4545
>
> matthew.shaxted at som.com
>
>
>
>
> WWW.SOM.COM <http://www.som.com/>
>
>
>
> The information contained in this communication may be confidential, is
> intended only for the use of the recipient(s) named above, and may be
> legally privileged. If the reader of this message is not the intended
> recipient, you are hereby notified that any dissemination, distribution, or
> copying of this communication, or any of its contents, is strictly
> prohibited and may be unlawful. If you have received this communication in
> error, please return it to the sender immediately and delete the original
> message and any copy of it from your computer system. If you have any
> questions concerning this message, please contact the sender.
>
>
>
>
>
>
> *From:* David Kelly [mailto:davidkelly at uchicago.edu]
> *Sent:* Monday, February 17, 2014 7:21 PM
> *To:* Matthew Shaxted
> *Cc:* Wilde, Michael J.; swift-user at ci.uchicago.edu
>
> *Subject:* Re: [Swift-user] Coaster Service Startup Time on OSDC
>
>
>
> Sounds good, Matthew. Let me know how that works for you. I filed a ticket
> with osdc support about the SSH issue, so hopefully they can offer some
> help there.
>
>
>
> I added an entry to our site guide documentation about how to run on OSDC
> in cluster mode. It is at
> http://swiftlang.org/guides/release-0.94/siteguide/siteguide.html#_open_science_data_grid.
> The only potential issue is, it seems to only work with the standard Ubuntu
> images.
>
>
>
> On Mon, Feb 17, 2014 at 3:34 PM, Matthew Shaxted <Matthew.Shaxted at som.com>
> wrote:
>
> Hi David,
>
>
>
> Indeed I am running start-coaster-service from the head node.
>
>
>
> The first recommendation is a good one, I will try this out and let you
> know how it works.
>
>
>
> I am also very interested in the cluster launch/PBS scheduler approach,
> although I have never used it before. An example OSDC/PBS config would be
> really helpful.
>
>
>
> Thanks,
>
> Matthew
>
>
>
>
>
>
> *From:* Wilde, Michael J. [mailto:wilde at anl.gov]
> *Sent:* Monday, February 17, 2014 2:36 PM
> *To:* David Kelly; Matthew Shaxted
> *Cc:* swift-user at ci.uchicago.edu
> *Subject:* RE: [Swift-user] Coaster Service Startup Time on OSDC
>
>
>
> Good find, David. Did you file a ticket on the slowness with OSDC Support?
>
> When you run ssh -vvv, does the timing of log output suggest where the
> problem is?
>
> Can you run it inside some tool like typescript, or a "screen" log, that will
> timestamp the records, and send those to OSDC Support?
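
One simple way to get timestamped records, sketched here under assumptions: pipe the verbose SSH output through a small filter that stamps each line as it arrives. The timestamp_lines function and the log file name are hypothetical; ssh writes its -vvv debug output to stderr, hence the 2>&1 in the usage line.

```shell
#!/bin/sh
# Sketch only: prefix every line read from stdin with the wall-clock time
# at which it was read, so gaps between debug messages become visible.
timestamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Usage (HOST is a placeholder for one of the instance addresses):
#   ssh -vvv root@HOST ls 2>&1 | timestamp_lines > ssh-debug.log
```

A long gap between two consecutive stamps points at the step where the connection stalls.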
>
>
>
> - Mike
>
> --
>
> Michael Wilde
>
> Mathematics and Computer Science          Computation Institute
>
> Argonne National Laboratory                    The University of Chicago
>
>
> ------------------------------
>
> *From:* swift-user-bounces at ci.uchicago.edu
> [swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly
> [davidkelly at uchicago.edu]
> *Sent:* Monday, February 17, 2014 2:14 PM
> *To:* Matthew Shaxted
> *Cc:* swift-user at ci.uchicago.edu
> *Subject:* Re: [Swift-user] Coaster Service Startup Time on OSDC
>
> Hi Matthew,
>
>
>
> I set up a test on OSDC with 10 nodes. I did notice something strange
> there. When I try to SSH from the Sullivan head node to one of my CentOS
> instances, it takes much longer than it should:
>
>
>
> dkelly at kg14-compute-1:~$ time ssh root at 172.16.1.236 ls
> Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts.
> anaconda-ks.cfg
> install.log
> install.log.syslog
>
>
> real 0m25.197s
>
>
>
> Are you running start-coaster-service from the Sullivan head node? For
> each node in your node list, there are three SSH commands (one to create a
> directory structure, one to scp the worker.pl script there, and one to
> launch worker.pl). This is done serially in the 0.94 branch. I can see
> how this would take very long. I will make some changes to speed up
> start-coaster-service, but in the meantime, here are a few suggestions:
>
>
>
> 1. The SSH slowness seems to only be from the Sullivan head node to the
> VMs. SSH connections from one VM to another VM are pretty quick. Are you
> able to run Swift and start-coaster-service on a VM?
>
>
>
> 2. Is a persistent coasters setup needed here? OSDC has an option to
> launch instances as a cluster, which makes available a PBS scheduler. You
> could set this up and avoid the need to start and manage workers yourself.
>
>
>
> Let me know what you think. I have some example OSDC/PBS configs if you
> decide to go that route.
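
Because the three per-node SSH steps run one after another, total startup time grows linearly with the number of nodes. As a stopgap, the same steps could be run for all nodes concurrently. The sketch below is not how start-coaster-service works; for_each_host_parallel and setup_node are hypothetical names, and the body of setup_node (create directory, copy worker.pl, launch it) is left to the caller.

```shell
#!/bin/sh
# Sketch only: run one command per host in the background and wait for all
# of them, so slow SSH connections overlap instead of adding up.
for_each_host_parallel() {
    cmd="$1"; shift
    pids=""
    for host in "$@"; do
        "$cmd" "$host" &        # start this host's setup in the background
        pids="$pids $!"
    done

    failed=0
    for pid in $pids; do
        # wait returns the exit status of that background job
        wait "$pid" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]         # succeed only if every host succeeded
}

# Usage (setup_node is a hypothetical function wrapping the three SSH steps;
# WORKER_HOSTS is the space-separated host list exported by setup.sh):
#   for_each_host_parallel setup_node $WORKER_HOSTS
```

With every node started at once, wall-clock time should be close to that of the slowest single node, although many simultaneous connections from one head node may need to be capped.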
>
>
>
> Thanks,
>
> David
>
>
>
> On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted <Matthew.Shaxted at som.com>
> wrote:
>
> Sure thing David,
>
>
>
> I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt
> file (see attached) - all nodes are running centOS.
>
>
>
> The conf file I am using is also attached. I'm using swift-0.94.1.
>
>
>
> Many thanks,
>
> Matthew
>
>
>
>
>
>
> *From:* David Kelly [mailto:davidkelly at uchicago.edu]
> *Sent:* Thursday, February 13, 2014 12:53 PM
> *To:* Matthew Shaxted
> *Cc:* swift-user at ci.uchicago.edu
> *Subject:* Re: [Swift-user] Coaster Service Startup Time on OSDC
>
>
>
> Hi Matthew,
>
>
>
> Could you please explain a little more about how you're starting the
> coaster-service and workers? Are you using the start-coaster-service
> script? If you are, could you please send the coaster-service.conf file
> you're using? Which version of Swift is this?
>
>
>
> There may be some things we can do to speed up the process - just need to
> get a better understanding of how things are set up and where the delays
> are coming from. Thanks!
>
>
>
> Regards,
>
> David
>
>
>
> On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted <Matthew.Shaxted at som.com>
> wrote:
>
> Dear Swift User Group:
>
>
>
> I am working with a 60-node cluster running on OSDC now, and when I try to
> start the Swift coaster-service for these nodes, it takes about 30 seconds
> (or more) per node to successfully start.
>
>
>
> This is an issue for me as it limits how often I want to shut down the
> coaster-service - for this 60 node cluster it could take up to 30 min to
> start up again.
>
>
>
> Is this starting coaster behavior normal? Is there anything I can do to
> make the coaster-service start faster?
>
>
>
> Thanks,
> Matthew
>
>
>
>
>
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
>
>
>
>
>
>