[Swift-user] Coaster Service Startup Time on OSDC

Wilde, Michael J. wilde at anl.gov
Mon Feb 17 20:09:00 CST 2014


Maybe we can set up an ssh master-channel to each node once, and then it will go faster?

Maybe replace the node image with a different OS?
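
The master-channel idea could be tried with OpenSSH connection multiplexing. The sketch below is only an illustration: the CONTROL_DIR location, the 10-minute persist time, and the helper names mux_opts and open_masters are assumptions, and it presumes WORKER_HOSTS is a space-separated list of node addresses. The first connection to each node would still pay the full login cost; only later ssh/scp calls that use the same options would reuse the open channel.

```shell
#!/bin/sh
# Sketch only: OpenSSH connection multiplexing for the worker nodes.
# CONTROL_DIR and the 10m persist time are assumptions for illustration.
CONTROL_DIR="${CONTROL_DIR:-$HOME/.ssh}"

# Print the ssh options that let later connections reuse one master channel.
mux_opts() {
    printf '%s\n' "-o ControlMaster=auto -o ControlPath=$CONTROL_DIR/cm-%r@%h:%p -o ControlPersist=10m"
}

# Open (or reuse) a master connection to every host given as an argument.
# Running "true" just forces the connection to be established.
open_masters() {
    for host in "$@"; do
        # Word splitting of the option string is intended here.
        ssh $(mux_opts) "$host" true
    done
}

# Usage (assumes WORKER_HOSTS is a space-separated list of addresses):
#   open_masters $WORKER_HOSTS
```

Whether start-coaster-service benefits depends on how it invokes ssh. Putting the equivalent ControlMaster, ControlPath, and ControlPersist settings in ~/.ssh/config is one way to have every ssh and scp call pick them up.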

- Mike
--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory                    The University of Chicago

________________________________
From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of Matthew Shaxted [Matthew.Shaxted at som.com]
Sent: Monday, February 17, 2014 7:54 PM
To: David Kelly
Cc: Wilde, Michael J.; swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Thanks David,

I’m going to test the PBS sites.xml approach and see how it works. Indeed, the time to start coasters is quite irritating, especially since my coasters are no longer staying in a persistent state. I have a 12-hour job running on all my OSDC cores right now (I actually just requested more), so I will test it after this finishes.

Looking at my sites.xml file below, though, I see that my localhost pool is the only one that is not ‘coaster-persistent’, so this is most likely what is causing the issue.

On a slightly different note, my start-coaster-service is rewriting the sites.xml file each time. Is there a simple way to prevent this from happening? Or, if it does need to happen, can I at least have my localhost pool added to the file automatically? My current OSDC sites file is shown below; I have been manually adding the localhost pool every time I restart the coasters:

<config>
  <pool handle="persistent-coasters">
    <execution provider="coaster-persistent"
               url="http://localhost:42860"
               jobmanager="local:local"/>
    <profile namespace="globus" key="workerManager">passive</profile>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <profile key="jobThrottle" namespace="karajan">1000</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <filesystem provider="local" url="none" />
    <workdirectory>/tmp/mshaxted/swiftwork</workdirectory>
  </pool>
  <pool handle="localhost">
    <execution provider="coaster" jobmanager="local:local" url="http://localhost"/>
    <profile key="jobThrottle" namespace="karajan">.23</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <workdirectory>/tmp/mshaxted/swiftwork</workdirectory>
  </pool>
</config>
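
Until the rewrite can be avoided, one workaround is to re-add the localhost pool with a small script after each start-coaster-service run. This is a minimal sketch using plain text editing, not a Swift feature: the function name add_pool and the fragment file are hypothetical, and it assumes the closing </config> tag sits on a line of its own, as in the file above.

```shell
#!/bin/sh
# Sketch: re-add a saved <pool> fragment to a regenerated sites.xml.
# File names are hypothetical; this is plain text editing, not a Swift feature.
add_pool() {
    sites="$1"      # regenerated sites.xml
    fragment="$2"   # file holding the <pool handle="localhost"> ... </pool> block

    # Do nothing if the localhost pool is already present.
    if grep -q '<pool handle="localhost">' "$sites"; then
        return 0
    fi

    tmp="$sites.tmp.$$"
    # Copy every line, emitting the fragment just before the closing </config>.
    awk -v frag="$fragment" '
        /<\/config>/ { while ((getline line < frag) > 0) print line; close(frag) }
        { print }
    ' "$sites" > "$tmp" && mv "$tmp" "$sites"
}

# Usage (paths are placeholders):
#   add_pool sites.xml localhost-pool.xml
```

Because the function checks for the pool first, it can be run after every restart without adding duplicates.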



MATTHEW SHAXTED

SKIDMORE, OWINGS & MERRILL LLP
224 South Michigan Ave.
Chicago, IL 60604
TEL: 312.360.4368
FAX: 312.360.4545
matthew.shaxted at som.com

WWW.SOM.COM

The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender.

From: David Kelly [mailto:davidkelly at uchicago.edu]
Sent: Monday, February 17, 2014 7:21 PM
To: Matthew Shaxted
Cc: Wilde, Michael J.; swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Sounds good, Matthew. Let me know how that works for you. I filed a ticket with OSDC Support about the SSH issue, so hopefully they can offer some help there.

I added an entry to our site guide documentation about how to run on OSDC in cluster mode. It is at http://swiftlang.org/guides/release-0.94/siteguide/siteguide.html#_open_science_data_grid. One potential issue is that it seems to work only with the standard Ubuntu images.

On Mon, Feb 17, 2014 at 3:34 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:
Hi David,

Indeed I am running start-coaster-service from the head node.

The first recommendation is a good one, I will try this out and let you know how it works.

I am also very interested in the cluster launch/PBS scheduler approach, although I have never used it before. An example OSDC/PBS config would be really helpful.

Thanks,
Matthew


From: Wilde, Michael J. [mailto:wilde at anl.gov]
Sent: Monday, February 17, 2014 2:36 PM
To: David Kelly; Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: RE: [Swift-user] Coaster Service Startup Time on OSDC

Good find, David. Did you file a ticket on the slowness with OSDC Support?

When you run ssh -vvv, does the timing of log output suggest where the problem is?

Can you run it under some tool like typescript, or a "screen" log, that will timestamp the records, and send those to OSDC Support?
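
As one possible way to get timestamped records without extra tools, the ssh debug output can be piped through a small shell filter. This is a sketch only: stamp_lines is a hypothetical helper name, HOST is a placeholder, and one-second resolution is assumed to be enough given that the delay is on the order of tens of seconds.

```shell
#!/bin/sh
# Prefix each line read from stdin with the wall-clock time it arrived.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Usage (HOST is a placeholder). ssh writes its -vvv debug output to stderr,
# so stderr is redirected into the pipe:
#   ssh -vvv root@HOST ls 2>&1 | stamp_lines | tee ssh-debug.log
```

A long gap between two consecutive timestamps in the resulting log would point at the step where the connection stalls.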

- Mike

________________________________
From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [davidkelly at uchicago.edu]
Sent: Monday, February 17, 2014 2:14 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Hi Matthew,

I set up a test on OSDC with 10 nodes. I did notice something strange there. When I try to SSH from the Sullivan head node to one of my CentOS instances, it takes much longer than it should:

dkelly@kg14-compute-1:~$ time ssh root@172.16.1.236 ls
Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts.
anaconda-ks.cfg
install.log
install.log.syslog

real 0m25.197s

Are you running start-coaster-service from the Sullivan head node? For each node in your node list, there are three SSH commands (one to create a directory structure, one to scp the worker.pl script there, and one to launch worker.pl). This is done serially in the 0.94 branch, so I can see how this would take a very long time. I will make some changes to speed up start-coaster-service, but in the meantime, here are a few suggestions:

1. The SSH slowness seems to occur only from the Sullivan head node to the VMs. SSH connections from one VM to another VM are pretty quick. Are you able to run Swift and start-coaster-service on a VM?

2. Is a persistent coasters setup needed here? OSDC has an option to launch instances as a cluster, which makes available a PBS scheduler. You could set this up and avoid the need to start and manage workers yourself.
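
Regarding the serial per-node SSH steps mentioned above: one generic way to shorten such a loop is to run the per-node work concurrently. The sketch below only illustrates that pattern and is not how start-coaster-service is implemented or necessarily how it will be changed. setup_node and start_all are hypothetical names, and the sleep stands in for the three SSH commands.

```shell
#!/bin/sh
# Sketch: run a per-node setup step for all nodes concurrently.
# setup_node is a hypothetical stand-in for the three per-node SSH steps
# (create directories, copy worker.pl, launch worker.pl).
setup_node() {
    host="$1"
    # Placeholder for the real ssh/scp calls.
    sleep 1
    echo "started $host"
}

# Start one background job per node, then wait for all of them.
start_all() {
    for host in "$@"; do
        setup_node "$host" &
    done
    wait
}

# Usage (assumes WORKER_HOSTS is a space-separated list of addresses):
#   start_all $WORKER_HOSTS
```

With many nodes it may be worth capping the number of simultaneous connections, since the head node and the SSH daemons may not handle dozens of concurrent logins well.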

Let me know what you think. I have some example OSDC/PBS configs if you decide to go that route.

Thanks,
David

On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:
Sure thing David,

I have a setup.sh script that exports the WORKER_HOSTS IP addresses from a txt file (see attached). All nodes are running CentOS.

The conf file I am using is also attached. I’m using swift-0.94.1.

Many thanks,
Matthew


From: David Kelly [mailto:davidkelly at uchicago.edu]
Sent: Thursday, February 13, 2014 12:53 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Hi Matthew,

Could you please explain a little more about how you're starting the coaster-service and workers? Are you using the start-coaster-service script? If you are, could you please send the coaster-service.conf file you're using? Which version of Swift is this?

There may be some things we can do to speed up the process - just need to get a better understanding of how things are set up and where the delays are coming from. Thanks!

Regards,
David

On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:
Dear Swift User Group:

I am working with a 60-node cluster running on OSDC now, and when I try to start the Swift coaster-service for these nodes, it takes about 30 seconds (or more) per node to start successfully.

This is an issue for me because it limits how often I am willing to shut down the coaster-service: for this 60-node cluster it could take up to 30 minutes to start up again.

Is this coaster startup behavior normal? Is there anything I can do to make the coaster-service start faster?

Thanks,
Matthew



