[Swift-user] Coaster Service Startup Time on OSDC

Wilde, Michael J. wilde at anl.gov
Mon Feb 17 14:36:12 CST 2014


Good find, David. Did you file a ticket on the slowness with OSDC Support?

When you run ssh -vvv, does the timing of log output suggest where the problem is?

Can you run it under some tool, such as script (which writes a typescript file) or a "screen" log, that will timestamp the records, and send those to OSDC Support?
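
As one possible way to get timestamped records, here is a minimal Python sketch. It assumes ssh is on the PATH; the function names stamp_lines and trace_ssh are illustrative only and are not part of Swift or any OSDC tooling. It prefixes each line of the ssh -vvv debug output with the seconds elapsed since the run started, so a long gap between two lines points at the slow step.

```python
import subprocess
import time


def stamp_lines(lines, clock=time.monotonic):
    """Yield each line prefixed with the seconds elapsed since iteration began."""
    start = clock()
    for line in lines:
        yield f"{clock() - start:8.3f}s  {line.rstrip()}"


def trace_ssh(host, command="true"):
    """Run `ssh -vvv host command` and print its debug output with elapsed times."""
    proc = subprocess.Popen(
        ["ssh", "-vvv", host, command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # ssh writes its -v debug output to stderr
        text=True,
    )
    for stamped in stamp_lines(proc.stderr):
        print(stamped)
    return proc.wait()


# Example (hypothetical address): trace_ssh("root@172.16.1.236")
```

The clock is passed in as a parameter only so the stamping logic can be checked without a real connection.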

- Mike
--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory                    The University of Chicago

________________________________
From: swift-user-bounces at ci.uchicago.edu [swift-user-bounces at ci.uchicago.edu] on behalf of David Kelly [davidkelly at uchicago.edu]
Sent: Monday, February 17, 2014 2:14 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Hi Matthew,

I set up a test on OSDC with 10 nodes. I did notice something strange there. When I try to SSH from the Sullivan head node to one of my CentOS instances, it takes much longer than it should:

dkelly@kg14-compute-1:~$ time ssh root@172.16.1.236 ls
Warning: Permanently added '172.16.1.236' (RSA) to the list of known hosts.
anaconda-ks.cfg
install.log
install.log.syslog

real 0m25.197s
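
The measurement above could be repeated for every node, and from more than one vantage point, with a small Python sketch like the following. This is only an illustration: probe_hosts and time_command are hypothetical helper names, and it assumes key-based SSH login already works (BatchMode=yes makes ssh fail rather than prompt for a password).

```python
import subprocess
import time


def time_command(cmd, run=subprocess.run, clock=time.monotonic):
    """Return (elapsed_seconds, returncode) for one command."""
    start = clock()
    result = run(cmd)
    return clock() - start, result.returncode


def probe_hosts(hosts, run=subprocess.run, clock=time.monotonic):
    """Time a no-op SSH command against each host, one host at a time."""
    timings = {}
    for host in hosts:
        cmd = ["ssh", "-o", "BatchMode=yes", host, "true"]
        elapsed, _returncode = time_command(cmd, run, clock)
        timings[host] = elapsed
    return timings


# Example (hypothetical address): probe_hosts(["root@172.16.1.236"])
```

Running the same probe once from the head node and once from a VM would show whether the delay depends on where the connection starts.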

Are you running start-coaster-service from the Sullivan head node? For each node in your node list, there are three SSH commands (one to create a directory structure, one to scp the worker.pl script there, and one to launch worker.pl). This is done serially in the 0.94 branch. I can see how this would take very long. I will make some changes to speed up start-coaster-service, but in the meantime, here are a few suggestions:
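
To illustrate why running the per-node steps concurrently would help, here is a rough Python sketch. It is not how start-coaster-service is implemented and not the planned change; the helper names, the remote directory, and the launch command are all assumptions. In particular, the real worker.pl command line is not shown in this thread, so the caller has to supply it as launch_args.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def make_steps(worker_dir, launch_args):
    """Build the three per-node commands: mkdir, scp worker.pl, launch."""
    ssh = ["ssh", "-o", "BatchMode=yes"]
    return [
        lambda h: ssh + [h, "mkdir", "-p", worker_dir],
        lambda h: ["scp", "-o", "BatchMode=yes", "worker.pl", f"{h}:{worker_dir}/"],
        lambda h: ssh + [h] + launch_args,  # caller supplies the worker.pl command
    ]


def start_worker(host, steps, run=subprocess.run):
    """Run the steps in order on one host; stop at the first failure."""
    for make_cmd in steps:
        if run(make_cmd(host)).returncode != 0:
            return host, False
    return host, True


def start_all(hosts, steps, max_parallel=10, run=subprocess.run):
    """Process up to max_parallel hosts at once instead of one at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(lambda h: start_worker(h, steps, run), hosts)
        return dict(results)
```

The steps for a single host still run in order, since each depends on the previous one; only the hosts are overlapped. The result maps each host to True or False so failed nodes can be retried.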

1. The SSH slowness seems to only be from the Sullivan head node to the VMs. SSH connections from one VM to another VM are pretty quick. Are you able to run Swift and start-coaster-service on a VM?

2. Is a persistent coasters setup needed here? OSDC has an option to launch instances as a cluster, which makes available a PBS scheduler. You could set this up and avoid the need to start and manage workers yourself.

Let me know what you think. I have some example OSDC/PBS configs if you decide to go that route.

Thanks,
David


On Thu, Feb 13, 2014 at 1:04 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:
Sure thing David,

I have a setup.sh script that exports WORKER_HOSTS IP addresses from a txt file (see attached); all nodes are running CentOS.

The conf file I am using is also attached. I’m using swift-0.94.1.

Many thanks,
Matthew


MATTHEW SHAXTED

SKIDMORE, OWINGS & MERRILL LLP
224 South Michigan Ave.
Chicago, IL 60604
TEL: 312.360.4368
FAX: 312.360.4545
matthew.shaxted at som.com

WWW.SOM.COM

The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender.


From: David Kelly [davidkelly at uchicago.edu]
Sent: Thursday, February 13, 2014 12:53 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Coaster Service Startup Time on OSDC

Hi Matthew,

Could you please explain a little more about how you're starting the coaster-service and workers? Are you using the start-coaster-service script? If you are, could you please send the coaster-service.conf file you're using? Which version of Swift is this?

There may be some things we can do to speed up the process - just need to get a better understanding of how things are set up and where the delays are coming from. Thanks!

Regards,
David

On Thu, Feb 13, 2014 at 12:23 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:
Dear Swift User Group:

I am working with a 60-node cluster running on OSDC now, and when I try to start the Swift coaster-service for these nodes, it takes about 30 seconds (or more) per node to successfully start.

This is an issue for me as it limits how often I want to shut down the coaster-service - for this 60 node cluster it could take up to 30 min to start up again.
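
The 30-minute figure follows directly from the numbers above, and the same arithmetic gives a rough idea of what starting several nodes at a time might save. The Python sketch below assumes the per-node time stays at about 30 seconds when nodes are started concurrently, which may not hold in practice; startup_minutes is an illustrative name only.

```python
import math


def startup_minutes(nodes, seconds_per_node, parallel=1):
    """Estimated wall-clock minutes when `parallel` nodes are started at a time."""
    batches = math.ceil(nodes / parallel)
    return batches * seconds_per_node / 60


print(startup_minutes(60, 30))      # serial start: 30.0
print(startup_minutes(60, 30, 10))  # ten nodes at a time: 3.0
```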

Is this coaster startup behavior normal? Is there anything I can do to make the coaster-service start faster?

Thanks,
Matthew




_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

