[Swift-commit] r7982 - trunk/docs/userguide
hategan at ci.uchicago.edu
Thu Jul 10 19:45:04 CDT 2014
Author: hategan
Date: 2014-07-10 19:45:00 -0500 (Thu, 10 Jul 2014)
New Revision: 7982
Added:
trunk/docs/userguide/configuration.new
trunk/docs/userguide/configuration.old
Modified:
trunk/docs/userguide/userguide.txt
Log:
added new configuration documentation
Added: trunk/docs/userguide/configuration.new
===================================================================
--- trunk/docs/userguide/configuration.new (rev 0)
+++ trunk/docs/userguide/configuration.new 2014-07-11 00:45:00 UTC (rev 7982)
@@ -0,0 +1,642 @@
+Configuration
+-------------
+
+Swift is mainly configured using a configuration file, typically called *swift.conf*.
+This file contains configuration properties and site descriptions. A simple
+configuration file may look like this:
+
+-----
+# include default Swift configuration file
+include "${swift.home}/etc/swift.conf"
+
+site.mysite {
+ execution {
+ type: "coaster"
+ URL: "my.site.org"
+ jobManager: "ssh:local"
+ }
+ staging: "proxy"
+
+ app.ALL {executable: "*"}
+}
+
+# select sites to run on
+sites: [mysite]
+
+# other settings
+lazyErrors: false
+-----
+
+Configuration Syntax
+~~~~~~~~~~~~~~~~~~~~
+
+The Swift configuration files are expressed in a modified version of JSON. The main
+additions to JSON are:
+
+- Quotes around string values, in particular keys, are optional, unless the strings
+contain special characters (single/double quotes, square and curly braces, white space,
++$+, +:+, +=+, +,+, +`+, +^+, +?+, +!+, +@+, +*+, +\+), or if they
+represent other values: +true+, +false+, +null+, and numbers.
+- +=+ and +:+ can be used interchangeably to separate keys from values
+- +=+ (or +:+) is optional before an open bracket
+- Commas are optional as separators if there is a new line
+- +${...}+ expansion can be used to substitute environment variable values or Java
+ system properties. If the value of an environment variable is needed, it must be
+ prefixed with +env.+. For example +${env.PATH}+. Except for include directives, the
+ +${...}+ must not be inside double quotes for the substitution to work. The same
+ outcome can be achieved using implicit string concatenation: +"/home/"${env.USER}"/bin"+
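+
+As an illustrative sketch of these rules (the site name, path, and values below
+are hypothetical and chosen only to show the syntax), the following fragment
+omits optional quotes, mixes +=+ and +:+, drops the separator before a brace,
+and builds a path through implicit string concatenation:
+
+-----
+# ':' and '=' are interchangeable; quotes around simple strings are optional
+sites: [mysite]
+lazyErrors = false
+
+# no separator is needed before an opening brace
+site.mysite {
+    execution {
+        type: local
+    }
+    staging: local
+    # ${...} stays outside the double quotes so that it is expanded
+    workDirectory: "/tmp/"${env.USER}"/swiftwork"
+}
+-----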
+
+Comments can be introduced by starting a line with a hash symbol (+#+) or using
+a double slash (+//+):
+
+-----
+# This is a comment
+// This is also a comment
+
+keepSiteDir: true # This is a comment following a valid property
+-----
+
+Include Directives
+~~~~~~~~~~~~~~~~~~
+
+Include directives can be used to include the contents of a Swift configuration file
+from another Swift configuration file. This is done using the literal +include+ followed
+by a quoted string containing the path to the target file. The path may contain
+references to environment variables or system properties using the substitution
+syntax explained above. For example:
+
+-----
+# an absolute path name
+include "/home/joedoe/swift-config/site1.conf"
+
+# include a file from the Swift distribution package
+include "${swift.home}/etc/sites/beagle.conf"
+
+# include a file using an environment variable
+include "${env.SWIFT_CONFIG_DIR}/glow.conf"
+-----
+
+Configuration File Structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The contents of a Swift configuration file can be divided into a number of relevant
+sections:
+
+- site declarations
+- global application declarations
+- Swift configuration properties
+
+Site Declarations
+^^^^^^^^^^^^^^^^^
+
+Swift site declarations are specified using the +site.<name>+ property, where text
+inside angle brackets is to be interpreted as a generic label for user-specified
+content, whereas content between square brackets is optional:
+
+-----
+site.<name> {
+ execution {...}
+ [staging: "swift" | "local" | "service-local" | "shared-fs"
+ | "proxy" | "wrapper"]
+ [filesystem {...}]
+
+ [<site options>]
+ [<application declarations>]
+}
+-----
+
+A site name can be any string. If the string contains special characters, it must be
+quoted:
+
+-----
+site."My-$pecial-$ite" {...}
+-----
+
+Execution Mechanism
++++++++++++++++++++
+
+The +execution+ property tells Swift how applications should be executed on a site:
+
+-----
+ execution {
+ type: <string>
+ [URL: <string>]
+ [jobManager: <string>]
+
+ [<execution provider options>]
+ }
+-----
+
+The +type+ property is used to select one of the mechanisms for application execution
+that are known to Swift. A comprehensive list of execution mechanisms can be found
+in <<??, ??>>. A summary is shown below:
+
+[[table-execution-mechanisms]]
+.Swift Execution Mechanisms
+[options="header",cols="3,2,2,2,4,10"]
+|========================================================================================================================
+|Type |URL required|Uses jobManager|Default jobManager|Staging methods supported |
+Description
+
+|+local+ | no | no | - | swift, local, wrapper |
+Runs applications locally using a simple fork()-based mechanism
+
+|+coaster+ | yes | yes | none | swift, wrapper, service-local, shared-fs, proxy |
+Submits applications through an automatically-deployed Swift Coasters service
+
+|+coaster-persistent+ | yes | yes | none | swift, wrapper, service-local, shared-fs, proxy |
+Uses a manually deployed Swift Coasters service
+
+|+GRAM5+ | yes | yes | "fork" | swift, wrapper |
+Uses the <<http://toolkit.globus.org/toolkit/docs/latest-stable/gram5/user/#gram5User,GRAM: User's Guide>> component of
+the Globus Toolkit.
+
+|+GT2+ 4+| |
+An alias for 'GRAM5'
+
+|+SSH+ | yes | no | - | swift, wrapper |
+Runs applications using a Java implementation of the 'SSH' protocol
+
+|+SSH-CL+ | yes | no | - | swift, wrapper |
+Like 'SSH' except it uses the command-line 'ssh' tool.
+
+|+PBS+ | no | no | - | swift, wrapper |
+Submits applications to a PBS or Torque resource manager
+
+|+Condor+ | no | no | - | swift, wrapper |
+Submits applications using Condor
+
+|+SGE+ | no | no | - | swift, wrapper |
+Uses the Sun Grid Engine
+
+|+SLURM+ | no | no | - | swift, wrapper |
+Uses the SLURM local scheduler
+
+|+LSF+ | no | no | - | swift, wrapper |
+Submits applications to Platform's Load Sharing Facility
+
+|========================================================================================================================
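+
+To make the table concrete, here is a minimal sketch of two site declarations
+with different +execution+ blocks. The site names, host name, and paths are
+hypothetical; the +type+ values and the URL requirements are taken from the
+table above, and the +staging+ and +workDirectory+ settings are described in
+the sections that follow:
+
+-----
+# PBS: neither a URL nor a jobManager is needed
+site.mypbs {
+    execution {
+        type: "PBS"
+    }
+    staging: "wrapper"
+    workDirectory: "/home/joedoe/swiftwork"
+}
+
+# SSH: a URL is required, jobManager is not used
+site.myssh {
+    execution {
+        type: "SSH"
+        URL: "login.example.org"
+    }
+    staging: "wrapper"
+    workDirectory: "/home/joedoe/swiftwork"
+}
+-----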
+
+The execution provider +options+ specify finer details of how
+an application should be executed. They depend on the chosen mechanism and are detailed in
+<<??, ??>>. This is where Coasters options, such as +nodeGranularity+ or +softImage+, would
+be specified. Example:
+
+-----
+execution {
+ type: "coaster"
+ jobManager: "local:local"
+ options {
+ maxJobs: 1
+ tasksPerNode: 2
+ workerLoggingLevel: TRACE
+ }
+}
+-----
+
+A complete list of Swift Coasters options can be found in <<??,??>>
+
+Staging
++++++++
+
+The staging property instructs Swift how to handle application input and output files.
+The 'swift' and 'wrapper' staging methods are supported universally, but the 'swift' method
+requires the +filesystem+ property to be specified. If +staging+ is not specified, it defaults to
+'swift'. Support for the other choices depends on the execution mechanism. This is
+detailed in the <<table-execution-mechanisms,Execution Mechanisms Table>> above. A
+description of each staging method is provided in the table below:
+
+[[table-staging-methods]]
+.Swift Staging Methods
+[options="header",cols="3, 10"]
+|=============================================================================================
+| Staging Method | Description
+| +swift+ | This method instructs Swift to use a filesystem provider to direct all
+ necessary staging operations from the Swift client-side to the cluster
+ head node. If this method is used, the +workDirectory+ must point to
+ a head node path that is on a shared file system accessible by the
+ compute nodes.
+| +wrapper+ | File staging is done by the Swift application wrapper
+| +local+ | Used by non-remote type execution mechanisms. It implements simple file
+ staging by copying files.
+| +service-local+ | This method instructs the execution mechanism provider to stage input and
+ output files from the remote site where the execution service is located.
+ For example, if a Coaster Service is started on the login node of a
+ cluster, the Coaster Service will perform the staging from a file system
+ on the login node to the compute node and back.
+| +shared-fs+ | This method is used by Coasters to implement a simple staging mechanism in
+ which files are accessed using a shared filesystem that is accessible by
+ compute nodes
+| +proxy+ | This method is also used by Coasters to stage files from/to the Swift
+ client side to compute nodes by proxying through the Coaster Service.
+|==============================================================================================
+
+
+File System
++++++++++++
+
+The file system properties are used with +staging: "swift"+ to tell Swift how to access remote
+file systems. Valid types are described below:
+
+[[table-filesystem-providers]]
+.Swift File System Providers
+[options="header",cols="3, 3, 10"]
+|==========================================================================
+| Type | URL required | Description
+| +local+ | no | Copies files locally on the Swift client side
+| +GSIFTP+ | yes | Accesses a remote file system using GridFTP
+| +GridFTP+ | yes | An alias for +GSIFTP+
+| +SSH+ | yes | Uses the SCP protocol
+|==========================================================================
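+
+As a rough sketch, a site that uses +staging: "swift"+ might pair it with a
+file system declaration as shown below. This assumes that the +filesystem+
+block takes +type+ and +URL+ properties in the same way the +execution+ block
+does, which is an inference from the table above rather than something this
+section states. The site name, host name, and path are hypothetical:
+
+-----
+site.mycluster {
+    execution {
+        type: "SSH"
+        URL: "login.example.org"
+    }
+    staging: "swift"
+    # assumed property names, by analogy with the execution block
+    filesystem {
+        type: "SSH"
+        URL: "login.example.org"
+    }
+    # must be on a file system shared with the compute nodes
+    workDirectory: "/home/joedoe/swiftwork"
+}
+-----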
+
+
+
+Site Options
+++++++++++++
+
+Site options control various aspects of how Swift handles application execution on a site.
+All options except +workDirectory+ are optional. They are listed in the following table:
+
+
+[[table-site-options]]
+.Site Options
+[options="header",cols="1, 1, 1, 10"]
+|=================================
+| Option | Valid values | Default value |
+Description
+| +OS+ | many |"INTEL32::LINUX" |
+Can be used to tell Swift the type of the operating system
+running on the remote site. By default, Swift assumes a
+UNIX/Linux type OS. There is some limited support for
+running under Windows, in which case this property must be
+set to one of +"INTEL32::WINDOWS"+ or +"INTEL64::WINDOWS"+
+
+| +workDirectory+ | path | - |
+Points to a directory in which Swift can maintain a set of
+files relevant to the execution of an application on the
+site. By default, applications will be executed on the
+compute nodes in a sub-directory of +workDirectory+, which
+implies that +workDirectory+ must be accessible from the
+compute nodes.
+
+| +scratch+ | path | - |
+If specified, it instructs swift to run applications in
+a directory different than +workDirectory+. Contrary to the
+requirement for +workDirectory+, +scratch+ can point to
+a file system local to compute nodes. This option is useful
+if applications do intensive I/O on temporary files created
+in their work directory, or if they access their input/output
+files in a non-linear fashion.
+
+| +keepSiteDir+ | +true, false+ | +false+ |
+If set to +true+, site application directories (i.e. +workDirectory+)
+will not be cleaned up when Swift completes a run. This
+can be useful for debugging.
+
+| +statusMode+ | +"files", "provider"+ | +"files"+|
+Controls whether application exit codes are handled by the
+execution mechanism or passed back to Swift by the Swift
+wrapper script through files. Historically, Globus GRAM
+did not return application exit codes. This has
+changed in Globus Toolkit 5.x. However, some local scheduler
+execution mechanisms, such as 'PBS', are still unable to
+return application exit codes. In such cases, it is necessary
+to pass the application exit codes back to Swift in files.
+This comes at a slight price in performance, since a file
+needs to be created, written to, and transferred back to
+Swift for each application invocation. It is however also
+the default, since it works in all cases.
+
+| +maxParallelTasks+ | integer | 2 |
+The maximum number of concurrent application invocations
+allowed on this site.
+
+| +initialParallelTasks+ | integer | 2 |
+The limit on the number of concurrent application invocations
+on this site when a Swift run is started. As invocations
+complete successfully, the number of concurrent invocations
+on the site is increased up to +maxParallelTasks+.
+|=================================
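+
+For illustration, a site declaration that sets several of these options could
+look as follows. The site name, paths, and numbers are hypothetical:
+
+-----
+site.mysite {
+    execution {
+        type: "local"
+    }
+    staging: "local"
+
+    workDirectory: "/home/joedoe/swiftwork"
+    # run applications on node-local storage instead of workDirectory
+    scratch: "/tmp/joedoe"
+    # keep site directories after the run, e.g. for debugging
+    keepSiteDir: true
+    # start with 2 concurrent invocations and allow up to 8
+    initialParallelTasks: 2
+    maxParallelTasks: 8
+}
+-----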
+
+Additional, less frequently used options, are as follows:
+
+[[table-site-options-obscure]]
+.Obscure options that you are unlikely to need to worry about
+[options="header",cols="1, 1, 1, 10"]
+|=================================
+| Option | Valid values | Default value |
+Description
+
+| +wrapperParameterMode+ | +"args", "files"+ | +"args"+ |
+If set to +"files"+, Swift will, as much as possible, pass
+application arguments through files. The applications will
+be invoked normally, with their arguments in the +**argv+
+parameter to the +main()+ function. This can be useful if the
+execution mechanism has limitations on the size of command
+line arguments that can be passed through. An example of an
+execution mechanism exhibiting this problem is Condor.
+
+| +wrapperInterpreter+ | path | +"/bin/bash"+ or +"cscript.exe"+ on Windows |
+Points to the interpreter used to run the Swift application
+invocation wrapper
+
+| +wrapperScript+ | string | +"_swiftwrap"+ or +"_swiftwrap.vbs"+ on Windows |
+Points to the Swift application invocation wrapper. The file
+must exist in the 'libexec' directory in the Swift distribution
+
+| +wrapperInterpreterOptions+ | list of strings | +[]+ on UNIX/Linux or +["//Nologo"]+ on Windows |
+Command line options to be passed to the wrapper interpreter
+
+| +cleanupCommand+ | string | +"/bin/rm"+ or +"cmd.exe"+ on Windows |
+A command to use for the cleaning of site directories (unless
++keepSiteDir+ is set to +true+) at the end of a run.
+
+| +cleanupCommandOptions+ | list of strings | +["-rf"]+ or +["/C", "del", "/Q"]+ on Windows |
+Arguments to pass to the cleanup command when cleaning up site
+work directories
+
+| +delayBase+ | number | 2.0 |
+Swift keeps a quality indicator for each site it runs applications
+on. This is a number that gets increased for every successful
+application invocation, and decreased for every failure. It then
+uses this number in deciding which sites to run applications on
+(when multiple sites are defined). If this number becomes very
+low (a sign of repeated failures on a site), Swift implements
+an exponential back-off that prevents jobs from being sent to a
+site that continuously fails them. +delayBase+ is the base for
+that exponential back-off:
+ +delay = delayBase ^ (-score * 100)+
+
+| +maxSubmitRate+ | positive number| - |
+Some combinations of site and execution mechanisms may become
+error prone if jobs are submitted too fast. This option can
+be used to limit the submission rate. If set to some number N,
+Swift will submit applications at a rate of at most N
+per second.
+
+|=================================
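+
+The back-off formula is easier to follow with numbers. The sketch below uses
+hypothetical score values purely for the arithmetic (this section does not say
+what range the score takes) together with an illustrative site that raises the
+base and limits the submission rate:
+
+-----
+site.mysite {
+    execution {
+        type: "PBS"
+    }
+    staging: "wrapper"
+    workDirectory: "/home/joedoe/swiftwork"
+
+    # delay = delayBase ^ (-score * 100)
+    # default base 2.0, score -0.02:  2.0 ^ 2 = 4
+    # default base 2.0, score -0.05:  2.0 ^ 5 = 32
+    # base 3.0 (set below), score -0.02:  3.0 ^ 2 = 9
+    delayBase: 3.0
+
+    # submit at most 5 applications per second
+    maxSubmitRate: 5
+}
+-----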
+
+
+Application Declarations
+++++++++++++++++++++++++
+
+Applications can either be declared globally, outside of a site declaration,
+or specific to a site, inside a site declaration:
+
+------
+app.(<appName>|ALL) {
+ # global application
+ ...
+}
+
+site.<siteName> {
+ app.(<appName>|ALL) {
+ # site application
+ ...
+ }
+}
+------
+
+A special application name, +ALL+, can be used to declare options for all
+applications. When Swift attempts to run an application named +X+, it will
+first look at site application declarations for +app.X+. If not found, it will
+check if a site application declaration exists for +app.ALL+. The search will
+continue with the global +app.X+ and then the global +app.ALL+ until a match
+is found. It is possible that a specific application will only be declared
+on a sub-set of all the sites and not globally. Swift will then only select
+a site where the application is declared and will not attempt to run the
+application on other sites.
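+
+The lookup order can be sketched with two hypothetical sites and an application
+named +sort+. The site names and paths are illustrative, and the +execution+
+block and other site settings are elided:
+
+-----
+# global declarations
+app.sort {executable: "/usr/bin/sort"}
+app.ALL {executable: "*"}
+
+site.siteA {
+    ...
+    # found first when running sort on siteA
+    app.sort {executable: "/opt/tools/bin/sort"}
+}
+
+site.siteB {
+    ...
+    # no site-level app.sort or app.ALL here, so sort falls back to the
+    # global app.sort; any other application falls back to the global app.ALL
+}
+-----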
+
+An application declaration takes the following form:
+
+-----
+app.<appName> {
+ executable: (<string>|"*")
+ [jobQueue: <string>]
+ [jobProject: <string>]
+ [maxWallTime: <time>]
+ [options: {...}]
+ <environment variables>
+}
+-----
+
+The +executable+ is mandatory, and it points to the actual location of the
+executable that implements the application. The special string +"*"+ can
+be used to indicate that the executable has the same name as the application
+name. This is useful in conjunction with +app.ALL+ to essentially declare
+that a site can be used to execute any application from a Swift script. If the
+executable is not an absolute path, it will be searched using the +PATH+
+environment variable on the remote site.
+
+Environment variables can be defined as follows:
+
+-----
+ env.<name>: <value>
+-----
+
+For example:
+
+-----
+ env.LD_LIBRARY_PATH: "/home/joedoe/lib"
+-----
+
+The remaining options are:
+
+[[table-app-options]]
+.Application Options
+[options="header",cols="3, 3, 10"]
+|====================================================================
+| Name | Valid values |
+Description
+
+| +jobQueue+ | any |
+If the application is executed using a mechanism that submits to
+a queuing system, this option can be used to select a specific
+queue for the application
+
+| +jobProject+ | any |
+A queuing system project to associate the job with.
+
+| +maxWallTime+| +"mm"+ or +"hh:mm"+ or +"hh:mm:ss"+ |
+The maximum amount of time that the application will take to execute
+on the site. Most application execution mechanisms will both require
+and enforce this value by terminating the application if it exceeds
+the specified time. The default value is 10 minutes.
+
+|====================================================================
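+
+Putting the pieces together, a hypothetical application declaration that uses
+these options might read as follows (the application name, path, queue, and
+project are placeholders):
+
+-----
+app.simulate {
+    executable: "/home/joedoe/bin/simulate"
+    jobQueue: "fast"
+    jobProject: "myproject"
+    # one and a half hours, in "hh:mm:ss" form
+    maxWallTime: "01:30:00"
+    env.LD_LIBRARY_PATH: "/home/joedoe/lib"
+}
+-----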
+
+
+General Swift Options
++++++++++++++++++++++
+
+There are a number of configuration options that modify the way that
+the Swift run-time behaves. They are listed below:
+
+[[table-swift-options]]
+.General Swift Options
+[options="header",cols="3, 3, 3, 10"]
+|====================================================================
+| Name | Valid values | Default value |
+Description
+
+| +sites+ | array of strings (i.e. +["site1", "site2"]+) | none |
+Selects, out of the set of all declared sites, a sub-set of sites to
+run applications on. This option can also be supplied on the Swift command line,
+in which case it should be a single string with comma-separated items
+(e.g. +swift -sites site1,site2 ...+)
+
+| +hostName+ | string | autodetected |
+Can be used to specify a publicly reachable DNS name or IP address for this
+machine which is generally used for Globus or Coaster callbacks. Normally this should be
+auto-detected, but if you do not have a public DNS name, you may want to set this.
+
+| +TCPPortRange+ | +"lowPort, highPort"+ | none |
+A TCP port range can be specified to restrict the ports on which certain callback
+services are started. This is likely needed if your submit host is behind a firewall,
+in which case the firewall should be configured to allow incoming connections on
+ports in this range.
+
+| +lazyErrors+ | +true, false+ | +false+ |
+Use a lazy mode to deal with errors. When set to 'true' Swift will proceed with the
+execution until no more data can be derived because of errors in dependent steps. If
+set to 'false', an error will cause the execution to immediately stop
+
+| +executionRetries+ | non-negative integer | +0+ |
+The number of times an application invocation will be retried if it fails until Swift
+finally gives up and declares it failed. The total number of attempts will be ++1 +
+executionRetries++.
+
+| +logProvenance+ | +true, false+ | +false+ |
+If set to +true+, Swift will record provenance information in the log file.
+
+| +alwaysTransferWrapperLog+ | +true, false+| +false+ |
+Controls when wrapper logs are transferred back to the submit host. If set to
++false+, Swift will only transfer a wrapper log for a given job when that job fails.
+If set to +true+, Swift will transfer wrapper logs whether a job fails or not.
+
+| +fileGCEnabled+ | +true, false+ | +true+ |
+Controls the file garbage collector. If set to +false+, files mapped by
+collectable mappers (such as the concurrent mapper) will not be deleted when their
+Swift variables go out of scope.
+
+| +mappingCheckerEnabled+ | +true, false+ | +true+ |
+Controls the run-time duplicate mapping checker (which identifies mapping
+conflicts). When enabled, a record of all mapped data is kept, so this comes at the
+expense of a slight memory leak. If set to +false+, the mapping checker is disabled.
+
+| +tracingEnabled+ | +true, false+ | +false+ |
+Enables execution tracing. If set to +true+, operations within Swift such as
+iterations, invocations, assignments, and declarations, as well as data dependencies
+will be logged. This comes at a cost in performance. It is therefore disabled by
+default.
+
+| +maxForeachThreads+ | positive integer| +16384+ |
+Limits the number of concurrent iterations that each 'foreach' statement
+can have at one time. This conserves memory for swift programs that
+have large numbers of iterations (which would otherwise all be executed
+in parallel).
+
+4+| *Ticker*
+
+| +tickerEnabled+ | +true, false+ | +true+ |
+Controls the output ticker, which regularly prints information about the counts
+of application states on the Swift process's standard output
+
+| +tickerPrefix+ | string | +"Progress: "+|
+Specifies a string to prefix to each ticker line output
+
+| +tickerDateFormat+ | string | +"E, dd MMM yyyy HH:mm:ssZ"+|
+Specifies the date/time format to use for the time stamp of each ticker
+line. It must conform to Java's
+<<http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html,SimpleDateFormat>>
+syntax.
+
+4+| *CDM*
+| +CDMBroadcastMode+ | string | +"file"+ |
+-
+| +CDMLogFile+ | string | +"cdm.log"+ |
+-
+
+4+| *Replication*
+
+|+replicationEnabled+| +true, false+ | +false+ |
+If enabled, jobs that are queued longer than a certain amount of time will
+have a duplicate version re-submitted. This process will continue until a
+maximum pre-set number of such replicas is queued. When one of the replicas
+becomes active, all other replicas are canceled. This mechanism can potentially
+prevent a single overloaded site from completely blocking a run.
+
+|+replicationMinQueueTime+| seconds | +60+ |
+When replication is enabled, this is the amount of time that a job needs to
+be queued until a new replica is created.
+
+|+replicationLimit+ | +integer > 0+ | +3+ |
+The maximum number of replicas allowed for a given application instance.
+
+4+| *Wrapper Staging*
+|+wrapperStagingLocalServer+| string | +"file://"+ |
+When file staging is set to +"wrapper"+, this indicates the default URL
+scheme that is prefixed to local files.
+
+4+| *Throttling*
+| +jobSubmitThrottle+ | +integer > 0+ or +"off"+ | +4+ |
+Limits the number of jobs that can concurrently be in the process of being
+submitted, that is in the "Submitting" state. This is the state where the job
+information is being communicated to a remote service. Certain execution
+mechanisms may become inefficient if too many jobs are being submitted
+concurrently and there are no benefits to parallelizing submission beyond a
+certain point. Please note that this does not apply to the number of jobs that
+can be active concurrently.
+
+| +hostJobSubmitThrottle+ | +integer > 0+ or +"off"+ | +2+ |
+Like +jobSubmitThrottle+, except it applies to each individual site.
+
+| +fileTransfersThrottle+ | +integer > 0+ or +"off"+ | +4+ |
+Limits the number of concurrent file transfers when file staging is set to
++"swift"+. Arbitrarily increasing file transfer parallelism leads to little
+benefits as the throughput approaches the maximum available network bandwidth.
+Instead it can lead to an increase in latencies which may increase the chances
+of triggering timeouts.
+
+| +fileOperationsThrottle+| +integer > 0+ or +"off"+ | +8+ |
+Limits the number of concurrent file operations that can be active at a given
+time when file staging is set to +"swift"+. File operations are defined to be all
+remote operations on a filesystem that exclude file transfers. Examples are:
+listing the contents of a directory, creating a directory, removing a file, etc.
+
+
+4+| *Global versions of site options*
+
+| +staging+ | +"swift", "local", "service-local", "shared-fs", "proxy", "wrapper"+ | +"swift"+ |
+See <<table-staging-methods,Staging Methods>>.
+| +keepSiteDir+ | +true, false+ | +false+ |
+See <<table-site-options,Site Options>>.
+
+| +statusMode+ | +"files", "provider"+ | +"files"+ |
+See <<table-site-options,Site Options>>.
+
+| +wrapperParameterMode+ | +"args", "files"+| +"args"+ |
+See <<table-site-options-obscure,Other Site Options>>.
+
+
+|====================================================================
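+
+As a sketch of how a few of these options might be combined at the top level
+of a configuration file (the site names, port numbers, and other values are
+hypothetical):
+
+-----
+sites: ["site1", "site2"]
+
+# retry each failed invocation up to 2 times, i.e. at most 3 attempts
+executionRetries: 2
+lazyErrors: true
+
+# restrict callback services to ports the firewall lets through
+TCPPortRange: "50000, 51000"
+
+# resubmit a duplicate of any job queued for more than 120 seconds,
+# with at most 2 replicas per application instance
+replicationEnabled: true
+replicationMinQueueTime: 120
+replicationLimit: 2
+
+# throttles accept a positive integer or "off"
+jobSubmitThrottle: 8
+fileTransfersThrottle: "off"
+-----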
+
+
+Run directories
+~~~~~~~~~~~~~~~
+When you run Swift, you will see a run directory get created. The run
+directory has the name of runNNN, where NNN starts at 000 and increments for
+every run.
+
+The run directories can be useful for debugging. They contain:
+.Run directory contents
+|======================
+|apps |An apps file generated from swift.properties
+|cf |A configuration file generated from swift.properties
+|runNNN.log|The log file generated during the Swift run
+|scriptname-runNNN.d|Debug directory containing wrapper logs
+|scripts|Directory that contains scheduler scripts used for that run
+|sites.xml|A sites.xml generated from swift.properties
+|swift.out|The standard out and standard error generated by Swift
+|======================
+
Copied: trunk/docs/userguide/configuration.old (from rev 7744, trunk/docs/userguide/configuration)
===================================================================
--- trunk/docs/userguide/configuration.old (rev 0)
+++ trunk/docs/userguide/configuration.old 2014-07-11 00:45:00 UTC (rev 7982)
@@ -0,0 +1,542 @@
+Configuration
+-------------
+
+Swift uses a single configuration file called swift.properties. The swift.properties
+file is responsible for:
+
+1. Defining how to interface with schedulers
+2. Defining app names and locations
+3. Defining various other swift settings and behavior
+
+Here is an example swift.properties file.
+
+-----
+# Define a site named sandyb
+site.sandyb {
+ tasksPerWorker=16
+ taskWalltime=00:05:00
+ jobManager=slurm
+ jobQueue=sandyb
+ maxJobs=1
+ workdir=/scratch/midway/$USER/work
+ filesystem=local
+}
+
+# Define sandyb apps
+app.sandyb.echo=/bin/echo
+
+# Define other swift properties
+sitedir.keep=true
+wrapperlog.always.transfer=true
+
+# Select which site to run on
+site=sandyb
+-----
+
+The details of this file are explained later. Let's first look
+at an example of running Swift. With the swift.properties file above, the
+command a user would run is:
+
+-----
+$ swift script.swift
+-----
+
+That is all that is needed. Everything Swift needs to know is defined in
+swift.properties.
+
+Location of swift.properties
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Swift searches for swift.properties files in multiple locations:
+
+1. The etc/swift.properties file included with the Swift distribution.
+2. $SWIFT_SITE_CONF/swift.properties - used for defining site templates.
+3. $HOME/.swift/swift.properties
+4. swift.properties in your current directory.
+5. Any property file you point to with the command line argument "-properties
+<file>"
+
+Settings get read in this order. Definitions in the later files will override
+any previous definitions. For example, if you have execution.retries=10 in
+$HOME/.swift/swift.properties, and execution.retries=0 in the swift.properties
+in your current directory, execution.retries will be set to 0.
+
+To verify what files are being read, and what values will be set, run:
+-----
+$ swift -listconfig
+-----
+
+Selecting a site
+~~~~~~~~~~~~~~~~
+There are two ways Swift knows where to run. The first is via
+swift.properties. The site property specifies which site entries
+should be used for a particular run.
+
+-----
+site=sandyb
+-----
+
+Sites can also be selected on the command line by using the -site option.
+
+-----
+$ swift -site westmere script.swift
+-----
+
+The -site command line argument will override any sites selected in
+swift.properties.
+
+Selecting multiple sites
+~~~~~~~~~~~~~~~~~~~~~~~~
+To use multiple sites, use a list of site names separated by commas. In
+swift.properties:
+
+-----
+site=westmere,sandyb
+-----
+
+The same format can be used on the command line:
+
+-----
+$ swift -site westmere,sandyb script.swift
+-----
+
+NOTE: You can also use "sites=" in swift.properties, and "-sites x,y,z" on the
+command line.
+
+Run directories
+~~~~~~~~~~~~~~~
+When you run Swift, you will see a run directory get created. The run
+directory has the name of runNNN, where NNN starts at 000 and increments for
+every run.
+
+The run directories can be useful for debugging. They contain:
+.Run directory contents
+|======================
+|apps |An apps file generated from swift.properties
+|cf |A configuration file generated from swift.properties
+|runNNN.log|The log file generated during the Swift run
+|scriptname-runNNN.d|Debug directory containing wrapper logs
+|scripts|Directory that contains scheduler scripts used for that run
+|sites.xml|A sites.xml generated from swift.properties
+|swift.out|The standard out and standard error generated by Swift
+|======================
+
+Using site templates
+~~~~~~~~~~~~~~~~~~~~
+Swift recognizes an environment variable called $SWIFT_SITE_CONF, which points to
+a directory containing a swift.properties file. This swift.properties can contain multiple
+site definitions for the various queues available on the cluster you are using.
+
+Your local swift.properties then does not need to define the entire site. It
+may contain only differences you need to make that are specific to your
+application, like walltime.
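+
+As an illustration, a template and a local file that overrides only the
+walltime might look like this. The property names come from the examples in
+this document; the values are hypothetical:
+
+-----
+# $SWIFT_SITE_CONF/swift.properties (site template)
+site.westmere.jobManager=slurm
+site.westmere.jobQueue=westmere
+site.westmere.taskWalltime=00:05:00
+
+# swift.properties in the current directory (read later, so it overrides)
+site.westmere.taskWalltime=01:00:00
+site=westmere
+-----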
+
+Backward compatibility
+~~~~~~~~~~~~~~~~~~~~~~~
+New users are encouraged to use the configuration mechanisms described in this documentation.
+However, if you are migrating from an older Swift release to 0.95, the older-style configurations
+using sites.xml and tc.data should still work. If you notice an instance where this is not true,
+please send an email to swift-support at ci.uchicago.edu.
+
+The swift.properties file format
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Site definitions
+^^^^^^^^^^^^^^^^
+Site definitions in the swift.properties files begin with "site".
+
+The second word is the name of the site you are defining. In these examples we
+will define a site called westmere.
+
+The third word is the property.
+
+For example:
+-----
+site.westmere.jobQueue=fast
+-----
+
+Before the site properties are listed, it's important to understand the
+terminology used.
+
+A *task*, or *app task* is an instance of a program as defined in
+a Swift app() function.
+
+A *worker* is the program that launches app tasks.
+
+A *job* is related to schedulers. It is the mechanism by which workers
+are launched.
+
+Below is the list of valid site properties with brief explanations of what
+they do, and an example swift.properties entry.
+
+.swift.properties site properties
+[options="header"]
+|================================
+|Property|Description|Example
+
+|condor|
+Pass parameters directly through to the submit script generated for the condor
+scheduler. For example, the setting "site.osgconnect.condor.+projectname=Swift"
+will generate the line "+projectname = Swift".
+|site.osgconnect.condor.+projectname=Swift
+
+|filesystem|
+Defines how files should be accessed
+|site.westmere.filesystem=local
+
+|jobGranularity|
+Specifies the granularity of a job, in nodes
+|site.westmere.jobGranularity=2
+
+|jobManager|
+Specifies how jobs will be launched. The supported job managers are
+"cobalt", "slurm", "condor", "pbs", "lsf", "local", and "sge".
+|site.westmere.jobManager=slurm
+
+|jobProject|
+Set the project name for the job scheduler
+|site.westmere.jobProject=myproject
+
+|jobQueue|
+Set the name of the scheduler queue to use.
+|site.westmere.jobQueue=westmere
+
+|jobWalltime|
+The maximum amount of time allocated to a scheduler job, in hh:mm:ss
+format.
+|site.westmere.jobWalltime=01:00:00
+
+|maxJobs|
+Maximum number of scheduler jobs to submit
+|site.westmere.maxJobs=20
+
+|maxNodesPerJob|
+The maximum number of nodes to request per scheduler job.
+|site.westmere.maxNodesPerJob=2
+
+|providerAttributes|
+Allows user to pass attributes through directly to scheduler submit script. Currently
+only implemented for sites that use PBS.
+|site.beagle.providerAttributes=pbs.aprun;pbs.mpp;depth=24
+
+|slurm|
+Pass parameters directly through to the submit script generated for the slurm
+scheduler. For example, the setting "site.midway.slurm.mail-user=username" generates
+the line "#SBATCH --mail-user=username".
+|site.midway.slurm.mail-user=username
+
+|taskDir|
+Tasks will be run from this directory. In the absence of a taskDir definition,
+Swift will run the task from workdir.
+|site.westmere.taskDir=/scratch/local/$USER/work
+
+|tasksPerWorker|
+The number of tasks that each worker can run simultaneously.
+|site.westmere.tasksPerWorker=12
+
+|taskThrottle|
+The maximum number of active tasks across all workers.
+|site.westmere.taskThrottle=100
+
+|taskWalltime|
+The maximum amount of time a task may run, in hh:mm:ss.
+|site.westmere.taskWalltime=01:00:00
+
+|site |
+Name of site or sites to run on. This is the same as running with
+swift -site <sitename>
+|site=westmere
+
+|userHomeOverride|
+Sets the Swift user home. This must be a shared filesystem. This defaults
+to $HOME. For clusters where $HOME is not accessible to the worker nodes,
+you may override the value to point to a shared directory that you own.
+|site.beagle.userHomeOverride=/lustre/beagle/username
+
+|workdir |
+The workdir property specifies where on the site files can be stored.
+This directory must be available on all worker nodes that will be used for
+execution. A shared cluster filesystem is appropriate for this. Note that
+you need to specify an absolute pathname for this field.
+|site.westmere.workdir=/scratch/midway/$USER/work
+
+|================================
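+
+As a rough sketch, several of the entries above could be combined to describe a
+single site. The values are copied from the Example column. Whether this
+particular combination is sufficient for a given cluster is an assumption, as
+is the use of +#+ for comment lines.
+
+-----
+# illustrative only: values taken from the Example column above
+site=westmere
+site.westmere.jobManager=slurm
+site.westmere.jobQueue=westmere
+site.westmere.jobWalltime=01:00:00
+site.westmere.maxJobs=20
+site.westmere.tasksPerWorker=12
+site.westmere.taskWalltime=01:00:00
+site.westmere.workdir=/scratch/midway/$USER/work
+-----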
+
+Grouping site properties
+~~~~~~~~~~~~~~~~~~~~~~~~
+The example swift.properties in this document listed the following site
+related properties:
+
+-----
+site.westmere.provider=local:slurm
+site.westmere.jobsPerNode=12
+site.westmere.maxWalltime=00:05:00
+site.westmere.queue=westmere
+site.westmere.initialScore=10000
+site.westmere.filesystem=local
+site.westmere.workdir=/scratch/midway/davidkelly999
+-----
+
+However, you can also simplify this by grouping site properties together with
+curly brackets.
+
+-----
+site.westmere {
+ provider=local:slurm
+ jobsPerNode=12
+ maxWalltime=00:05:00
+ queue=westmere
+ initialScore=10000
+ filesystem=local
+ workdir=/scratch/midway/$USER/work
+}
+-----
+
+App definitions
+~~~~~~~~~~~~~~~
+In 0.95, application wildcards are used by default. This means that
+$PATH will be searched and pathnames to applications do not have to be defined.
+
+If you have multiple sites defined and you want control over where
+things run, you will need to define the location of apps. In this
+scenario, you can define apps in swift.properties with something
+like this:
+
+-----
+app.westmere.cat=/bin/cat
+-----
+
+When an app is defined in swift.properties for any site you are running on,
+wildcards will be disabled, and all apps you want to use must be defined.
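+
+For instance, a run that uses two sites might list every app once per site.
+This is only a sketch: the site names are borrowed from other examples in this
+document, the echo entries and all paths are hypothetical, and +#+ is assumed
+to start a comment line.
+
+-----
+# hypothetical entries; adjust the paths for each site
+app.westmere.cat=/bin/cat
+app.westmere.echo=/bin/echo
+app.beagle.cat=/bin/cat
+app.beagle.echo=/bin/echo
+-----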
+
+General Swift properties
+~~~~~~~~~~~~~~~~~~~~~~~~
+Swift behavior can be configured through general Swift properties. Below is a list of properties:
+
+[options="header"]
+|================
+|Name|Valid Values|Default Value|Description
+
+|config.rundirs
+|true, false
+|true
+|By default, Swift will generate a run directory that contains logs, scheduler submit scripts,
+ debug directories, and other files associated with a particular Swift run. Setting this value
+ to false disables the creation of run directories and causes all logs and directories to be
+ created in the current working directory.
+
+|execution.retries
+|Positive integer
+|2
+|The number of times a job will be retried if it fails (giving a
+ maximum of 1 + execution.retries attempts at execution)
+
+|foreach.max.threads
+|Positive integer
+|1024
+|Limits the number of concurrent iterations that each foreach
+ statement can have at one time. This conserves memory for swift
+ programs that have large numbers of iterations (which would
+ otherwise all be executed in parallel)
+
+|lazy.errors
+|true, false
+|false
+|Swift can report application errors in two modes, depending on the
+ value of this property. If set to false, Swift will report the
+ first error encountered and immediately stop execution. If set to
+ true, Swift will attempt to run as much as possible from a
+ Swift script before stopping execution and reporting all
+ errors encountered. When developing Swift scripts, using the default value of
+ false can make the program easier to debug. However in production
+ runs, using true will allow more of a Swift script to be
+ run before Swift aborts execution.
+
+|swift.home
+|String
+|
+|Points to the Swift installation directory ($SWIFT_HOME). In general, this should
+ not be set as Swift can find its own installation directory, and incorrectly setting it may impair the
+ correct functionality of Swift.
+
+|pgraph
+|true, false, <file>
+|false
+|Swift can generate a Graphviz <http://www.graphviz.org/> file
+representing the structure of the Swift script it has run. If
+this property is set to true, Swift will save the provenance graph
+in a file named by concatenating the program name and the instance
+ID (e.g. helloworld-ht0adgi315l61.dot).
+If set to false, no provenance graph will be generated. If a file
+name is used, then the provenance graph will be saved in the
+specified file.
+The generated dot file can be rendered into a graphical form using
+Graphviz <http://www.graphviz.org/>, for example with a command-line
+such as:
+$ swift -pgraph graph1.dot q1.swift
+$ dot -ograph.png -Tpng graph1.dot
+
+|pgraph.graph.options
+|String
+|splines="compound", rankdir="TB"
+|This property specifies a Graphviz <http://www.graphviz.org>
+ specific set of parameters for the graph.
+
+|pgraph.node.options
+|String
+|color="seagreen", style="filled"
+|Used to specify a set of Graphviz <http://www.graphviz.org> specific
+ properties for the nodes in the graph.
+
+|provenance.log
+|true, false
+|false
+|This property controls whether the log file will contain provenance
+ information. Enabling this will increase the size of log files,
+ sometimes significantly.
+
+|sitedir.keep
+|true, false
+|false
+|Indicates whether the working directory on the remote site should be
+ left intact even when a run completes successfully. This can be used
+ to inspect the site working directory for debugging purposes.
+
+|status.mode
+|files, provider
+|files
+|Controls how Swift will communicate the result code of running user
+ programs from workers to the submit side. In files mode, a file
+ indicating success or failure will be created on the site shared
+ filesystem. In provider mode, the execution provider job status
+ will be used. provider mode requires the underlying job execution system to
+ correctly return exit codes.
+
+|tcp.port.range
+|<start>,<end> where start and end are integers
+|none
+|A TCP port range can be specified to restrict the ports on which
+ GRAM callback services are started. This is likely needed if your
+ submit host is behind a firewall, in which case the firewall should
+ be configured to allow incoming connections on ports in the range.
+
+|throttle.file.operations
+|<int>, off
+|8
+|Limits the total number of concurrent file operations that can
+ happen at any given time. File operations (like transfers) require
+ an exclusive connection to a site. These connections can be
+ expensive to establish. A large number of concurrent file operations
+ may cause Swift to attempt to establish many such expensive
+ connections to various sites. Limiting the number of concurrent file
+ operations causes Swift to use a small number of cached connections
+ and achieve better overall performance.
+
+|throttle.host.submit
+|<int>, off
+|2
+|Limits the number of concurrent submissions for any of the sites
+ Swift will try to send jobs to. In other words, no more than this
+ many jobs per site can be in the process of being submitted at the
+ same time.
+
+|throttle.score.job.factor
+|<int>, off
+|4
+|The Swift scheduler has the ability to limit the number of
+concurrent jobs allowed on a site based on the performance history
+of that site. Each site is assigned a score (initially 1), which can
+increase or decrease based on whether the site yields successful or
+faulty job runs. The score for a site can take values in the (0.1,
+100) interval. The number of allowed jobs is calculated using the
+following formula:
+2 + score*throttle.score.job.factor
+This means a site will always be allowed at least two concurrent
+jobs and at most 2 + 100*throttle.score.job.factor. With a default
+of 4 this means at least 2 jobs and at most 402.
+This parameter can also be set per site using the jobThrottle
+profile key in a site catalog entry.
+
+|throttle.submit
+|<int>, off
+|4
+|Limits the number of concurrent submissions for a run. This throttle
+ only limits the number of concurrent tasks (jobs) that are being
+ sent to sites, not the total number of concurrent jobs that can be
+ run. The submission stage in GRAM is one of the most CPU expensive
+ stages (due mostly to the mutual authentication and delegation).
+ Having too many concurrent submissions can overload either or both
+ the submit host CPU and the remote host/head node causing degraded
+ performance.
+
+|throttle.transfers
+|<int>, off
+|4
+|Limits the total number of concurrent file transfers that can happen
+ at any given time. File transfers consume bandwidth. Too many
+ concurrent transfers can cause the network to be overloaded
+ preventing various other signaling traffic from flowing properly.
+
+|ticker.disable
+|true, false
+|false
+|When set to true, suppresses the output progress ticker that Swift
+ sends to the console every few seconds during a run
+
+|use.wrapper.staging
+|true, false
+|false
+|Determines if the Swift wrapper should do file staging.
+
+|wrapper.invocation.mode
+|absolute, relative
+|absolute
+|Determines if Swift remote wrappers will be executed by specifying
+ an absolute path, or a path relative to the job initial working
+ directory. In most cases, execution will be successful with either
+ option. However, some execution sites ignore the specified initial
+ working directory, and so absolute must be used. Conversely on
+ some sites, job directories appear in a different place on the
+ worker node file system than on the filesystem access node, with the
+ execution system handling translation of the job initial working
+ directory. In such cases, relative mode must be used.
+
+|wrapper.parameter.mode
+|args,files
+|args
+|Controls how Swift will supply parameters to the remote wrapper
+ script. args mode will pass parameters on the command line. Some
+ execution systems do not pass commandline parameters sufficiently
+ cleanly for Swift to operate correctly. files mode will pass
+ parameters through an additional input file. This
+ provides a cleaner communication channel for parameters, at the
+ expense of transferring an additional file for each job invocation.
+
+|wrapperlog.always.transfer
+|true, false
+|false
+|This property controls when output from the Swift remote wrapper is
+ transferred back to the submit site. When set to false, wrapper
+ logs are only transferred for jobs that fail. If set to true,
+ wrapper logs are transferred after every job is completed or failed.
+
+|================
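+
+A minimal sketch of how a few of these properties might be set while debugging,
+assuming they are written as key=value lines in swift.properties in the same
+way as the site properties above. The values are illustrative, and +#+ is
+assumed to start a comment line.
+
+-----
+# illustrative debugging settings (assumed key=value form)
+lazy.errors=false
+execution.retries=1
+sitedir.keep=true
+wrapperlog.always.transfer=true
+-----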
+
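+To make the throttle.score.job.factor formula concrete, here is the arithmetic
+for the default factor of 4 at the initial score and at the two ends of the
+score range. How Swift rounds a fractional result is not stated here.
+
+-----
+score = 1   (initial):  2 + 1*4   = 6
+score = 0.1 (lowest):   2 + 0.1*4 = 2.4
+score = 100 (highest):  2 + 100*4 = 402
+-----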
+
+Using shell variables
+~~~~~~~~~~~~~~~~~~~~~
+Any value in swift.properties may contain environment variables. For example:
+
+-----
+workdir=/scratch/midway/$USER/work
+-----
+
+Environment variables are expanded locally on the machine where you are running
+Swift.
+
+Swift will also define a variable called $RUNDIRECTORY that is the path to the
+run directory Swift creates. In a case where you'd like your work directory
+to be in the runNNN directory, you may do something like this:
+
+-----
+workdir=$RUNDIRECTORY
+-----
+
Modified: trunk/docs/userguide/userguide.txt
===================================================================
--- trunk/docs/userguide/userguide.txt 2014-07-09 21:52:36 UTC (rev 7981)
+++ trunk/docs/userguide/userguide.txt 2014-07-11 00:45:00 UTC (rev 7982)
@@ -11,7 +11,7 @@
include::language[]
-include::configuration[]
+include::configuration.new[]
include::debugging[]