[Swift-commit] r3683 - in SwiftApps/SwiftR/Swift: exec man
noreply at svn.ci.uchicago.edu
Sun Oct 17 09:15:27 CDT 2010
Author: wilde
Date: 2010-10-17 09:15:22 -0500 (Sun, 17 Oct 2010)
New Revision: 3683
Modified:
SwiftApps/SwiftR/Swift/exec/start-swift
SwiftApps/SwiftR/Swift/man/Swift-package.Rd
Log:
correct shell syntax error in ssh server code. Small doc updates.
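The shell syntax error mentioned in the log is the whitespace around "=" in the assignment to cores. A minimal sketch of the difference, using hypothetical stand-in values for the two variables (the real start-swift script sets them elsewhere):

```shell
#!/bin/sh
# Hypothetical stand-in values; start-swift derives these from its own
# defaults and command-line options.
defaultRemoteCores=4
cores=0

# Broken form (left commented out): with spaces around "=", the shell
# parses "cores" as a command name and "=" and the value as its arguments,
# so no assignment happens.
# cores = $defaultRemoteCores

# Fixed form: no whitespace around "=" makes this a variable assignment.
if [ "$cores" -eq 0 ]; then
    cores=$defaultRemoteCores
fi

echo "cores=$cores"
```

With the broken form, cores would stay at 0 and that value would be passed on to configure-server-ssh.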
Modified: SwiftApps/SwiftR/Swift/exec/start-swift
===================================================================
--- SwiftApps/SwiftR/Swift/exec/start-swift 2010-10-14 13:51:28 UTC (rev 3682)
+++ SwiftApps/SwiftR/Swift/exec/start-swift 2010-10-17 14:15:22 UTC (rev 3683)
@@ -259,7 +259,7 @@
elif [ $server = ssh ]; then
if [ $cores -eq 0 ]; then
- cores = $defaultRemoteCores
+ cores=$defaultRemoteCores
fi
source $SWIFTRBIN/configure-server-ssh $cores $time
Modified: SwiftApps/SwiftR/Swift/man/Swift-package.Rd
===================================================================
--- SwiftApps/SwiftR/Swift/man/Swift-package.Rd 2010-10-14 13:51:28 UTC (rev 3682)
+++ SwiftApps/SwiftR/Swift/man/Swift-package.Rd 2010-10-17 14:15:22 UTC (rev 3683)
@@ -5,62 +5,139 @@
\alias{swiftLapply}
\docType{package}
\title{
-R interface to Swift parallel scripting language
+R interface for parallel apply() calls using Swift
}
\description{
-Description: Routines to invoke R functions and Swift scripts on
-remote resources through Swift. R functions can be remotely executed
-in parallel in a manner similar to Snow using a list of argument
-lists. Eventually more general Swift functions can be embedded and
-invked remotely as well.
+Apply R functions to lists of arguments in parallel, on distributed
+resources. Resource selection and access, scheduling, and throttling
+are provided using the Swift parallel scripting language,
+transparently to the R user.
+Currently supports local multicore CPUs; clusters running PBS; and
+ad-hoc clusters composed of one or more remote multicore systems
+accessible via SSH.
+
+Remote resources are managed by Swift worker agents started by SSH or
+the cluster scheduler, and accessed via a Swift protocol. Each pool of
+resources is called a "Swift server". Multiple Swift servers can be
+accessed from one R client.
+
}
+
\details{
\tabular{ll}{
Package: \tab Swift\cr
Type: \tab Package\cr
-Version: \tab 1.0\cr
-Date: \tab 2010-02-25\cr
+Version: \tab 0.1.0\cr
+Date: \tab 2010-10-15\cr
License: \tab Globus Toolkit Public License v3 (based on Apache License 2.0):
http://www.globus.org/toolkit/legal/4.0/license-v3.html \cr
LazyLoad: \tab yes\cr
}
+FIXME: Use the traditional R format for the initial part of the man page.
+
+The main function in this package is:
+
+resultList = swiftapply(function,listOfArgumentLists)
+
+swiftapply() invokes function on each of the argument lists in
+listOfArgumentLists, and returns the result of each invocation in the
+corresponding member of resultList. Each invocation is executed in
+parallel in an external, independent copy of R.
+
+This is the most general function in the Swift package, as each
+arglist is specified in full and can contain unique values for all
+arguments for each invocation of function.
+
+Options that determine the behavior of swiftapply can be set via
+options(swift.*=value) or via optional keyword arguments.
+
To use this package, create a list of argument lists, and then invoke:
-swiftapply(function,arglists).
-As a preliminary interface, you can set R options() to control Swift's
-operation:
+\verb{resultList = swiftapply(function, listOfArgumentLists)}
+For example:
+
+\preformatted{
+arglists=list()
+for(i in 1:10) {
+ arglists[[i]] = list(i)
+}
+resultlist = swiftapply(sqrt,arglists)
+}
+
+The same conventions as used by the Snowfall package are used for swiftLapply, analogous to sfLapply:
+
+\preformatted{
+r = swiftLapply(seq(1,10),sqrt)
+}
+
+Currently swiftLapply is the only one implemented (i.e., swiftSapply etc. are not yet provided but will be soon).
+
+Arbitrary R objects can be passed. For example:
+
+\preformatted{
+ data(cars)
+ data(trees)
+ sumstuff <- function(treedata, cardata) {
+ sum(treedata$Height, cardata$dist)
+ }
+ args = list(trees, cars)
+ arglist = rep(list(args), 10)
+ res = swiftapply(sumstuff, arglist)
+}
+
+As a preliminary interface, you can set R options() to control the
+operation of the functions in the Swift package:
+
options(swift.callsperbatch=n) # n = number of R calls to perform in
each Swift job.
-options(swift.site=sitename) # sitename = "local" to run on the
-current host and "pbs" to submit to a local PBS cluster.
+options(swift.server="servername") # servername = "local" to run on
+the current host, "pbs" to submit to a local PBS cluster, "pbsf" to
+run on clusters such as Merlot which have firewalls that restrict
+outbound connectivity from the worker nodes to the Swift server running
+on the login node.
+options(swift.keepwork=TRUE) # Retain the temporary files that the
+Swift functions use to pass R data from client to remote R
+servers. This is useful for debugging.
+
}
\section{PREREQUISITES}{
+To run Swift, you need a Java runtime environment (JRE) installed on
+the client machine (where the client R workspace will be executed).
+Worker nodes (remote resources) only need Perl (to run the Swift
+worker agent).
+
+Remote resources can be accessed via SSH or the PBS batch scheduler.
+
+Details:
+
1) Sun Java 1.4 or higher (preferably 1.6) installed and in your PATH
-Download the appropriate Java for Linux at:
- http://www.java.com/en/download/manual.jsp (JREs)
+Download the appropriate Java Runtime (JRE) for Linux at:\verb{
+ http://www.java.com/en/download/manual.jsp}
-Typically either 32 bit with this link:
- Linux (self-extracting file) filesize: 19.9
+Typically either 32 bit with this link:\verb{
+ Linux (self-extracting file) filesize: 19.9}
-Or 64 bit with this link
- Linux x64 * filesize: 19.3 MB
+Or 64 bit with this link\verb{
+ Linux x64 * filesize: 19.3 MB }
It's better to install the JDK (~80MB) from:
http://www.oracle.com/technetwork/java/javase/downloads/jdk6-jsp-136632.html (JDKs)
-(This will enable you to compile Swift fixes from the Swift development trunk)
+(This will enable you to compile Swift revisions from the Swift development trunk)
-Mac OS X: Download ...
+Mac OS X: Download ... FIXME: Mac OS X is not yet tested for use with
+this package. Few issues are expected, but some shell command
+differences may affect the scripts in this package.
Verify that you have Sun Java installed and in your PATH correctly by doing:
@@ -72,14 +149,26 @@
2) R v2.11 or higher in your PATH (on client and server machines)
+(Testing has been done on R 2.11. The package is likely to operate on
+older R versions as well, but has not yet been validated on them).
+
3) Access to Parallel Resources - You can run Swift R package using:
(a) Multiple cores on your local login host or workstation/laptop
+
(b) One or more remote machines, possibly each a multicore, accessed via ssh
+
(c) Clusters running PBS, SGE, or Condor schedulers
In configurations (b) and (c) Swift will launch its own workers, and
then communicate using its own TCP protocol.
+
+Swift workers must be able to connect back to the Swift server on TCP
+ports in the range of 30000 and higher. (FIXME: determine specifics).
+If this is not available on a cluster (e.g., Merlot), then the pbsf
+server will tunnel the Swift port over the standard ssh port, assuming
+that is reachable.
+
}
\section{INSTALLATION}{
@@ -93,116 +182,106 @@
export GLOBUS_HOSTNAME=10.0.0.200 # Eg for Merlot: internal address of the login node
}
-\section{QUICK START}{
+\section{QUICK_START}{
+\preformatted{
In a shell (outside of R) start the local Swift server:
-$HOME/RLibrary/Swift/exec/swift-start local
+$HOME/RLibrary/Swift/exec/swift-start local #
+export R_LIBS=$HOME/RLibrary
+
R
> require(Swift)
->basicSwiftTest()
->runAllSwiftTests()
+> basicSwiftTest() # should take about 1 second
+> runAllSwiftTests() # should take < 60 seconds
-These will produce output similar to:
-\preformatted{
+}
-> require(Swift)
-Loading required package: Swift
-> basicSwiftTest()
+}
-*** Starting test 1.1 ***
+\section{ENVIRONMENT_VARIABLES}{
-Test of local do.call(sumstuff)
-local result=
-[1] 4505
+If used, these variables must be exported in the UNIX environment in which
+swift-start is executed:
-Test of swiftapply(sumstuff,arglist)
+SWIFTR_TMP sets the root directory below which Swift will maintain its
+directory structure. Defaults to /tmp. For, e.g., the PADS cluster,
+it's best to set this to /scratch/local, as /tmp is very limited in
+space. Seldom needed.
-Swift properties:
- swiftserver = local
- callsperbatch = 1
- runmode = service
- tmpdir = /tmp
- workerhosts = localhost
- initialize = initVar1 <<- 19; initVar2 <<- sqrt(400)+3
-
-Swift request is in /tmp/wilde/SwiftR/requests.P04114/R0000000
-Swift result:
-[[1]]
-[1] 4505
-
-==> test 1.1 passed
->
+GLOBUS_HOSTNAME should be set to the IP address of the login host if
+it contains multiple network interfaces (see /sbin/ifconfig) and if
+the worker nodes can only reach the login host on a subset of these
+interfaces.
}
-}
+\section{START_SERVERS}{
-\section{CONFIGURE SERVERS}{
+To run swiftapply() and any of the swiftXapply() functions, you first
+start one or more "Swift servers" on your local host (where you will
+run the R client workspace).
-edit configure-site-NAME in exec/
+Currently you must do this manually and in your login shell, outside
+of R - BEFORE trying to run R Swift functions. If you run swiftapply()
+without a Swift server running, your R session will hang and you will
+need to kill it. This issue will be resolved shortly.
-can put local cores into an ssh pool
+The start-swift command (and all related shell scripts) is located in the installed package "exec" directory, so it's handy to set a shell variable to point there:
-swift $HOME/.ssh/auth-defaults file for additional shs servers
+SWIFT=<your package install dir>/Swift/
-access remote systems via ssh
+Examples of starting the Swift server follow.
-Export SWIFTR_TMP in your environment
-}
+To run N parallel R servers on the local host, one for each core:
-\section{START SERVERS}{
+\verb{$SWIFT/exec/start-swift}
-# do this outside of R - BEFORE trying to run R Swift functions
+To run 4 R servers:
-SWIFT=<your package install dir>/Swift/
+\verb{$SWIFT/exec/start-swift -c 4}
-$SWIFT/exec/start-swift local
+To run 4 R servers on each of two hosts that can be reached by ssh:
-or
+\verb{$SWIFT/exec/start-swift -s ssh -c 4 -h "hostname1 hostname2"}
-$SWIFT/exec/start-swift pbsman
+To run 8 R servers for 30 minutes on each of 3 nodes of the Merlot cluster, run this on the login host "merlot", using its "serial" queue:
-or
+\verb{$SWIFT/exec/start-swift -s pbsf -c 8 -n 3 -t 00:30:00 -q serial}
-$SWIFT/exec/start-swift ssh host1 ... hostN
-
These Swift servers can be started and left running, across R runs
options(swift.server="local") # or "pbsman" or "ssh"
}
-\section{HELLO WORLD TEST}{
+\section{TESTS}{
+Running a hello world test:
+
+\preformatted{
# Start swift local server as above
require(Swift)
basicSwiftTest()
}
-\section{RUN FULL TEST}{
+Running a full test
-As a regular user:
-
+\preformatted{
require(Swift)
-fullSwiftTest()
+runAllSwiftTests()
+}
-# Then
+Running full tests n times:
-n=10 # 10 times through full test loop
-
+\preformatted{
testLoop(n)
+}
-
-Testing from the source tree:
-
-source("Swift/tests/TestSwift.R")
-
-or R CMD TEST etc? FIXME
}
-\section{STOPPING SWIFT SERVERS}{
+\section{STOPPING_SWIFT_SERVERS}{
The following ps command is useful for displaying the many background
swift processes. I keep this aliased as "mp" (my processes):
@@ -256,14 +335,14 @@
}
-\section{OPENMX EXAMPLES}{
+\section{OPENMX_EXAMPLES}{
This section is specific to users of the OpenMX R package for
structural equation modeling.
}
-\section{USING ADDITIONAL PARALLEL ENVIRONMENTS}{
+\section{USING_OTHER_PARALLEL_ENVIRONMENTS}{
3) ssh configured for password-free login (to run on remote worker nodes)
@@ -315,8 +394,21 @@
* manual mode
+You can get Swift to stop after it has produced the call batch file
+(an R save()'d object file). You can manually load this in another R
+workspace, inspect the contents, manually invoke the remote call, and
+send the data back. This is useful if issues arise in the transparency
+of function, argument, and return value marshalling, and if there are
+concerns about the transparency of the remote execution.
+
* logs to look at
+See the section on the directory structure of Swift services.
+
+"info" files returned from each execution can contain messages if the remote R jobs fail to launch.
+
+Each R server has its own log, on the host on which the server executes.
+
* is my swift server responding?
tail -f $TMP/$USER/SwiftR/swift.local/swift.stdouterr
@@ -382,7 +474,7 @@
killall -u $USER. This can be made more precise to avoid killing jobs
on shared worker nodes.
-The following caveats are high prioiry on the FIXME list:
+The following caveats are high priority on the FIXME list:
You MUST start the Swift server before running a swiftapply() call
from R. Otherwise R hangs and must be killed and restarted.
@@ -390,24 +482,28 @@
When the FIFOs (named pipes) which are used to communicate from R to
Swift get hung, you need to use kill or Quit to break out of R.
-There is no automatic restart yet if swift dies in its server
-loop.
+There is no automatic restart yet if swift dies in its server loop. In
+particular, parsing errors, eg on the Swift initialexpr text, can
+cause the R and hence the Swift server to exit. The
Only lapply is implemented (also SwiftApply) - need to see if we can
cut down arg passing overhead for many of the apply() cases.
Log records build up fast; these will be reduced as we get more
-confidence with the code and shake out bugs
+confidence with the code and shake out bugs.
+There is no asynchronous swiftapply call yet. Each call must complete
+before control is returned to the R command loop.
+
+Only one instance of each server type (i.e., local, pbs, pbsf, ssh)
+can be started at a time.
+
Lower priority issues are:
There is no easy way yet to alter Swift configuration file variables
-such as number of cores to use, etc. Do this for now by editing an
-existing configuration under Swift/exec/conigure-swift-NNN where NNN
-is the Swift server name.
-Each swiftapply() is pretty noisy - it echos its options, etc. This
-will quiet down and "verbose" will be made an option.
+Swift echoes its options on every call. It will be made silent and
+"verbose" will be made an option.
}