[Swift-commit] r3680 - in SwiftApps/SwiftR/Swift: exec man

noreply at svn.ci.uchicago.edu noreply at svn.ci.uchicago.edu
Mon Oct 11 22:34:27 CDT 2010


Author: wilde
Date: 2010-10-11 22:34:27 -0500 (Mon, 11 Oct 2010)
New Revision: 3680

Modified:
   SwiftApps/SwiftR/Swift/exec/start-swift
   SwiftApps/SwiftR/Swift/man/Swift-package.Rd
Log:
Added pbsman as an option to start-swift for manual pbs coaster start.

Modified: SwiftApps/SwiftR/Swift/exec/start-swift
===================================================================
--- SwiftApps/SwiftR/Swift/exec/start-swift	2010-10-11 15:27:08 UTC (rev 3679)
+++ SwiftApps/SwiftR/Swift/exec/start-swift	2010-10-12 03:34:27 UTC (rev 3680)
@@ -2,7 +2,7 @@
 
 # Define internal functions
 
-function wait-and-start-workers
+get-contact()
 {
   # Look for:
   # Passive queue processor initialized. Callback URI is http://140.221.8.62:55379
@@ -23,7 +23,11 @@
 
   CONTACT=$(echo $uriline | sed -e 's/^.*http:/http:/')
   echo Coaster service contact URI: $CONTACT
+}
 
+function wait-and-start-ssh-workers
+{
+  get-contact
   LOGDIR=$(pwd)/swiftworkerlogs # full path. FIXME: Generate this with remote-side paths if not shared dir env?
   LOGDIR=/tmp/$USER/SwiftR/swiftworkerlogs  # FIXME: left this in /tmp so it works on any host. Better way?
 
@@ -44,20 +48,51 @@
        sshpids="$sshpids $!"
   done
 
-  echo Started workers from these ssh processes: $sshpids
+  echo Started workers from ssh processes $sshpids
   echo $sshpids > $sshpidfile
 }
 
+make-pbs-submit-file()
+{
+cat >pbs.sub <<END
+#PBS -S /bin/sh
+#PBS -N SwiftR-workers
+#PBS -m n
+#PBS -l nodes=$nodes
+#PBS -l walltime=$walltime
+#PBS -q $queue
+#PBS -o pbs.stdout
+#PBS -e pbs.stderr
+WORKER_LOGGING_ENABLED=true
+cd / && /usr/bin/perl $SWIFTBIN/worker.pl $CONTACT SwiftR-workers $HOME/.globus/coasters $IDLETIMEOUT
+END
+}
+
+function wait-and-start-pbs-workers
+{
+  get-contact
+  LOGDIR=$(pwd)/swiftworkerlogs # full path. FIXME: Generate this with remote-side paths if not shared dir env?
+  LOGDIR=/tmp/$USER/SwiftR/swiftworkerlogs  # FIXME: left this in /tmp so it works on any host. Better way?
+
+  mkdir -p $LOGDIR
+
+  IDLETIMEOUT=$((60*60*240)) # 10 days: FIXME: make this a command line arg
+
+  # FIXME: set up for capturing pbs job id: rm -rf remotepid.* # FIXME: should not be needed if we start in a new dir each time
+  make-pbs-submit-file
+  qsub pbs.sub > $pbsjobidfile
+
+  echo Started workers from PBS job $(cat $pbsjobidfile)
+}
+
 # main script
 
-site=$1 # local, ssh, ...
+site=$1 # local, ssh, pbsauto, pbsman ...
 
 # FIXME: check args and use better arg parsing
 
 tmp=${SWIFTR_TMP:-/tmp}
 
-echo DB $0: site=$site tmp=$tmp
-
 throttleOneCore="-0.001"
 throttleOneCore="0.00"
 localcores=5 # FIXME: parameterize: localthreads=N
@@ -65,8 +100,6 @@
 SWIFTRBIN=$(cd $(dirname $0); pwd)
 SWIFTBIN=$SWIFTRBIN/../swift/bin  # This depends on ~/SwiftR/Swift/swift being a symlink to swift in RLibrary/Swift
 
-echo DB $0: SWIFTRBIN=$SWIFTRBIN SWIFTBIN=$SWIFTBIN
-
 rundir=$tmp/$USER/SwiftR/swift.$site  # rundir prefix # FIXME: handle multiple concurent independent swift servers per user
 mkdir -p $(dirname $rundir)
 trundir=$(mktemp -d $rundir.XXXX) # FIXME: check success
@@ -74,7 +107,7 @@
 ln -s $trundir $rundir
 cd $rundir
 
-echo DB $0: rundir=$(pwd) SWIFTRBIN=$SWIFTRBIN SWIFTBIN=$SWIFTBIN
+echo Running in $trundir "(linked to $rundir)"
 
 script=$SWIFTRBIN/rserver.swift
 #cp $script $SWIFTRBIN/passive-coaster-swift $SWIFTRBIN/swift.properties $rundir
@@ -101,8 +134,6 @@
 
   sshpidfile=${out/stdouterr/workerpids}
 
-  echo swift output is in: $out, pids in $sshpidfile
-
   TRAPS="EXIT 1 2 3 15"  # Signals and conditions to trap
 
   function onexit {
@@ -124,9 +155,40 @@
 
   trap onexit $TRAPS
 
-  wait-and-start-workers &
+  wait-and-start-ssh-workers &
   starterpid=$!
 
+elif [ $site = pbsman ]; then
+
+  # FIXME: Parameterize:
+
+  walltime="01:00:00"
+  nodes=1
+  queue=short
+
+  pbsjobidfile=${out/stdouterr/pbsjobid}
+
+  TRAPS="EXIT 1 2 3 15"  # Signals and conditions to trap
+
+  function onexit {
+    coasterservicepid="" # null: saved in case we go back to using coaster servers
+    trap - $TRAPS
+    pbsjobid=$(cat $pbsjobidfile)
+    echo Terminating worker processes starter $starterpid and PBS job $pbsjobid
+    if [ "_$starterpid" != _ ]; then
+      kill $starterpid
+    fi
+    if [ "_$pbsjobid" != _ ]; then
+      qdel $pbsjobid
+    fi
+    kill 0 # Kill all procs in current process group # FIXME: what was this for????
+  }
+
+  trap onexit $TRAPS
+
+  wait-and-start-pbs-workers &
+  starterpid=$!
+
 fi
 
 $SWIFTRBIN/../swift/bin/swift -config cf -tc.file tc -sites.file sites.xml $script -pipedir=$(pwd) >& $out </dev/null

Modified: SwiftApps/SwiftR/Swift/man/Swift-package.Rd
===================================================================
--- SwiftApps/SwiftR/Swift/man/Swift-package.Rd	2010-10-11 15:27:08 UTC (rev 3679)
+++ SwiftApps/SwiftR/Swift/man/Swift-package.Rd	2010-10-12 03:34:27 UTC (rev 3680)
@@ -41,35 +41,97 @@
  
 PREREQUISITES
 
-Sun Java (pref 1.6 - will it work below?)
+1) Sun Java 1.4 or higher (preferably 1.6) installed and in your PATH
 
-FIXME: How to install Java if needed
+Download the appropriate Java for Linux at:
+	http://www.java.com/en/download/manual.jsp (JREs)
 
-R v2.11 or higher in your PATH (on client and server machines)
+Typically either 32 bit with this link:
+	Linux (self-extracting file)  filesize: 19.9
 
-Ability to ssh to server machines (without password: agents, master
-control channel, etc) (FIXME: Are these limitations necessary?)
-Passwords or ssh key passphrases OK for some scenarios.
+Or 64 bit with this link
+	Linux x64 * filesize: 19.3 MB 
 
-ssh from Mac
+It's better to install the JDK (~80MB) from:
 
-ssh -A when jumping to a new host (to forward the ssh agent)
+	http://www.oracle.com/technetwork/java/javase/downloads/jdk6-jsp-136632.html (JDKs)
 
-(or set up ssh agents manually)
+(This will enable you to compile Swift fixes from the Swift development trunk)
 
-(document ssh tricks here for pw-less access)
+Mac OS X: Download ...
 
+Verify that you have Sun Java installed and in your PATH correctly by doing:
 
+$ java -version
+java version "1.6.0_21"
+Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
+Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
+$ 
+
+2) R v2.11 or higher in your PATH (on client and server machines)
+
+3) Access to Parallel Resources - You can run the Swift R package using:
+
+(a) Multiple cores on your local login host or workstation/laptop
+(b) One or more remote machines, each possibly multicore, accessed via ssh
+(c) Clusters running PBS, SGE, or Condor schedulers
+
+In configurations (b) and (c) Swift will launch its own workers, and
+then communicate using its own TCP protocol.
+
 INSTALL
 
-cd ???
+mkdir ~/RPackages ~/RLibrary # if not already created
+cd ~/RPackages
 wget http://www.ci.uchicago.edu/~wilde/Swift_0.1.tar.gz
-R CMS INSTALL Swift_0.1.tar.gz
+R CMD INSTALL -l ~/RLibrary Swift_0.1.tar.gz
+export R_LIBS=~/RLibrary
 
-SVN
+QUICK START
 
-svn checkout https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR
+In a shell (outside of R) start the local Swift server:
 
+$HOME/RLibrary/Swift/exec/start-swift local
+
+R
+> require(Swift)
+> basicSwiftTest()
+> runAllSwiftTests()
+
+These will produce output similar to:
+
+\\\\\\VERBATIM
+
+> require(Swift)
+Loading required package: Swift
+> basicSwiftTest()
+
+*** Starting  test 1.1 ***
+
+Test of local do.call(sumstuff)
+local result=
+[1] 4505
+
+Test of swiftapply(sumstuff,arglist)
+
+Swift properties:
+  swiftserver = local 
+  callsperbatch = 1 
+  runmode = service 
+  tmpdir = /tmp 
+  workerhosts = localhost 
+  initialize = initVar1 <<- 19; initVar2 <<- sqrt(400)+3 
+
+Swift request is in /tmp/wilde/SwiftR/requests.P04114/R0000000 
+Swift result:
+[[1]]
+[1] 4505
+
+
+==> test 1.1 passed
+> 
+////////////// VERBATIM
+
 CONFIGURE SERVERS
 
 edit configure-site-NAME in exec/
@@ -87,13 +149,21 @@
 # do this outside of R - BEFORE trying to run R Swift functions
 
 SWIFT=<your package install dir>/Swift/
-$SWIFT/exec/start-swift-workers hostname
-$SWIFT/exec/start-swift-server 
 
-local and ssh servers can be started and left running, across R runs
+$SWIFT/exec/start-swift local
 
-found via:
+or
 
+$SWIFT/exec/start-swift pbsman
+
+or
+
+$SWIFT/exec/start-swift ssh host1 ... hostN
+
+These Swift servers can be started and left running, across R runs
+
+options(swift.server="local") #  or "pbsman" or "ssh"
+
 HELLO WORLD TEST
 
 # Start swift local server as above
@@ -115,34 +185,124 @@
 swiftTestLoop(n)
 
 
-In source tree:
+Testing from the source tree:
 
 source("Swift/tests/TestSwift.R")
 
-  or R CMD TEST etc?
+or R CMD TEST etc?  FIXME
 
+
+STOPPING SWIFT SERVERS
+
+The following ps command is useful for displaying the many background
+swift processes. I keep this aliased as "mp" (my processes):
+
+  alias mp='ps -fjH -u $USER'
+
+Local (start-swift local):
+
+$ jobs
+$ kill %1
+
+Remote (start-swift ssh):
+
+$ jobs
+$ kill %1  # This tries to track down the remote processes and kill them
+
+Cluster (start-swift pbsman):
+
+$ jobs
+$ kill %1 # Swift should terminate its queued and/or running cluster jobs
+
+
+Occasionally a killall R and/or killall java is required.
+
 USAGE
 
 Swift returns Error object when remote side fails.
 
+swiftapply( )
+
+
 options:
+
   swift.server: matched server name on start-swift
+
   swift.callsperbatch
+
   initialize: 
 
 less likely to touch:
   remove temp reqs (sp???) FIXME
   mode (service, manual, ???)
 
+Other Swift functions (compatible with Snow/Snowfall packages):
+
+swiftLapply
+
+To be developed: swiftSapply, ...
+
+TESTS AND EXAMPLES
+
+basicSwiftTest()
+
+runAllSwiftTests()
+
+testloop(n)
+
+
 OPENMX EXAMPLES
 
 This section is specific to users of the OpenMX R package for
 structural equation modeling.
 
+USING ADDITIONAL PARALLEL ENVIRONMENTS
+
+3) ssh configured for password-free login (to run on remote worker nodes)
+
+Ability to ssh to server machines (without password: agents, master
+control channel, etc) (FIXME: Are these limitations necessary?)
+Passwords or ssh key passphrases OK for some scenarios.
+
+ssh from Mac
+
+ssh -A when jumping to a new host (to forward the ssh agent)
+
+(or set up ssh agents manually)
+
+(document ssh tricks here for pw-less access)
+
+
+
 DIRECTORY STRUCTURE USED FOR SWIFT RUNTIME
 
 PROCEESS STRUCTURE USE FOR SWIFT RUNTIME
 
+vvvv VERBATIM
+
+vanquish$ mp
+UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
+wilde     3621  3553  3553  3553  0 19:17 ?        00:00:00 sshd: wilde@pts/1
+wilde     3622  3621  3622  3622  0 19:17 pts/1    00:00:00   -bash
+wilde     3726  3622  3726  3622  0 19:20 pts/1    00:00:00     /bin/bash ./start-swift local
+wilde     3775  3726  3726  3622  0 19:20 pts/1    00:00:00       /bin/sh /homes/wilde/RLibrary/Swift/exec/../swift/bin/swift -config
+wilde     3835  3775  3726  3622  0 19:20 pts/1    00:00:11         java -Xmx256M -Djava.endorsed.dirs=/homes/wilde/RLibrary/Swift/exe
+wilde     8664  3622  8664  3622  0 20:47 pts/1    00:00:00     ps -fjH -u wilde
+wilde     3441  3366  3366  3366  0 19:16 ?        00:00:00 sshd: wilde@pts/0
+wilde     3442  3441  3442  3442  0 19:16 pts/0    00:00:00   -bash
+wilde     4114  3442  4114  3442  0 19:35 pts/0    00:00:05     /usr/lib64/R/bin/exec/R
+wilde     4667     1  3726  3622  0 19:38 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+wilde     4611     1  3726  3622  0 19:38 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+wilde     4569     1  3726  3622  0 19:38 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+wilde     4562     1  3726  3622  0 19:38 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+wilde     4522     1  3726  3622  0 19:38 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+wilde     4455     1  3726  3622  0 19:38 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+wilde     4270     1  3726  3622  0 19:38 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+wilde     4160     1  3726  3622  0 19:36 pts/1    00:00:00 /usr/lib64/R/bin/exec/R --slave --no-restore --file=./SwiftRServer.sh --ar
+vanquish$ 
+
+^^^^^ VERBATIM
+
 DEBUGGING AND TROUBLESHOOTING
 
 * manual mode
@@ -151,8 +311,17 @@
 
 * is my swift server responding?
 
-tail -f $TMP/
+tail -f $TMP/$USER/SwiftR/swift.local/swift.stdouterr
 
+You should see periodic status update lines such as the following:
+
+vvvvvvvvv VERBATIM
+
+
+
+
+^^^^^^^^^ VERBATIM
+
 * reporting bugs: what to send  (FIXME: need swiftsnapshot script)
 
 * setting Swift worker logging with $HOME/.globus/coasters/loglevel
@@ -160,36 +329,58 @@
 4=least detaild, 5=off.  This is an interim log control mechanism and
 may be deprecated in the future.
 
+CHECKOUT AND BUILD SWIFT R PACKAGE FROM SVN
+WITH COMPILED SWIFT BINARY RELEASE (TRUNK) FROM SVN
+
+cd ~
+svn checkout https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR
+cd SwiftR/Swift
+mkdir swift
+cd swift
+wget http://www.ci.uchicago.edu/~wilde/swift.rNNNN.cog.rNNNN.tar.gz
+
+cd ~/SwiftR
+./install.sh # generates a .gz package in ~/public_html/*.gz
+
 CAVEATS
 
+Swift requires Sun Java 1.4 or above; preferably 1.6. It will not run
+under gcj (GNU Java) although it is getting closer to being able to
+and may work, to some extent, in some settings.  You need to ensure
+that a suitable Sun Java is in your PATH.
+
+In addition, the environment variable CLASSPATH should not be set.
+
+Variables set in the initialize script must typically be set in the global
+environment ( var <<- value ).
+
+The following caveats are high priority on the FIXME list:
+
 You MUST start the Swift server before running a swiftapply() call
 from R. Otherwise R hangs and must be killed and restarted.
 
-When Swift fifos (named pipes) get hung, you need to use kill or Quit
-to break out of R. FIXME
+When the FIFOs (named pipes) which are used to communicate from R to
+Swift get hung, you need to use kill or Quit to break out of R.
 
 There is no automatic restart yet if swift dies in its server
-loop. FIXME
+loop.
 
-Variables set in the initialze script must typically be set in global
-environment ( var <<- value);
-
 Only lapply is implemented (also SwiftApply) - need to see if we can
-cut down arg passing overhead for many of the apply() cases
+cut down arg passing overhead for many of the apply() cases.
 
 Log records build up fast; these will be reduced as we get more
 confidence with the code and shake out bugs
 
+Lower priority issues are:
+
 There is no easy way yet to alter Swift configuration file variables
 such as number of cores to use, etc. Do this for now by editing an
 existing configuration under Swift/exec/conigure-swift-NNN where NNN
 is the Swift server name.
 
-Each swiftapply() is pretty noisy - it echos its options etc. This
-will quiet down.
+Each swiftapply() is pretty noisy - it echoes its options, etc. This
+will quiet down and "verbose" will be made an option.
 
-
-
 }
 
 \author{

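Note on get-contact: the function reduces the coaster log line to a bare
URI with a single sed substitution. Below is a minimal sketch of that step,
run against the sample line quoted in the script's own comment. The name
extract_contact is a hypothetical helper used only for this illustration;
it is not a function in start-swift.

```shell
# Sample log line, copied from the comment inside get-contact.
uriline='Passive queue processor initialized. Callback URI is http://140.221.8.62:55379'

# extract_contact: hypothetical helper, not part of start-swift.
# Drops everything in front of "http:" so only the URI remains.
extract_contact() {
  printf '%s\n' "$1" | sed -e 's/^.*http:/http:/'
}

CONTACT=$(extract_contact "$uriline")
echo "Coaster service contact URI: $CONTACT"
```

Because ".*" is greedy, a line containing more than one "http:" would be
cut at the last occurrence; the sample line has only one.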

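Note on onexit: the handler guards kill and qdel with an emptiness test on
each id. Below is a minimal sketch of that guard with the variable fully
quoted, so that an empty value still leaves a well-formed test expression.
The helper name is_set and the job id value are made up for illustration.

```shell
# is_set: hypothetical helper wrapping the "_$var" != _ idiom from onexit.
# Succeeds only when its argument is a non-empty string.
is_set() {
  [ "_$1" != _ ]
}

starterpid=""                 # empty: no starter process recorded
pbsjobid="12345.pbsserver"    # made-up PBS job id, for illustration only

if is_set "$starterpid"; then starter_action=kill; else starter_action=skip; fi
if is_set "$pbsjobid"; then pbs_action=qdel; else pbs_action=skip; fi

echo "starter: $starter_action, PBS job: $pbs_action"
```

The same check can be written as [ -n "$var" ]; the underscore prefix is
an older idiom that guards against values that look like test operators.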

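Note on make-pbs-submit-file: the submit file comes from an unquoted
here-document, so $nodes, $walltime, $queue and the other variables are
expanded when the file is written, not when PBS runs the job. Below is a
sketch of that behaviour using the defaults from the pbsman branch. The
SWIFTBIN path is a placeholder, the file goes to a temporary location
instead of pbs.sub, and nothing is submitted.

```shell
# Defaults hard-coded in the pbsman branch of start-swift.
nodes=1
walltime="01:00:00"
queue=short

# Placeholders assumed for this sketch only.
SWIFTBIN=/path/to/swift/bin
CONTACT=http://140.221.8.62:55379
IDLETIMEOUT=$((60*60*240))   # 864000 seconds = 10 days

# The real script writes pbs.sub in the run directory; a temp file is used here.
subfile=$(mktemp)
cat > "$subfile" <<END
#PBS -S /bin/sh
#PBS -N SwiftR-workers
#PBS -l nodes=$nodes
#PBS -l walltime=$walltime
#PBS -q $queue
cd / && /usr/bin/perl $SWIFTBIN/worker.pl $CONTACT SwiftR-workers $HOME/.globus/coasters $IDLETIMEOUT
END

cat "$subfile"   # inspect the result; qsub is deliberately not called
```

Since expansion happens at write time, CONTACT must already be set (by
get-contact) before the submit file is generated.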

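Note on the run directory: start-swift creates a uniquely named directory
with mktemp and then points a fixed-name symlink at it, which is what the
new "Running in ... (linked to ...)" message reports. Below is a sketch of
that pattern, confined to a throwaway sandbox directory so it does not
touch real run directories. The sandbox and the rm -f step are assumptions
of this sketch, not lines taken from the script.

```shell
tmp=${SWIFTR_TMP:-/tmp}
site=local

# Hypothetical sandbox so the sketch leaves real SwiftR directories alone.
base=$(mktemp -d "$tmp/swiftr-sketch.XXXXXX") || exit 1

rundir=$base/swift.$site                      # fixed, predictable name
trundir=$(mktemp -d "$rundir.XXXX") || exit 1 # unique directory for this run

rm -f "$rundir"            # assumed: clear any stale link from an earlier run
ln -s "$trundir" "$rundir"
cd "$rundir" || exit 1

echo Running in $trundir "(linked to $rundir)"
```

Checking the mktemp exit status covers the "check success" FIXME that sits
next to the mktemp call in the script.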