[Swift-commit] r4785 - in trunk/bin: . grid

wilde at ci.uchicago.edu
Wed Jul 6 17:41:37 CDT 2011


Author: wilde
Date: 2011-07-06 17:41:37 -0500 (Wed, 06 Jul 2011)
New Revision: 4785

Added:
   trunk/bin/grid/
   trunk/bin/grid/1worker.sh
   trunk/bin/grid/README
   trunk/bin/grid/TODO
   trunk/bin/grid/foreachsite
   trunk/bin/grid/gen_gridsites
   trunk/bin/grid/get_greensites
   trunk/bin/grid/log4j.properties.debug
   trunk/bin/grid/maketcfrominst
   trunk/bin/grid/mk_catalog.rb
   trunk/bin/grid/mk_cats.rb
   trunk/bin/grid/mk_osg_sitetest.rb
   trunk/bin/grid/osgcat
   trunk/bin/grid/ress.rb
   trunk/bin/grid/ressfields
   trunk/bin/grid/run_workers
   trunk/bin/grid/sites
   trunk/bin/grid/start-ranger-service
   trunk/bin/grid/start-ranger-service~
   trunk/bin/grid/start-swift-service
   trunk/bin/grid/swift-workers
   trunk/bin/grid/worker.sh
   trunk/bin/grid/workers.pbs.sub
   trunk/bin/grid/workers.ranger.sh
   trunk/bin/grid/workers.ranger.sub
Log:
Initial version of toolkit for OSG and TeraGrid

Added: trunk/bin/grid/1worker.sh
===================================================================
--- trunk/bin/grid/1worker.sh	                        (rev 0)
+++ trunk/bin/grid/1worker.sh	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,2 @@
+PORT=$1
+WORKER_LOGGING_LEVEL=DEBUG ./worker.pl http://128.135.125.17:$PORT swork01 ./workerlogs 


Property changes on: trunk/bin/grid/1worker.sh
___________________________________________________________________
Added: svn:executable
   + *
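
Usage sketch for 1worker.sh (not part of the commit; assumes worker.pl has been copied into the current directory and a coaster service is already listening on the given worker port, e.g. the 61100 starting port used by mk_catalog.rb later in this commit):

$ mkdir -p workerlogs    # log directory passed to worker.pl (created here in case worker.pl does not)
$ ./1worker.sh 61100     # starts one worker.pl against http://128.135.125.17:61100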

Added: trunk/bin/grid/README
===================================================================
--- trunk/bin/grid/README	                        (rev 0)
+++ trunk/bin/grid/README	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,86 @@
+Coaster Pool generator
+======================
+
+Description: This set of scripts creates configuration files and workflows to request pilot
+  jobs on OSG sites.
+Author: Allan Espinosa
+Date: 2010 November 24
+
+
+>> 0.a Set up the OSG client package
+
+$ source /opt/osg-1.2.16/setup.sh
+
+>> 0.b Create a VOMS proxy
+
+$ voms-proxy-init -voms Engage -valid 12:00
+
+
+Scripts
+-------
+
+1.  start_services.sh - Starts coaster services
+    Usage: start_services.sh [number of services]
+
+2.  mk_catalog.rb - Generates sites.xml files for submitting condor, coaster and
+      gram2 jobs to a list of OSG sites.  The whitelist is formatted as
+      [GlueSiteUniqueID]_[GlueCEInfoHostName], one site per line. 
+    Usage: mk_catalog.rb [whitelist] [<optional: app_name>]
+
+3.  nqueue.rb - Submits pilot coaster jobs to a list of sites, saturating each site
+      by keeping n jobs queued at a time.
+    Usage: nqueue.rb [whitelist]
+
+
+Example usage
+-------------
+
+Here, an app called 'extenci' will be installed on the SPRACE site resource.
+
+1. Create a whitelist file.  (use ../site_gen/gensites.sh )
+
+$ cat > whitelist << EOF
+SPRACE_osg-ce.sprace.org.br
+EOF
+
+
+2. Generate gram2 sites.xml (gt2_osg.xml) file. 
+   
+$ ./mk_catalog.rb whitelist extenci
+
+# Modify to use the specified port list (from start-services)
+# Note that if the port changes you will need to regenerate...
+
+3. Upload the worker.pl script to the site.  The setup.k script also cleans up
+   the data directory of the site.
+
+# Modify the setup.k script - the upload is not needed, but the cleanup still is:
+# $ swift setup.k
+
+4. Spawn 2 coaster services.  The first one is for PADS.
+   
+$ ./start_services.sh 2
+
+# 2 is just for a test...actually need Nsites+1 services (+1 for PADS, +N for other fixed sites eg Beagle)
+
+5. Configure the service to run in passive mode.  Any Swift script that uses
+   coaster_osg.xml will do; slave.swift is provided for this purpose.
+
+$ swift -config swift.properties -sites.file coaster_osg.xml slave.swift 
+
+6. Request coaster jobs.  The script will request (2.5 * total_cpus) pilot jobs
+   over the duration of the workflow.
+   
+   Method 1: via direct condor-g
+
+$ ./nqueue.rb whitelist
+
+   Method 2: via swift
+
+$ swift -config swift.properties -sites.file condor_osg.xml worker.swift
+
+
+7. Run your workflow.  Here, the sleep.swift sample included in the package will
+   be used.
+
+$ swift -config swift.properties -sites.file coaster_osg.xml sleep.swift
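
For reference, a mk_catalog.rb run as in step 2 writes its generated catalogs and workflows into the current directory (file names taken from the File.open calls in mk_catalog.rb, added later in this commit):

$ ./mk_catalog.rb whitelist extenci
$ ls
coaster_osg.xml  condor_osg.xml  gt2_osg.xml  slave.swift  tc.data  whitelist  worker.swift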

Added: trunk/bin/grid/TODO
===================================================================
--- trunk/bin/grid/TODO	                        (rev 0)
+++ trunk/bin/grid/TODO	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,40 @@
+
+
+ExTENCI Exec
+
+create modft install & test file; test under fork and work
+
+run sites gen
+
+run tcgen
+
+run factory  (convert factory from ruby to shell?)
+
+run wf (one service for all w/ provider staging; one service per site)
+
+
+TO RESOLVE
+
+- how to set swift throttles to handle a varying number of coaster workers per site?
+
+- why did Allan set exceptions in workdir names, eg for BNL?
+
+- how to dynamically grow/shrink pool and add/remove sites; dynamically take coaster services in and out of service.
+
+- settings for retry and replication
+
+
+FEATURES
+
+Add site selection option to foreachsite
+
+(Swift feature: foreach site in Swift?)
+
+CLEANUP
+
+Find all interim p-baked tools under swift/lab/osg and place under grid/ for development
+
+Find Glen's tgsites command and integrate
+
+Merge in gstar
+

Added: trunk/bin/grid/foreachsite
===================================================================
--- trunk/bin/grid/foreachsite	                        (rev 0)
+++ trunk/bin/grid/foreachsite	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,69 @@
+# Default settings:
+
+resource=work # fork or work(er)
+
+# FIXME: test for valid proxy
+
+# Usage: foreachsite [-resource fork|worker] scriptname
+
+usage="Usage: $0 [-resource fork|worker] scriptname"
+
+# Process command line arguments
+
+while [ $# -gt 0 ]; do
+  case $1 in
+    -resource) resource=$2; shift 2 ;;
+    -*) echo $usage 1>&2
+        exit 1 ;;
+    *) scriptparam=$1
+       scriptpath=$(cd $(dirname $scriptparam); echo $(pwd)/$(basename $scriptparam)) ; shift 1 ;;
+  esac
+done
+
+if [ _$scriptparam = _ ]; then
+  echo $usage 1>&2
+  exit 1
+fi
+
+rundir=$(mktemp -d run.XXX)  # mktemp needs at least three X's in the template
+cd $rundir
+
+echo Running foreachsite: resource=$resource script=$scriptparam rundir=$rundir
+
+# swift-osg-ress-site-catalog --engage-verified --condor-g >osg.xml
+swift-osg-ress-site-catalog --engage  --condor-g >osg.xml
+
+for jobmanager in $(grep gridRes osg.xml | sed -e 's/^.* //' -e 's/<.*//' ); do
+  ( sitename=$(echo $jobmanager | sed -e 's,/.*,,')  # strip off the /jobmanager-<type> suffix, leaving the site name
+    mkdir $sitename
+    cd $sitename
+    if [ $resource = fork ]; then
+      resname=$sitename
+    else
+      resname=$jobmanager
+    fi
+    cat >condor.sub <<END
+#jobmanager=$jobmanager
+#sitename=$sitename
+#resname=$resname
+universe = grid
+grid_resource = gt2 $resname
+stream_output = False
+stream_error  = False
+Transfer_Executable = True
+output = $(pwd)/submit.stdout
+error = $(pwd)/submit.stderr
+log = $(pwd)/submit.log
+
+remote_initialdir = /tmp
+executable = $scriptpath
+#arguments = "-c" "python -V" # fails!
+notification = Never
+leave_in_queue = False
+queue
+
+END
+
+    condor_submit condor.sub
+  )
+done


Property changes on: trunk/bin/grid/foreachsite
___________________________________________________________________
Added: svn:executable
   + *
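
A usage sketch for foreachsite (the probe script below is hypothetical; a valid grid proxy and swift-osg-ress-site-catalog on the PATH are assumed, as the script itself notes):

$ cat > probe.sh <<'EOF'
#! /bin/bash
# minimal probe: report the worker-node host and OS
hostname -f
uname -a
EOF
$ ./foreachsite -resource fork probe.sh
# per-site results land in run.*/<sitename>/submit.stdout once the Condor-G jobs complete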

Added: trunk/bin/grid/gen_gridsites
===================================================================
--- trunk/bin/grid/gen_gridsites	                        (rev 0)
+++ trunk/bin/grid/gen_gridsites	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,31 @@
+#! /bin/bash
+
+bin=$(cd $(dirname $0); pwd)
+
+work=$(mktemp -d gen_gridsites.run.XXX)
+if [ $? != 0 ]; then
+  echo $0: failed to create work directory
+  exit 1
+fi
+echo $0: working directory is $work
+cd $work
+
+ruby -I $bin $bin/mk_osg_sitetest.rb 
+
+cat >cf <<END
+lazy.errors=true
+execution.retries=0
+status.mode=files
+use.provider.staging=false
+wrapperlog.always.transfer=false
+END
+
+swift -config cf -tc.file tc.data -sites.file sites.xml test_osg.swift >& swift.out &
+
+echo $0: Started test_osg.swift script - process id is $!
+
+# To Do: 
+#  harvest new sites with an updated goodsites file every few minutes
+#  plot (print) a histogram of site acquisition over time
+#  FIXME: move mk_test.rb to libexec/grid
+#  check for or create valid proxy


Property changes on: trunk/bin/grid/gen_gridsites
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/get_greensites
===================================================================
--- trunk/bin/grid/get_greensites	                        (rev 0)
+++ trunk/bin/grid/get_greensites	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,8 @@
+#! /bin/bash
+
+latestrun=$(ls -1td gen_gridsites.run.* | head -1)
+
+echo -e Green sites from $latestrun '\n' 1>&2 
+
+cd $latestrun
+cat cat*.out


Property changes on: trunk/bin/grid/get_greensites
___________________________________________________________________
Added: svn:executable
   + *
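
A sketch of how gen_gridsites and get_greensites fit together (assumes ruby, swift, and a valid proxy are available, per the notes in gen_gridsites):

$ ./gen_gridsites                  # creates gen_gridsites.run.XXX and starts test_osg.swift
# ...wait for some of the cat<N> test jobs to return...
$ ./get_greensites > greensites    # collect cat*.out from the latest run into a list of working sites
# run_workers (added later in this commit) expects this greensites file in the directory it is started from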

Added: trunk/bin/grid/log4j.properties.debug
===================================================================
--- trunk/bin/grid/log4j.properties.debug	                        (rev 0)
+++ trunk/bin/grid/log4j.properties.debug	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,69 @@
+# Set root category priority to WARN and its appenders to CONSOLE and FILE.
+log4j.rootCategory=INFO, CONSOLE, FILE
+#log4j.rootCategory=DEBUG, CONSOLE, FILE
+
+log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
+log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
+log4j.appender.CONSOLE.Threshold=INFO
+log4j.appender.CONSOLE.layout.ConversionPattern=%m%n
+
+log4j.appender.FILE=org.apache.log4j.FileAppender
+log4j.appender.FILE.File=swift.log
+log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
+log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n
+
+
+log4j.logger.org.apache.axis.utils=ERROR
+
+# Swift
+
+log4j.logger.swift=DEBUG
+log4j.logger.swift.textfiles=DEBUG
+log4j.logger.org.globus.swift.trace=INFO
+log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG
+log4j.logger.org.griphyn.vdl.karajan.functions.ProcessBulkErrors=WARN
+log4j.logger.org.griphyn.vdl.engine.Karajan=INFO
+log4j.logger.org.griphyn.vdl.karajan.lib=INFO
+log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG
+log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG
+log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG
+
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=INFO
+#ADDED:
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Node=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Settings=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockTask=INFO
+
+log4j.logger.org.globus.cog.abstraction.impl.execution.coaster.SubmitJobCommand.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.impl.execution.coaster.ServiceConfigurationCommand.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.ServiceConfigurationHandler.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BQPStatusCommand.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockTask.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.OverallocatedJobDurationMetric.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.JobCountMetric.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.RemoteBQPMonitor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Settings.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.SwingBQPMonitor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BQPStatusHandler.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Node.java=DEBUG
+
+# Special functionality: suppresses auto-deletion of PBS submit file
+log4j.logger.org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor=DEBUG
+log4j.logger.org.globus.cog.abstraction.impl.scheduler.pbs.PBSExecutor=DEBUG
+
+# CoG Karajan
+log4j.logger.org.globus.cog.karajan.workflow.events.WorkerSweeper=WARN
+log4j.logger.org.globus.cog.karajan.workflow.nodes.FlowNode=WARN
+
+# CoG Scheduling
+log4j.logger.org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler=INFO
+
+# CoG Providers
+log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.rlog=INFO

Added: trunk/bin/grid/maketcfrominst
===================================================================
--- trunk/bin/grid/maketcfrominst	                        (rev 0)
+++ trunk/bin/grid/maketcfrominst	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,25 @@
+#! /bin/sh
+
+rundir=$1
+cd $rundir
+
+# Usage: maketcfrominst run.NN
+#
+# Run this from the directory in which foreachsite was run to install the app, i.e. the directory that contains the run.NN directory
+
+# echo $(more */*out | egrep '^Runn|^\*' | grep matches | wc -l) successful application installs
+
+# more $rundir/*/*out | egrep 'submit.stdout|^instal|^data|^wn|^py'
+
+# more */*out | egrep 'submit.stdout|^instal'
+
+for site in $(find * -type d); do
+  # echo site=$site
+  if grep -q matches $site/*.stdout; then
+    # echo "    OK"
+    idir=$(grep '^installing in' $site/*.stdout | awk '{print $3}')
+    echo $site modftdock $idir null null null
+  else
+    : # echo "    failed"
+  fi
+done


Property changes on: trunk/bin/grid/maketcfrominst
___________________________________________________________________
Added: svn:executable
   + *
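
For reference, each site directory under run.NN/ whose submit.stdout shows a successful install contributes one tc.data-style line; the site name and install path below are hypothetical:

$ ./maketcfrominst run.NN
SPRACE_osg-ce.sprace.org.br modftdock /osg/app/engage/extenci/modftdock null null null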

Added: trunk/bin/grid/mk_catalog.rb
===================================================================
--- trunk/bin/grid/mk_catalog.rb	                        (rev 0)
+++ trunk/bin/grid/mk_catalog.rb	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,245 @@
+#!/usr/bin/env ruby
+
+require 'erb'
+require 'ostruct'
+
+# starting ports for the templates
+coaster_service = 62100
+worker_service  = 61100
+
+hostname="communicado.ci.uchicago.edu"; 
+hostip="128.135.125.17";
+
+swift_workflow = %q[
+<% ctr = 0
+   sites.each_key do |name|
+     jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+app (external o) worker<%= ctr %>() {
+  worker<%= ctr %> "http://<%= hostip %>:<%= worker_service + ctr %>" "<%= name %>" "/tmp" "14400";
+}
+
+external rups<%= ctr %>[];
+int arr<%= ctr %>[];
+iterate i{
+  arr<%= ctr %>[i] = i;
+} until (i == <%= ((throttle * 100 + 2) * 2.5).to_i %>);
+
+foreach a,i in arr<%= ctr %> {
+  rups<%= ctr %>[i] = worker<%= ctr %>();
+}
+
+<%   ctr += 1
+   end %>
+]
+
+slave_workflow = %q[
+int t = 300;
+
+<% ctr = 0
+   sites.each_key do |name|
+     jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+app (external o) sleep<%= ctr %>(int time) {
+  sleep<%= ctr %> time;
+}
+
+external o<%=ctr%>;
+o<%=ctr%> = sleep<%=ctr%>(t);
+
+<%   ctr += 1
+   end %>
+
+]
+
+swift_tc = %q[
+<% ctr = 0
+   sites.each_key do |name|
+     jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+<%=name%>  worker<%= ctr %> <%=app_dir%>/worker.pl      INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="04:00:00"
+<%=name%>  sleep<%= ctr %>  /bin/sleep                  INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:05:00"
+<%=name%>  sleep            /bin/sleep                  INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:05:00"
+<%   ctr += 1
+   end %>
+]
+
+condor_sites = %q[
+<config>
+<% sites.each_key do |name| %>
+<%   jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+
+  <pool handle="<%=name%>">
+    <execution provider="condor" url="none"/>
+    <profile namespace="globus" key="jobType">grid</profile>
+    <profile namespace="globus" key="gridResource">gt2 <%=url%>/jobmanager-<%=jm%></profile>
+    <profile namespace="karajan" key="initialScore">200.0</profile>
+    <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+    <% if name =~ /FNAL_FERMIGRID/ %>
+      <profile namespace="globus" key="condor_requirements">GlueHostOperatingSystemRelease =?= "5.3" && GlueSubClusterName =!= GlueClusterName</profile>
+    <% end %>
+    <gridftp  url="gsiftp://<%=url%>"/>
+    <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+  </pool>
+<% end %>
+</config>
+]
+
+# GT2 for installing the workers
+gt2_sites = %q[
+<config>
+<% sites.each_key do |name| %>
+<%   jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+
+  <pool handle="<%=name%>">
+    <jobmanager universe="vanilla" url="<%=url%>/jobmanager-fork" major="2" />
+    <gridftp  url="gsiftp://<%=url%>"/>
+    <workdirectory><%= data_dir %>/swift_scratch</workdirectory>
+    <appdirectory><%= app_dir %></appdirectory>
+  </pool>
+<% end %>
+</config>
+]
+
+coaster_sites = %q[
+<config>
+<% ctr = 0
+   sites.each_key do |name|
+     jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+
+  <pool handle="<%=name%>">
+    <execution provider="coaster-persistent" url="https://<%= hostname %>:<%= coaster_service + ctr %>"
+        jobmanager="local:local" />
+
+    <profile namespace="globus" key="workerManager">passive</profile>
+
+    <profile namespace="karajan" key="initialScore">200.0</profile>
+    <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+
+    <gridftp  url="gsiftp://<%=url%>"/>
+    <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+  </pool>
+<%   ctr += 1
+   end %>
+</config>
+]
+
+def ress_query(class_ads)
+  cmd = "condor_status -pool engage-submit.renci.org"
+  class_ads[0..-2].each do |class_ad|
+    cmd << " -format \"%s|\" #{class_ad}"
+  end
+  cmd << " -format \"%s\\n\" #{class_ads[-1]}"
+  `#{cmd}`
+end
+
+def ress_parse(app_name)
+  dir_suffix = "/engage/#{app_name}"
+  class_ads  = [
+    "GlueSiteUniqueID", "GlueCEInfoHostName", "GlueCEInfoJobManager",
+    "GlueCEInfoGatekeeperPort", "GlueCEInfoApplicationDir", "GlueCEInfoDataDir",
+    "GlueCEInfoTotalCPUs"
+  ]
+  ress_query(class_ads).each_line do |line|
+    line.chomp!
+#puts "ress_query: line is:"
+#puts "$"<<line<<"$"
+#puts "---"
+    set = line.split("|")
+    next if not set.size > 0
+
+    value = OpenStruct.new
+
+    value.jm       = set[class_ads.index("GlueCEInfoJobManager")]
+    value.url      = set[class_ads.index("GlueCEInfoHostName")]
+    value.throttle = (set[class_ads.index("GlueCEInfoTotalCPUs")].to_f - 2.0) / 100.0
+    name           = set[class_ads.index("GlueSiteUniqueID")] + "__" +  value.url
+    value.name     = set[class_ads.index("GlueSiteUniqueID")]
+
+    value.app_dir = set[class_ads.index("GlueCEInfoApplicationDir")]
+    value.app_dir.sub!(/\/$/, "")
+    value.data_dir = set[class_ads.index("GlueCEInfoDataDir")]
+    value.data_dir.sub!(/\/$/, "")
+
+    value.app_dir += dir_suffix
+    value.data_dir += dir_suffix
+
+    # Hard-wired exceptions
+    value.app_dir  = "/osg/app"                     if name =~ /GridUNESP_CENTRAL/
+    value.data_dir = "/osg/data"                    if name =~ /GridUNESP_CENTRAL/
+    value.app_dir.sub!(dir_suffix, "/engage-#{app_name}")  if name =~ /BNL-ATLAS/
+    value.data_dir.sub!(dir_suffix, "/engage-#{app_name}") if name =~ /BNL-ATLAS/
+
+    yield name, value
+  end
+end
+
+if __FILE__ == $0 then
+  raise "No whitelist file" if !ARGV[0]
+
+  # Blacklist of non-working sites
+  blacklist = []
+  ARGV[1]   = "scec" if !ARGV[1]
+  whitelist = IO.readlines(ARGV[0]).map { |line| line.chomp! }
+
+  # Removes duplicate site entries (i.e. multiple GRAM endpoints)
+  sites = {}
+  ress_parse(ARGV[1]) do |name, value|
+    next if blacklist.index(name) and not blacklist.empty?
+    next if not whitelist.index(name) and not whitelist.empty?
+    sites[name] = value if sites[name] == nil
+  end
+
+  condor_out = File.open("condor_osg.xml", "w")
+  gt2_out = File.open("gt2_osg.xml", "w")
+  coaster_out = File.open("coaster_osg.xml", "w")
+
+  tc_out     = File.open("tc.data", "w")
+  workflow_out = File.open("worker.swift", "w")
+  slave_out = File.open("slave.swift", "w")
+
+  condor = ERB.new(condor_sites, 0, "%<>")
+  gt2 = ERB.new(gt2_sites, 0, "%<>")
+  coaster = ERB.new(coaster_sites, 0, "%<>")
+
+  tc     = ERB.new(swift_tc, 0, "%<>")
+  workflow = ERB.new(swift_workflow, 0, "%<>")
+  slave = ERB.new(slave_workflow, 0, "%<>")
+
+  condor_out.puts condor.result(binding)
+  gt2_out.puts gt2.result(binding)
+  coaster_out.puts coaster.result(binding)
+
+  tc_out.puts tc.result(binding)
+  workflow_out.puts workflow.result(binding)
+  slave_out.puts slave.result(binding)
+
+  condor_out.close
+  gt2_out.close
+  coaster_out.close
+
+  tc_out.close
+  workflow_out.close
+  slave_out.close
+end


Property changes on: trunk/bin/grid/mk_catalog.rb
___________________________________________________________________
Added: svn:executable
   + *
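
A worked example of the sizing arithmetic used above (numbers are illustrative): a site advertising GlueCEInfoTotalCPUs=402 gets throttle = (402 - 2)/100 = 4.0, which corresponds to the site's core count via the (throttle * 100 + 2) expression in the worker template, and worker.swift then iterates until i == ((4.0 * 100 + 2) * 2.5).to_i = 1005, i.e. about 2.5 pilot jobs per core as described in the README. The same arithmetic in shell, for checking:

$ cpus=402
$ throttle=$(echo "scale=2; ($cpus - 2) / 100" | bc)    # 4.00
$ echo "(($throttle * 100 + 2) * 2.5) / 1" | bc         # 1005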

Added: trunk/bin/grid/mk_cats.rb
===================================================================
--- trunk/bin/grid/mk_cats.rb	                        (rev 0)
+++ trunk/bin/grid/mk_cats.rb	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,149 @@
+#!/usr/bin/env ruby
+
+require 'erb'
+require 'ostruct'
+
+# ports lists for the templates
+
+#coaster_service[] = `cat service-*.sport;`
+#worker_service[]  = `cat service-*.wport`;
+
+coaster_service = 12345;
+worker_service = 67890;
+
+hostname="communicado.ci.uchicago.edu"; 
+hostip="128.135.125.17";
+
+coaster_sites = %q[
+<config>
+<% ctr = 0
+   sites.each_key do |name|
+     jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+
+  <pool handle="<%=name%>">
+    <execution provider="coaster-persistent" url="https://<%= hostname %>:<%= coaster_service + ctr %>"
+        jobmanager="local:local" />
+
+    <profile namespace="globus" key="workerManager">passive</profile>
+
+    <profile namespace="karajan" key="initialScore">200.0</profile>
+    <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+
+    <gridftp  url="gsiftp://<%=url%>"/>
+    <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+  </pool>
+<%   ctr += 1
+   end %>
+</config>
+]
+
+def OLDress_query(class_ads)
+  cmd = "condor_status -pool engage-submit.renci.org"
+  class_ads[0..-2].each do |class_ad|
+    cmd << " -format \"%s|\" #{class_ad}"
+  end
+  cmd << " -format \"%s\\n\" #{class_ads[-1]}"
+  `#{cmd}`
+end
+
+def ress_query(class_ads)
+  cmd = "./ressfields.sh"
+  class_ads[0..-1].each do |class_ad|
+    cmd << " #{class_ad}"
+  end
+  `#{cmd}`
+end
+
+def ress_parse(app_name)
+  dir_suffix = "/engage/#{app_name}"
+  class_ads  = [
+    "GlueSiteUniqueID", "GlueCEInfoHostName", "GlueCEInfoJobManager",
+    "GlueCEInfoGatekeeperPort", "GlueCEInfoApplicationDir", "GlueCEInfoDataDir",
+    "GlueCEInfoTotalCPUs"
+  ]
+  ress_query(class_ads).each_line do |line|
+    line.chomp!
+puts "ress_query: line is:"
+puts "$"<<line<<"$"
+puts "---"
+    set = line.split("|")
+    next if not set.size > 0
+
+    value = OpenStruct.new
+
+    value.jm       = set[class_ads.index("GlueCEInfoJobManager")]
+    value.url      = set[class_ads.index("GlueCEInfoHostName")]
+    value.throttle = (set[class_ads.index("GlueCEInfoTotalCPUs")].to_f - 2.0) / 100.0
+    name           = set[class_ads.index("GlueSiteUniqueID")] + "__" +  value.url
+    value.name     = set[class_ads.index("GlueSiteUniqueID")]
+
+    value.app_dir = set[class_ads.index("GlueCEInfoApplicationDir")]
+    value.app_dir.sub!(/\/$/, "")
+    value.data_dir = set[class_ads.index("GlueCEInfoDataDir")]
+    value.data_dir.sub!(/\/$/, "")
+
+    value.app_dir += dir_suffix
+    value.data_dir += dir_suffix
+
+    # Hard-wired exceptions
+    value.app_dir  = "/osg/app"                     if name =~ /GridUNESP_CENTRAL/
+    value.data_dir = "/osg/data"                    if name =~ /GridUNESP_CENTRAL/
+    value.app_dir.sub!(dir_suffix, "/engage-#{app_name}")  if name =~ /BNL-ATLAS/
+    value.data_dir.sub!(dir_suffix, "/engage-#{app_name}") if name =~ /BNL-ATLAS/
+
+    yield name, value
+  end
+end
+
+if __FILE__ == $0 then
+  raise "No whitelist file" if !ARGV[0]
+
+  # Blacklist of non-working sites
+  blacklist = []
+  ARGV[1]   = "scec" if !ARGV[1]
+  whitelist = IO.readlines(ARGV[0]).map { |line| line.chomp! }
+
+  # Removes duplicate site entries (i.e. multiple GRAM endpoints)
+  sites = {}
+  ress_parse(ARGV[1]) do |name, value|
+    next if blacklist.index(name) and not blacklist.empty?
+    next if not whitelist.index(name) and not whitelist.empty?
+    sites[name] = value if sites[name] == nil
+  end
+
+  # condor_out = File.open("condor_osg.xml", "w")
+  # gt2_out = File.open("gt2_osg.xml", "w")
+  coaster_out = File.open("coaster_osg.xml", "w")
+
+  # tc_out     = File.open("tc.data", "w")
+  # workflow_out = File.open("worker.swift", "w")
+  # slave_out = File.open("slave.swift", "w")
+
+  # condor = ERB.new(condor_sites, 0, "%<>")
+  # gt2 = ERB.new(gt2_sites, 0, "%<>")
+  coaster = ERB.new(coaster_sites, 0, "%<>")
+
+  # tc     = ERB.new(swift_tc, 0, "%<>")
+  # workflow = ERB.new(swift_workflow, 0, "%<>")
+  # slave = ERB.new(slave_workflow, 0, "%<>")
+
+  # condor_out.puts condor.result(binding)
+  # gt2_out.puts gt2.result(binding)
+  coaster_out.puts coaster.result(binding)
+
+  # tc_out.puts tc.result(binding)
+  # workflow_out.puts workflow.result(binding)
+  # slave_out.puts slave.result(binding)
+
+  # condor_out.close
+  # gt2_out.close
+  coaster_out.close
+
+  # tc_out.close
+  # workflow_out.close
+  # slave_out.close
+end


Property changes on: trunk/bin/grid/mk_cats.rb
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/mk_osg_sitetest.rb
===================================================================
--- trunk/bin/grid/mk_osg_sitetest.rb	                        (rev 0)
+++ trunk/bin/grid/mk_osg_sitetest.rb	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,116 @@
+#!/usr/bin/env ruby
+
+# File: mk_osg_sitetest.rb
+# Date: 2010-10-06
+# Author: Allan Espinosa
+# Email: aespinosa at cs.uchicago.edu
+# Description: A Swift workflow generator to test OSG sites through the Engage
+#              VO. Generates the accompanying tc.data and sites.xml as well.
+#              Run with "swift -sites.file sites.xml -tc.file tc.data
+#              test_osg.swift"
+
+require 'erb'
+require 'ostruct'
+require 'ress'
+
+swift_tc = %q[
+localhost            echo   /bin/echo                                       INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:05:00"
+<% ctr = 0
+   sites.each_key do |name| %>
+<%   jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+<%=name%>  cat<%=ctr%>    /bin/cat      INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:01:30"
+<%   ctr += 1
+   end %>
+]
+
+swift_workflow = %q[
+type file;
+
+app (file t) echo(string i) {
+  echo i stdout=@filename(t);
+}
+
+<% ctr = 0
+   sites.each_key do |name| %>
+app (file t) cat<%= ctr %>(file input ) { 
+  cat<%= ctr %> @filename(input) stdout=@filename(t);
+}
+<%   ctr += 1
+   end %>
+
+<% ctr = 0
+   sites.each_key do |name| %>
+file input<%= ctr %><"cat<%= ctr %>.in">;
+input<%= ctr %> = echo("<%= name %>");
+file out<%= ctr %><"cat<%= ctr %>.out">;
+out<%= ctr %> = cat<%= ctr %>(input<%= ctr %>);
+<%   ctr += 1
+   end %>
+
+]
+
+condor_sites = %q[
+<config>
+  <pool handle="localhost">
+    <filesystem provider="local" />
+    <execution provider="local" />
+    <workdirectory >/var/tmp</workdirectory>
+    <profile namespace="karajan" key="jobThrottle">0</profile>
+  </pool>
+<% sites.each_key do |name| %>
+<%   jm       = sites[name].jm
+     url      = sites[name].url
+     app_dir  = sites[name].app_dir
+     data_dir = sites[name].data_dir
+     throttle = sites[name].throttle %>
+
+  <pool handle="<%=name%>">
+    <execution provider="condor" url="none"/>
+
+    <profile namespace="globus" key="jobType">grid</profile>
+    <profile namespace="globus" key="gridResource">gt2 <%=url%>/jobmanager-<%=jm%></profile>
+
+    <profile namespace="karajan" key="initialScore">20.0</profile>
+    <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+
+    <gridftp  url="gsiftp://<%=url%>"/>
+    <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+  </pool>
+<% end %>
+</config>
+]
+
+# Redlist of non-working sites
+redlist = [ ]
+
+puts("mk_test starting")
+
+# Removes duplicate site entries (i.e. multiple GRAM endpoints)
+sites = {}
+ress_parse do |name, value|
+  next if redlist.index(name)
+  sites[name] = value if sites[name] == nil
+print("site: ")
+puts(name)
+#puts(name,value)
+end
+
+condor_out = File.open("sites.xml", "w")
+tc_out     = File.open("tc.data", "w")
+swift_out = File.open("test_osg.swift", "w")
+
+condor = ERB.new(condor_sites, 0, "%<>")
+tc     = ERB.new(swift_tc, 3, "%<>")
+swift     = ERB.new(swift_workflow, 0, "%<>")
+
+condor_out.puts condor.result(binding)
+tc_out.puts tc.result(binding)
+swift_out.puts swift.result(binding)
+
+condor_out.close
+tc_out.close
+swift_out.close


Property changes on: trunk/bin/grid/mk_osg_sitetest.rb
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/osgcat
===================================================================
--- trunk/bin/grid/osgcat	                        (rev 0)
+++ trunk/bin/grid/osgcat	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,170 @@
+#!/usr/bin/perl
+
+use strict;
+
+use Pod::Usage;
+use Getopt::Long;
+use File::Temp qw/ tempfile tempdir mktemp /;
+
+my $opt_help = 0;
+my $opt_vo = 'engage';
+my $opt_engage_verified = 0;
+my $opt_gt4 = 0;
+my $opt_condorg = 0;
+my $opt_out = '&STDOUT';
+
+Getopt::Long::Configure('bundling');
+GetOptions(
+    "help"                   => \$opt_help,
+    "vo=s"                   => \$opt_vo,
+    "engage-verified"        => \$opt_engage_verified,
+    "gt4"                    => \$opt_gt4,
+    "condor-g"               => \$opt_condorg,
+    "out=s"                  => \$opt_out,
+) or pod2usage(1);
+
+if ($opt_help) {
+    pod2usage(1);
+}
+
+if ($opt_engage_verified && $opt_vo ne "engage") {
+    die("You can not specify a vo when using --engage-verified\n");
+}
+
+# make sure condor_status is in the path
+my $out = `which condor_status 2>/dev/null`;
+if ($out eq "") {
+    die("This tool depends on condor_status.\n" .
+        "Please make sure condor_status is in your path.\n");
+}
+
+my %ads;
+my %tmp;
+my $cmd = "condor_status -any -long -constraint" .
+          " 'StringlistIMember(\"VO:$opt_vo\";GlueCEAccessControlBaseRule)'" .
+          " -pool osg-ress-1.fnal.gov";
+# if we want the engage verified sites, ignore opt_vo and query against 
+# engage central collector
+if ($opt_engage_verified) {
+    $cmd = "condor_status -any -long -constraint" .
+           " 'SiteVerified==TRUE'" .
+           " -pool engage-central.renci.org"
+}
+open(STATUS, "$cmd|");
+while(<STATUS>) {
+    chomp;
+    if ($_ eq "") {
+        if ($tmp{'GlueSiteName'} ne "") {
+            my %copy = %tmp;
+            $ads{$tmp{'GlueSiteName'} . "_" . $tmp{'GlueClusterUniqueID'}} = \%copy;
+            undef %tmp;
+        }
+    }
+    else {
+        my ($key, $value) = split(/ = /, $_, 2);
+        $value =~ s/^"|"$//g; # remove quotes from Condor strings
+        $tmp{$key} = $value;
+    }
+}
+close(STATUS);
+
+# lowercase vo
+my $lc_vo = lc($opt_vo);
+
+open(FH, ">$opt_out") or die("Unable to open $opt_out");
+print FH "<config>\n";
+foreach my $siteid (sort keys %ads) {
+    my $contact = $ads{$siteid}->{'GlueCEInfoContactString'};
+    my $host = $contact;
+    $host =~ s/[:\/].*//;
+    my $jm = $contact;
+    $jm =~ s/.*jobmanager-//;
+    if ($jm eq "pbs") {
+        $jm = "PBS";
+    }
+    elsif ($jm eq "lsf") {
+        $jm = "LSF";
+    }
+    elsif ($jm eq "sge") {
+        $jm = "SGE";
+    }
+    elsif ($jm eq "condor") {
+        $jm = "Condor";
+    }
+    my $workdir = $ads{$siteid}->{'GlueCEInfoDataDir'};
+    print FH "\n";
+    print FH "  <!-- $siteid -->\n";
+    print FH "  <pool handle=\"$siteid\" >\n";
+    print FH "    <gridftp  url=\"gsiftp://$host/\" />\n";
+    if ($opt_condorg) {
+        print FH "    <execution provider=\"condor\" />\n";
+        print FH "    <profile namespace=\"globus\" key=\"jobType\">grid</profile>\n";
+        if($opt_gt4) {
+            die("swift-osg-ress-site-catalog cannot generate Condor-G + GRAM4 sites files");
+        }
+        print FH "    <profile namespace=\"globus\" key=\"gridResource\">gt2 $contact</profile>\n";
+    }
+    elsif ($opt_gt4) {
+        print FH "    <execution provider=\"gt4\" jobmanager=\"$jm\" url=\"$host:9443\" />\n";
+    }
+    else {
+        print FH "    <jobmanager universe=\"vanilla\" url=\"$contact\" major=\"2\" />\n";
+    }
+    print FH "    <workdirectory >$workdir/$lc_vo/tmp/$host</workdirectory>\n";
+    print FH "  </pool>\n";
+}
+print FH "\n</config>\n";
+close(FH);
+
+exit(0);
+
+__END__
+
+=head1 NAME
+
+swift-osg-ress-site-catalog - converts ReSS data to Swift site catalog
+
+=head1 SYNOPSIS
+
+swift-osg-ress-site-catalog [options]
+
+=head1 OPTIONS
+
+=over 8
+
+=item B<--help>
+
+Show this help message
+
+=item B<--vo=[name]>
+
+Set what VO to query ReSS for
+
+=item B<--engage-verified>
+
+Only retrieve sites verified by the Engagement VO site verification tests.
+This cannot be used together with --vo, as the query will only work for
+sites advertising support for the Engagement VO.
+
+This option means information will be retrieved from the Engagement collector
+instead of the top-level ReSS collector.
+
+=item B<--out=[filename]>
+
+Write to [filename] instead of stdout
+
+=item B<--condor-g>
+
+Generates sites files which will submit jobs using a local Condor-G
+installation rather than through direct GRAM2 submission.
+
+=back
+
+=head1 DESCRIPTION
+
+B<swift-osg-ress-site-catalog> converts ReSS data to Swift site catalog
+
+=cut
+
+
+


Property changes on: trunk/bin/grid/osgcat
___________________________________________________________________
Added: svn:executable
   + *
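
Usage sketch for osgcat (options as documented in the POD above; output file names are arbitrary):

$ ./osgcat --vo=engage --condor-g --out=osg-condorg.xml
$ ./osgcat --engage-verified --condor-g --out=osg-verified.xml   # queries the Engage collector instead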

Added: trunk/bin/grid/ress.rb
===================================================================
--- trunk/bin/grid/ress.rb	                        (rev 0)
+++ trunk/bin/grid/ress.rb	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,55 @@
+require 'ostruct'
+
+def ress_query(class_ads)
+  cmd = "condor_status -pool engage-submit.renci.org"
+  class_ads[0..-2].each do |class_ad|
+    cmd << " -format \"%s|\" #{class_ad}"
+  end
+  cmd << " -format \"%s\\n\" #{class_ads[-1]}"
+  `#{cmd}`
+end
+
+def ress_parse
+  dir_suffix = "/engage/swift"
+  class_ads  = [
+    "GlueSiteUniqueID", "GlueCEInfoHostName", "GlueCEInfoJobManager",
+    "GlueCEInfoGatekeeperPort", "GlueCEInfoApplicationDir", "GlueCEInfoDataDir",
+    "GlueCEInfoTotalCPUs"
+  ]
+  ress_query(class_ads).each_line do |line|
+    line.chomp!
+    set = line.split("|")
+    next if not set.size > 0
+
+    value = OpenStruct.new
+
+    value.jm       = set[class_ads.index("GlueCEInfoJobManager")]
+    value.url      = set[class_ads.index("GlueCEInfoHostName")]
+    value.throttle = (set[class_ads.index("GlueCEInfoTotalCPUs")].to_f - 2.0) / 100.0
+    name           = set[class_ads.index("GlueSiteUniqueID")] + "__" +  value.url
+    value.name     = set[class_ads.index("GlueSiteUniqueID")]
+
+    value.app_dir = set[class_ads.index("GlueCEInfoApplicationDir")]
+    value.app_dir.sub!(/\/$/, "")
+    value.data_dir = set[class_ads.index("GlueCEInfoDataDir")]
+    value.data_dir.sub!(/\/$/, "")
+
+    value.app_dir = "/osg/app" if name =~ /GridUNESP_CENTRAL/
+    value.data_dir = "/osg/data" if name =~ /GridUNESP_CENTRAL/
+
+    if name =~ /BNL-ATLAS/
+      value.app_dir += "/engage-scec"
+      value.data_dir += "/engage-scec"
+    #elsif name == "LIGO_UWM_NEMO" or name == "SMU_PHY" or name == "UFlorida-HPC" or name == "RENCI-Engagement" or name == "RENCI-Blueridge"
+      #value.app_dir += "/osg/scec"
+      #value.data_dir += "/osg/scec"
+    else
+      value.app_dir += dir_suffix
+      value.data_dir += dir_suffix
+    end
+
+    yield name, value
+  end
+end
+
+


Property changes on: trunk/bin/grid/ress.rb
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/ressfields
===================================================================
--- trunk/bin/grid/ressfields	                        (rev 0)
+++ trunk/bin/grid/ressfields	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,8 @@
+#! /bin/sh
+for f in $*; do
+  flist=$flist" -format %s| "$f
+done
+
+echo flist: $flist >>ressfields.log
+
+condor_status -pool engage-submit.renci.org $flist -format "\\n" ""
\ No newline at end of file


Property changes on: trunk/bin/grid/ressfields
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/run_workers
===================================================================
--- trunk/bin/grid/run_workers	                        (rev 0)
+++ trunk/bin/grid/run_workers	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,16 @@
+#! /bin/bash
+
+bin=$(cd $(dirname $0); pwd)
+
+work=$(mktemp -d run_workers.run.XXX)
+if [ $? != 0 ]; then
+  echo $0: failed to create work directory
+  exit 1
+fi
+echo $0: working directory is $work
+cd $work
+
+ruby -I $bin $bin/nqueued.rb ../greensites mwildeT1
+
+# To Do: 
+# manage a total running worker pool based on demand from swift + slack


Property changes on: trunk/bin/grid/run_workers
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/sites
===================================================================
--- trunk/bin/grid/sites	                        (rev 0)
+++ trunk/bin/grid/sites	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,8 @@
+
+swift-osg-ress-site-catalog --vo=engage  --condor-g | # >osg.xml
+
+for s in $(grep gridRes | sed -e 's/^.* //' -e 's/<.*//' ); do
+  ( sname=$(echo $s | sed -e 's,/.*,,')
+    echo site: $sname
+  )
+done


Property changes on: trunk/bin/grid/sites
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/start-ranger-service
===================================================================
--- trunk/bin/grid/start-ranger-service	                        (rev 0)
+++ trunk/bin/grid/start-ranger-service	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,70 @@
+#! /bin/bash
+
+# FIXME: make these commandline keyword arguments, eg --nodes=
+
+NODES=${1:-1}
+WALLTIME=${2:-00:10:00}
+PROJECT=${3:-TG-DBS080004N}
+QUEUE=${4:-development}
+REMOTE_USER=${5:-tg455797}
+
+STARTSERVICE=true
+HOST=tg-login.ranger.tacc.teragrid.org
+BIN=$(cd $(dirname $0); pwd)
+
+echo NODES=$NODES WALLTIME=$WALLTIME PROJECT=$PROJECT REMOTE_USER=$REMOTE_USER
+LOGLEVEL=INFO # INFO, DEBUG, TRACE for increasing detail
+
+CORESPERNODE=16
+
+THROTTLE=$(echo "scale=2; ($NODES*$CORESPERNODE)/100 -.01"|bc)
+
+echo THROTTLE=$THROTTLE
+
+# This lets the user run this script to add another job full of workers to an existing coaster service
+# Must be started in the same directory where start-swift-service created the service.wports file.
+
+if [ $STARTSERVICE = true ]; then
+  start-swift-service 1 &
+  sleep 5
+  SPORT=$(cat service.sports)
+  cat >sites.pecos.xml <<EOF
+
+  <config>
+    <pool handle="localhost">
+      <execution provider="coaster-persistent" url="http://localhost:$SPORT" jobmanager="local:local"/>
+      <profile namespace="globus" key="workerManager">passive</profile>
+      <profile namespace="globus" key="jobsPerNode">$CORESPERNODE</profile>
+      <profile key="jobThrottle" namespace="karajan">$THROTTLE</profile>
+      <profile namespace="karajan" key="initialScore">10000</profile>
+      <!-- <filesystem provider="local" url="none" /> -->
+      <profile namespace="swift" key="stagingMethod">proxy</profile>
+      <workdirectory>/tmp/wilde</workdirectory>
+    </pool>
+  </config>
+EOF
+fi
+
+WPORT=$(cat service.wports)
+SERVICE_URL=http://$(hostname -f):$WPORT
+echo swift service started - SPORT=$(cat service.sports) WPORT=$WPORT SERVICE_URL=$SERVICE_URL
+
+# FIXME: scp the right worker.pl, worker.sh and .sub files to the dest system (Ranger)
+
+rdir=swift_gridtools
+ssh $REMOTE_USER@$HOST mkdir -p $rdir
+
+if [ $? != 0 ]; then
+  echo $0: unable to create remote directory $rdir
+  exit 1
+fi
+
+echo Created remote dir
+
+scp $BIN/{worker.pl,workers.ranger.sh,workers.ranger.sub} $REMOTE_USER@$HOST:$rdir
+
+echo Copied grid tools to remote dir
+
+ssh $REMOTE_USER@$HOST qsub -A $PROJECT -N runworkers -pe 16way $(($NODES * 16)) -l h_rt=$WALLTIME -q $QUEUE -v SERVICE_URL=$SERVICE_URL,WORKER_LOGLEVEL=$LOGLEVEL $rdir/workers.ranger.sub
+
+echo Submitted remote worker launching script


Property changes on: trunk/bin/grid/start-ranger-service
___________________________________________________________________
Added: svn:executable
   + *
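
A usage sketch (positional arguments as read at the top of the script; the project, queue, and user shown are just the script's own defaults):

$ ./start-ranger-service 4 01:00:00 TG-DBS080004N development tg455797
# starts a local coaster service, writes sites.pecos.xml, copies worker.pl and the
# workers.ranger.* helpers to ~/swift_gridtools on Ranger, and qsubs a 4-node
# (16way, 64-core) worker job pointed at the service URL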

Added: trunk/bin/grid/start-ranger-service~
===================================================================
--- trunk/bin/grid/start-ranger-service~	                        (rev 0)
+++ trunk/bin/grid/start-ranger-service~	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,72 @@
+#! /bin/bash
+
+# FIXME: make these commandline keyword arguments, eg --nodes=
+
+NODES=${1:-1}
+WALLTIME=${2:-00:10:00}
+PROJECT=${3:-TG-DBS080004N}
+QUEUE=${4:-development}
+REMOTE_USER=${5:-tg455797}
+
+STARTSERVICE=true
+HOST=tg-login.ranger.tacc.teragrid.org
+BIN=$(cd $(dirname $0); pwd)
+
+echo NODES=$NODES WALLTIME=$WALLTIME PROJECT=$PROJECT REMOTE_USER=$REMOTE_USER
+LOGLEVEL=INFO # INFO, DEBUG, TRACE for increasing detail
+
+CORESPERNODE=16
+
+THROTTLE=$(echo "scale=2; ($NODES*$CORESPERNODE)/100 -.01"|bc)
+
+echo THROTTLE=$THROTTLE
+
+exit
+
+# This lets user run this script to add another job full of workers to an existing coaster service
+# Must be started in the same directory where start-swift-service created the service.wports file.
+
+if [ $STARTSERVICE = true ]; then
+  start-swift-service 1 &
+  sleep 5
+  SPORT=$(cat service.sports)
+  cat >sites.pecos.xml <<EOF
+
+  <config>
+    <pool handle="localhost">
+      <execution provider="coaster-persistent" url="http://localhost:$SPORT" jobmanager="local:local"/>
+      <profile namespace="globus" key="workerManager">passive</profile>
+      <profile namespace="globus" key="jobsPerNode">$CORESPERNODE</profile>
+      <profile key="jobThrottle" namespace="karajan">$THROTTLE</profile>
+      <profile namespace="karajan" key="initialScore">10000</profile>
+      <!-- <filesystem provider="local" url="none" /> -->
+      <profile namespace="swift" key="stagingMethod">proxy</profile>
+      <workdirectory>/tmp/wilde</workdirectory>
+    </pool>
+  </config>
+EOF
+fi
+
+WPORT=$(cat service.wports)
+SERVICE_URL=http://$(hostname -f):$WPORT
+echo swift service started - SPORT=$(cat service.sports) WPORT=$WPORT SERVICE_URL=$SERVICE_URL
+
+# FIXME: scp the right worker.pl, worker.sh and .sub files to the dest system (Ranger)
+
+rdir=swift_gridtools
+ssh $REMOTE_USER@$HOST mkdir -p $rdir
+
+if [ $? != 0 ]; then
+  echo $0: unable to create remote directory $rdir
+  exit 1
+fi
+
+echo Created remote dir
+
+scp $BIN/{worker.pl,workers.ranger.sh,workers.ranger.sub} $REMOTE_USER@$HOST:$rdir
+
+echo Copied grid tools to remote dir
+
+ssh $REMOTE_USER@$HOST qsub -A $PROJECT -N runworkers -pe 16way $(($NODES * 16)) -l h_rt=$WALLTIME -q $QUEUE -v SERVICE_URL=$SERVICE_URL,WORKER_LOGLEVEL=$LOGLEVEL $rdir/workers.ranger.sub
+
+echo Submitted remote worker launching script


Property changes on: trunk/bin/grid/start-ranger-service~
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/start-swift-service
===================================================================
--- trunk/bin/grid/start-swift-service	                        (rev 0)
+++ trunk/bin/grid/start-swift-service	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+NSERVICES=$1
+SERVICE=coaster-service  # found via PATH
+
+ontrap()  # FIXME: Not needed?
+{
+  echo '====>' in ontrap
+  trap - 1 2 3 15
+  echo start_service: trapping exit or signal
+  kill $(cat service-*.pid)
+}
+
+# trap ontrap 1 2 3 15  # FIXME: Not needed?
+
+rm -f service.sports service.wports
+for i in `seq -w 0 $((NSERVICES - 1))`; do
+  rm -f service-$i.{sport,wport,pid,log}
+  $SERVICE -nosec -passive -portfile service-$i.sport -localportfile service-$i.wport &> service-$i.log  &
+  echo $! >service-$i.pid
+  sleep 3
+  if [ -s service-$i.sport ]; then
+    echo $(cat service-$i.sport) >> service.sports
+  else
+    echo service-$i.sport does not exist or is empty. exiting.
+    exit 1
+  fi
+  if [ -s service-$i.wport ]; then
+    echo $(cat service-$i.wport) >> service.wports
+  else
+    echo service-$i.wport does not exist or is empty. exiting.
+    exit 1
+  fi
+done
+
+wait


Property changes on: trunk/bin/grid/start-swift-service
___________________________________________________________________
Added: svn:executable
   + *
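
A usage sketch for start-swift-service (the port files are the ones the script writes; start-ranger-service above uses the same pattern):

$ start-swift-service 1 &
$ sleep 5
$ cat service.sports    # service port(s), for the coaster-persistent url in sites.xml
$ cat service.wports    # worker port(s), passed to worker.pl / workers.ranger.sub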

Added: trunk/bin/grid/swift-workers
===================================================================
--- trunk/bin/grid/swift-workers	                        (rev 0)
+++ trunk/bin/grid/swift-workers	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,222 @@
+#!/usr/bin/env ruby
+
+require 'mk_catalog'
+require 'etc'
+
+class Site
+  attr_accessor :grid_resource, :data_dir, :app_dir, :name, :port
+  attr_reader :submit_file
+
+#      executable = <%= @app_dir %>/worker.pl  # FIXME (below)
+
+#       transfer_executable = True
+
+#      executable = /home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/worker.pl
+#      arguments = http://128.135.125.17:<%= port %> <%= name %> /tmp 14400
+
+  def gen_submit(count = 1)
+    job = %q[
+      universe = grid
+      stream_output = False
+      stream_error = False
+      transfer_executable = false
+      periodic_remove = JobStatus == 5
+      notification = Never
+
+      globus_rsl = (maxwalltime=240)
+      grid_resource = <%= @grid_resource %>
+      executable = /bin/sleep
+      arguments = 300
+      log = condor.log
+
+      <% count.times { %>queue
+      <% } %>
+    ]
+
+    ERB.new(job.gsub(/^\s+/, ""), 0, "%<>", "@submit_file").result(binding)
+  end
+
+  def submit_job(count)
+#    puts "Submitting #{@name} #{count} jobs"
+    output = ""
+#return output
+    submitfile = gen_submit(count)
+    IO.popen("condor_submit", "w+") do |submit|
+      submit.puts submitfile
+      submit.close_write
+      output = submit.read
+    end
+    output
+  end
+
+  def queued
+    jobs = `condor_q  #{$username} -const 'GridResource == \"#{@grid_resource}\" && JobStatus == 1' -format \"%s \" GlobalJobId`
+    jobs.split(" ").size
+  end
+
+  def running
+    jobs = `condor_q #{$username} -const 'GridResource == \"#{@grid_resource}\" && JobStatus == 2' -format \"%s \" GlobalJobId`
+    jobs.split(" ").size
+  end
+
+end
+
+=begin
+# For reference:
+JobStatus in job ClassAds
+
+0	Unexpanded	U
+1	Idle	I
+2	Running	R
+3	Removed	X
+4	Completed	C
+5	Held	H
+6	Submission_err	E
+=end
+
+if __FILE__ == $0
+  raise "No whitelist file" if !ARGV[0]
+
+  start_port = 61100 # FIXME
+  ctr        = 0
+  threads    = []
+  ARGV[1]    = "scec" if !ARGV[1]
+  whitelist  = IO.readlines(ARGV[0]).map { |line| line.chomp! }
+  $username = Etc.getlogin
+
+  puts "Username = #{$username}"
+
+  minSiteJobs = 2
+  paddedDemand = 0
+  swiftDemand = 0
+  totalCores = 0
+  totalRunning = 0
+
+  ress_parse(ARGV[1]) do |name, value|
+    next if not whitelist.index(name) and not whitelist.empty?
+    totalCores += (value.throttle * 100 + 2).to_i
+  end
+  puts "totalCores for green sites = #{totalCores}"
+
+  demandThread = Thread.new("monitor-demand") do |t|
+    puts "starting demand thread"
+    while true do
+    puts "in demand thread"
+      # swiftDemand = IO.read("swiftDemand")  # Replace this with sensor of Swift demand
+      swiftDemand = 15
+      paddedDemand = (swiftDemand * 1.2).to_i
+      totalRunning = `condor_q #{$username} -const 'JobStatus == 2' -format \"%s \" GlobalJobId`.split(" ").size
+      puts "*** demandThread: swiftDemand=#{swiftDemand} paddedDemand=#{paddedDemand} totalRunning=#{totalRunning}"
+      sleep 60
+    end
+  end
+
+  ress_parse(ARGV[1]) do |name, value|
+    next if not whitelist.index(name) and not whitelist.empty?
+    site               = Site.new
+    site.name          = name
+    site.grid_resource = "gt2 #{value.url}/jobmanager-#{value.jm}"
+    site.app_dir       = value.app_dir
+    site.data_dir      = value.data_dir
+    site.port          = start_port + ctr
+
+    # local per-site attributes:
+
+    cores = (value.throttle * 100 + 2).to_i
+    siteFraction = cores.to_f / totalCores.to_f
+    siteTargetRunning = [ (swiftDemand.to_f * siteFraction), minSiteJobs ].max
+    siteTargetQueued = [ (swiftDemand.to_f * siteFraction), minSiteJobs ].max
+
+    printf "site: %5d cores %2d%% %s\n", cores, siteFraction * 100, name
+    targetQueued = 3
+
+    site.gen_submit
+
+    threads << Thread.new(name) do |job|
+      trip=0
+      while true do
+        if ( (swiftDemand) > totalRunning ) then
+          # demands > running: enforce N-queued algorithm
+          queued = site.queued
+          running = site.running
+          printf "trip %d site %s running %d queued %d\n", trip, name,running,queued
+          if (running+queued) == 0 then
+            newJobs = [ (paddedDemand * siteFraction).to_i, minSiteJobs ].max
+            printf "trip %d site %s empty - submitting %d (%d%% of demand %d)\n",
+              trip, name, newJobs, siteFraction * 100, paddedDemand
+            site.submit_job(newJobs)
+          elsif queued == 0 then
+            toRun = [ running * 1.2, [(paddedDemand * siteFraction).to_i, minSiteJobs ].max ].max
+            printf "trip %d site %s queued %d target %d has drained queue - submitting %d\n",
+              trip, name, queued, targetQueued, toRun
+            site.submit_job(toRun)
+          elsif queued < targetQueued
+            printf "trip %d site %s queued %d below target %d - submitting %d\n",
+              trip, name, queued, targetQueued, targetQueued-queued
+            site.submit_job(targetQueued - queued)
+          end
+          trip += 1
+          sleep 60
+          # puts "#{name}: #{total}"
+        end
+      end
+    end
+
+    ctr += 1
+  end
+end
+threads.each { |job| job.join }
+puts "All threads completed."
+
+# TODO:
+#
+# tag jobs for each run uniquely, and track them as a unique factory instance
+#
+
+=begin
+
+"Keep N Queued" Algorithm
+
+Goal:
+- monitor a running swift script to track its changing demand for cores
+- increase the # of running workers to meet the demand
+- let workers that are idle time out when supply is greater than demand
+
+Initially:
+- set a constant demand
+- determine #cores at each site
+
+initialPressure = 1.2  # increase demand 
+initialDemand = 50     # initial demand prior to first poll of Swift, to prime the worker pool ahead of Swift demand
+
+- set a constant target queued for each site based on ncores
+- set a target #running 
+
+THREAD 0:
+  demand = initialDemand
+  for each site
+    site.need = (site.cores/totalcores) * demand
+  sleep delay
+
+
+  while swiftScriptIsRunning
+    get demand
+    get #running
+
+
+  
+THREAD for each site
+  while swiftScriptIsRunning
+    get site.running
+    get set.queued  
+    need = demand - running
+    if need > 0
+      if running+queued = 0
+
+keep queued on each site:
+ max( expectation, 50% of observation )
+
+     totalc=1000
+     sitec = 200 20% d=100 ex=20 q=20
+     r=50 q=25
+=end
\ No newline at end of file


Property changes on: trunk/bin/grid/swift-workers
___________________________________________________________________
Added: svn:executable
   + *
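
A worked example of the per-site targets computed by swift-workers (numbers are illustrative): with the hard-coded swiftDemand of 15, paddedDemand = (15 * 1.2).to_i = 18; a site contributing 200 of 1000 total cores has siteFraction 0.2, so when its queue is empty the site thread submits max((18 * 0.2).to_i, minSiteJobs) = max(3, 2) = 3 pilot jobs, then re-checks the Condor queue every 60 seconds.

$ echo $(( 18 * 200 / 1000 ))    # integer form of (paddedDemand * siteFraction).to_i = 3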

Added: trunk/bin/grid/worker.sh
===================================================================
--- trunk/bin/grid/worker.sh	                        (rev 0)
+++ trunk/bin/grid/worker.sh	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,14 @@
+#! /bin/bash
+
+SERVICEURL=$1
+NWORKERS=$2
+LOGLEVEL=$3
+
+# WORKER_LOGGING_LEVEL=$LOGLEVEL ./worker.pl http://128.135.125.17:$PORT swork01 ./workerlogs 
+
+LOGDIR=/tmp/$USER/workerlogs
+
+mkdir -p $LOGDIR
+for worker in $(seq -w 0 $(($NWORKERS-1))); do
+  echo WORKER_LOGGING_LEVEL=DEBUG $HOME/worker.pl $SERVICEURL swork${worker} $LOGDIR # >& /dev/null &
+done


Property changes on: trunk/bin/grid/worker.sh
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/workers.pbs.sub
===================================================================
--- trunk/bin/grid/workers.pbs.sub	                        (rev 0)
+++ trunk/bin/grid/workers.pbs.sub	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,36 @@
+#!/bin/bash
+#$ -N SGEtest
+#$ -pe 16way 256
+#$ -l h_rt=00:05:00
+#$ -o $HOME/swift/lab/t.stdout
+#$ -e $HOME/swift/lab/t.stderr
+#$ -v WORKER_LOGGING_LEVEL=NONE
+#$ -q development
+#$ -A TG-DBS080004N
+#$ -V
+#$ -S /bin/bash
+
+echo PE_HOSTFILE:
+echo
+cat $PE_HOSTFILE
+echo
+
+#cd / && NODES=`cat $PE_HOSTFILE | awk '{ for(i=0;i<$2;i++){print $1} }'`
+cd / && NODES=`cat $PE_HOSTFILE | awk '{print $1}'`
+ECF=/home/mwilde/.globus/scripts/t.exitcode
+INDEX=0
+for NODE in $NODES; do
+echo launch on node $NODE
+#  echo "N" >$ECF.$INDEX
+#  ssh $NODE /bin/bash -c \" "sleep 300; echo \\\$? > $ECF.$INDEX " \" &
+#  qrsh -nostdin -l hostname=$NODE hostname -f 2>&1 &
+  ssh $NODE hostname -f &
+   rc=$?
+   if [ $rc != 0 ]; then
+     echo ssh failed for $NODE
+   fi
+   #  sleep .33
+  INDEX=$((INDEX + 1))
+done
+
+wait

Added: trunk/bin/grid/workers.ranger.sh
===================================================================
--- trunk/bin/grid/workers.ranger.sh	                        (rev 0)
+++ trunk/bin/grid/workers.ranger.sh	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,18 @@
+#! /bin/bash
+
+SERVICEURL=$1
+NWORKERS=$2
+LOGLEVEL=$3
+
+# WORKER_LOGGING_LEVEL=$LOGLEVEL ./worker.pl http://128.135.125.17:$PORT swork01 ./workerlogs 
+
+LOGDIR=/tmp/$USER/workerlogs
+mkdir -p $LOGDIR
+cd $LOGDIR
+
+for worker in $(seq -w 0 $(($NWORKERS-1))); do
+  WORKER_LOGGING_LEVEL=$LOGLEVEL $HOME/swift_gridtools/worker.pl $SERVICEURL swork${worker} $LOGDIR >& /dev/null &
+done
+wait
+ls -lt $LOGDIR/
+tail $LOGDIR/*


Property changes on: trunk/bin/grid/workers.ranger.sh
___________________________________________________________________
Added: svn:executable
   + *

Added: trunk/bin/grid/workers.ranger.sub
===================================================================
--- trunk/bin/grid/workers.ranger.sub	                        (rev 0)
+++ trunk/bin/grid/workers.ranger.sub	2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,38 @@
+#!/bin/bash
+
+#$ -N runworkers
+#$ -pe 16way 16
+#$ -l h_rt=00:10:00
+#$ -o $HOME/
+#$ -e $HOME/
+#$ -q development
+#$ -V
+#$ -S /bin/bash
+
+# Must provide on commandline:
+
+#-- #$ -A TG-DBS080004N
+
+#export SERVICE_URL=http://missingServiceURL
+#export WORKER_LOGLEVEL=TRACE
+
+echo PE_HOSTFILE:
+echo
+cat $PE_HOSTFILE
+echo
+
+rdir=$HOME/swift_gridtools
+NODES=`cat $PE_HOSTFILE | awk '{print $1}'`
+
+INDEX=0
+for NODE in $NODES; do
+  ssh $NODE /bin/bash -c \" $rdir/workers.ranger.sh $SERVICE_URL 16 $WORKER_LOGLEVEL \" &
+  INDEX=$((INDEX + 1))
+done
+wait
+
+
+
+
+
+



