[Swift-commit] r4785 - in trunk/bin: . grid
wilde at ci.uchicago.edu
Wed Jul 6 17:41:37 CDT 2011
Author: wilde
Date: 2011-07-06 17:41:37 -0500 (Wed, 06 Jul 2011)
New Revision: 4785
Added:
trunk/bin/grid/
trunk/bin/grid/1worker.sh
trunk/bin/grid/README
trunk/bin/grid/TODO
trunk/bin/grid/foreachsite
trunk/bin/grid/gen_gridsites
trunk/bin/grid/get_greensites
trunk/bin/grid/log4j.properties.debug
trunk/bin/grid/maketcfrominst
trunk/bin/grid/mk_catalog.rb
trunk/bin/grid/mk_cats.rb
trunk/bin/grid/mk_osg_sitetest.rb
trunk/bin/grid/osgcat
trunk/bin/grid/ress.rb
trunk/bin/grid/ressfields
trunk/bin/grid/run_workers
trunk/bin/grid/sites
trunk/bin/grid/start-ranger-service
trunk/bin/grid/start-ranger-service~
trunk/bin/grid/start-swift-service
trunk/bin/grid/swift-workers
trunk/bin/grid/worker.sh
trunk/bin/grid/workers.pbs.sub
trunk/bin/grid/workers.ranger.sh
trunk/bin/grid/workers.ranger.sub
Log:
Initial version of toolkit for OSG and TeraGrid
Added: trunk/bin/grid/1worker.sh
===================================================================
--- trunk/bin/grid/1worker.sh (rev 0)
+++ trunk/bin/grid/1worker.sh 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,2 @@
+PORT=$1
+WORKER_LOGGING_LEVEL=DEBUG ./worker.pl http://128.135.125.17:$PORT swork01 ./workerlogs
Property changes on: trunk/bin/grid/1worker.sh
___________________________________________________________________
Added: svn:executable
+ *
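For reference, a minimal usage sketch of 1worker.sh (assuming a coaster service is already listening for workers on the given port of 128.135.125.17, and worker.pl sits in the current directory):

$ ./1worker.sh 61100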
Added: trunk/bin/grid/README
===================================================================
--- trunk/bin/grid/README (rev 0)
+++ trunk/bin/grid/README 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,86 @@
+Coaster Pool generator
+======================
+
+Description: This set of scripts creates configuration files and workflows to request pilot
+             jobs from OSG sites.
+Author: Allan Espinosa
+Date: 2010 November 24
+
+
+>> 0.a Set up the OSG package
+
+$ source /opt/osg-1.2.16/setup.sh
+
+>> 0. Create a VOMS proxy
+
+$ voms-proxy-init -voms Engage -valid 12:00
+
+
+Scripts
+-------
+
+1. start_services.sh - Starts coaster services
+ Usage: start_services.sh [number of services]
+
+2. mk_catalog.rb - Generates sites.xml files for submitting condor, coaster and
+                   gram2 jobs to a list of OSG sites. The whitelist is formatted as one
+                   [GlueSiteUniqueID]_[GlueCEInfoHostName] entry per site.
+                   Usage: mk_catalog.rb [whitelist] [<optional: app_name>]
+
+3. nqueue.rb - Submits pilot coaster jobs to a list of sites, keeping each site's
+               queue saturated with n jobs at a time.
+               Usage: nqueue.rb [whitelist]
+
+
+Example usage
+-------------
+
+Here, an app called 'extenci' will be installed on the SPRACE site resource.
+
+1. Create a whitelist file. (use ../site_gen/gensites.sh )
+
+$ cat > whitelist << EOF
+SPRACE_osg-ce.sprace.org.br
+EOF
+
+
+2. Generate gram2 sites.xml (gt2_osg.xml) file.
+
+$ ./mk_catalog.rb whitelist extenci
+
+# Modify to use specified portlist (from start-services)
+# Note that if port changes you will need to regen...
+
+3. Upload the worker.pl script to the site. The setup.k script also cleans up the
+   data directory of the site.
+
+# $ swift setup.k # Modify this script - upload not needed, but cleanup still needed
+
+4. Spawn 2 coaster services. The first one is for PADS.
+
+$ ./start_services.sh 2
+
+# 2 is just for a test... actually we need Nsites+1 services (+1 for PADS, +N for other fixed sites, e.g. Beagle)
+
+5. Configure the service to run in passive mode. Any Swift run that uses
+   coaster_osg.xml can be used in place of the slave.swift script.
+
+$ swift -config swift.properties -sites.file coaster_osg.xml slave.swift
+
+6. Request coaster jobs. The script will request (2.5 * total_cpus) pilot jobs
+   throughout the duration of the workflow.
+
+ Method 1: via direct condor-g
+
+$ ./nqueue.rb whitelist
+
+ Method 2: via swift
+
+$ swift -config swift.properties -sites.file condor_osg.xml worker.swift
+
+
+7. Run your workflow. Here, the sleep.swift sample included in the package will
+ be used.
+
+$ swift -config swift.properties -sites.file coaster_osg.xml sleep.swift
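Regarding the note in step 2: the coaster service ports are baked into the generated coaster_osg.xml, so after (re)starting services it is worth checking that the generated file and the running services agree. A quick check (a sketch; the hostname and the service.sports file name come from mk_catalog.rb and start-swift-service):

$ grep -o 'communicado.ci.uchicago.edu:[0-9]*' coaster_osg.xml
$ cat service.sports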
Added: trunk/bin/grid/TODO
===================================================================
--- trunk/bin/grid/TODO (rev 0)
+++ trunk/bin/grid/TODO 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,40 @@
+
+
+ExTENCI Exec
+
+create modft install & test file; test under fork and work
+
+run sites gen
+
+run tcgen
+
+run factory (convert factory from ruby to shell?)
+
+run wf (one service for all w/ provider staging; one service per site)
+
+
+TO RESOLVE
+
+- how to set swift throttles to handle a varying number of coaster workers per site?
+
+- why did Allan set exceptions in workdir names, eg for BNL?
+
+- how to dynamically grow/shrink pool and add/remove sites; dynamically take coaster services in and out of service.
+
+- settings for retry and replication
+
+
+FEATURES
+
+Add site selection option to foreachsite
+
+(Swift feature: foreach site in Swift?)
+
+CLEANUP
+
+Find all interim p-baked tools under swift/lab/osg and place under grid/ for development
+
+Find Glen's tgsites command and integrate
+
+Merge in gstar
+
Added: trunk/bin/grid/foreachsite
===================================================================
--- trunk/bin/grid/foreachsite (rev 0)
+++ trunk/bin/grid/foreachsite 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,69 @@
+# Default settings:
+
+resource=work # fork or work(er)
+
+# FIXME: test for valid proxy
+
+# Usage: foreachsite [-resource fork|worker ] scriptname
+
+usage="$0 foreachsite [-resource fork|worker ] scriptname"
+
+# Process command line arguments
+
+while [ $# -gt 0 ]; do
+ case $1 in
+ -resource) resource=$2; shift 2 ;;
+ -*) echo $usage 1>&2
+ exit 1 ;;
+ *) scriptparam=$1
+ scriptpath=$(cd $(dirname $scriptparam); echo $(pwd)/$(basename $scriptparam)) ; shift 1 ;;
+ esac
+done
+
+if [ _$scriptparam = _ ]; then
+ echo $usage 1>&2
+ exit 1
+fi
+
+rundir=$(mktemp -d run.XXX)
+cd $rundir
+
+echo Running foreachsite: resource=$resource script=$scriptparam rundir=$rundir
+
+# swift-osg-ress-site-catalog --engage-verified --condor-g >osg.xml
+swift-osg-ress-site-catalog --engage --condor-g >osg.xml
+
+for jobmanager in $(grep gridRes osg.xml | sed -e 's/^.* //' -e 's/<.*//' ); do
+ ( sitename=$(echo $jobmanager | sed -e 's,/.*,,') # strip the /jobmanager-<type> suffix, leaving the site hostname
+ mkdir $sitename
+ cd $sitename
+ if [ $resource = fork ]; then
+ resname=$sitename
+ else
+ resname=$jobmanager
+ fi
+ cat >condor.sub <<END
+#jobmanager=$jobmanager
+#sitename=$sitename
+#resname=$resname
+universe = grid
+grid_resource = gt2 $resname
+stream_output = False
+stream_error = False
+Transfer_Executable = True
+output = $(pwd)/submit.stdout
+error = $(pwd)/submit.stderr
+log = $(pwd)/submit.log
+
+remote_initialdir = /tmp
+executable = $scriptpath
+#arguments = "-c" "python -V" # fails!
+notification = Never
+leave_in_queue = False
+queue
+
+END
+
+ condor_submit condor.sub
+ )
+done
Property changes on: trunk/bin/grid/foreachsite
___________________________________________________________________
Added: svn:executable
+ *
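A usage sketch for foreachsite (probe.sh is a hypothetical test script; a valid grid proxy and a local Condor-G installation are assumed):

$ cat > probe.sh <<'EOF'
#!/bin/bash
# Runs on each OSG site via Condor-G; the executable is transferred by the generated submit file.
hostname -f
python -V 2>&1
EOF
$ chmod +x probe.sh
$ ./foreachsite -resource fork ./probe.sh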
Added: trunk/bin/grid/gen_gridsites
===================================================================
--- trunk/bin/grid/gen_gridsites (rev 0)
+++ trunk/bin/grid/gen_gridsites 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,31 @@
+#! /bin/bash
+
+bin=$(cd $(dirname $0); pwd)
+
+work=$(mktemp -d gen_gridsites.run.XXX)
+if [ $? != 0 ]; then
+ echo $0: failed to create work directory
+ exit 1
+fi
+echo $0: working directory is $work
+cd $work
+
+ruby -I $bin $bin/mk_osg_sitetest.rb
+
+cat >cf <<END
+lazy.errors=true
+execution.retries=0
+status.mode=files
+use.provider.staging=false
+wrapperlog.always.transfer=false
+END
+
+swift -config cf -tc.file tc.data -sites.file sites.xml test_osg.swift >& swift.out &
+
+echo $0: Started test_osg.swift script - process id is $!
+
+# To Do:
+# harvest new sites with an updated goodsites file every few minutes
+# plot (print) a histogram of site acquisition over time
+# FIXME: move mk_test.rb to libexec/grid
+# check for or create valid proxy
Property changes on: trunk/bin/grid/gen_gridsites
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/get_greensites
===================================================================
--- trunk/bin/grid/get_greensites (rev 0)
+++ trunk/bin/grid/get_greensites 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,8 @@
+#! /bin/bash
+
+latestrun=$(ls -1td gen_gridsites.run.* | head -1)
+
+echo -e Green sites from $latestrun '\n' 1>&2
+
+cd $latestrun
+cat cat*.out
Property changes on: trunk/bin/grid/get_greensites
___________________________________________________________________
Added: svn:executable
+ *
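Taken together, gen_gridsites and get_greensites support a harvesting loop roughly like this (a sketch; 'extenci' is an assumed app name):

$ ./gen_gridsites                          # starts test_osg.swift in the background
$ tail -f gen_gridsites.run.*/swift.out    # watch progress; Ctrl-C to stop tailing
$ ./get_greensites > greensites            # sites whose cat test has succeeded so far
$ ./mk_catalog.rb greensites extenci       # regenerate catalogs from the green list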
Added: trunk/bin/grid/log4j.properties.debug
===================================================================
--- trunk/bin/grid/log4j.properties.debug (rev 0)
+++ trunk/bin/grid/log4j.properties.debug 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,69 @@
+# Set root category priority to INFO and its appenders to CONSOLE and FILE.
+log4j.rootCategory=INFO, CONSOLE, FILE
+#log4j.rootCategory=DEBUG, CONSOLE, FILE
+
+log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
+log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
+log4j.appender.CONSOLE.Threshold=INFO
+log4j.appender.CONSOLE.layout.ConversionPattern=%m%n
+
+log4j.appender.FILE=org.apache.log4j.FileAppender
+log4j.appender.FILE.File=swift.log
+log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
+log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n
+
+
+log4j.logger.org.apache.axis.utils=ERROR
+
+# Swift
+
+log4j.logger.swift=DEBUG
+log4j.logger.swift.textfiles=DEBUG
+log4j.logger.org.globus.swift.trace=INFO
+log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG
+log4j.logger.org.griphyn.vdl.karajan.functions.ProcessBulkErrors=WARN
+log4j.logger.org.griphyn.vdl.engine.Karajan=INFO
+log4j.logger.org.griphyn.vdl.karajan.lib=INFO
+log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG
+log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG
+log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG
+
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=INFO
+#ADDED:
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Node=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Settings=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockTask=INFO
+
+log4j.logger.org.globus.cog.abstraction.impl.execution.coaster.SubmitJobCommand.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.impl.execution.coaster.ServiceConfigurationCommand.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.ServiceConfigurationHandler.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.PassiveQueueProcessor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BQPStatusCommand.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockTask.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.OverallocatedJobDurationMetric.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.JobCountMetric.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.RemoteBQPMonitor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Settings.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.SwingBQPMonitor.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BQPStatusHandler.java=DEBUG
+log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Node.java=DEBUG
+
+# Special functionality: suppresses auto-deletion of PBS submit file
+log4j.logger.org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor=DEBUG
+log4j.logger.org.globus.cog.abstraction.impl.scheduler.pbs.PBSExecutor=DEBUG
+
+# CoG Karajan
+log4j.logger.org.globus.cog.karajan.workflow.events.WorkerSweeper=WARN
+log4j.logger.org.globus.cog.karajan.workflow.nodes.FlowNode=WARN
+
+# CoG Scheduling
+log4j.logger.org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler=INFO
+
+# CoG Providers
+log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=INFO
+log4j.logger.org.globus.cog.abstraction.coaster.rlog=INFO
Added: trunk/bin/grid/maketcfrominst
===================================================================
--- trunk/bin/grid/maketcfrominst (rev 0)
+++ trunk/bin/grid/maketcfrominst 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,25 @@
+#! /bin/sh
+
+rundir=$1
+cd $rundir
+
+# Usage: maketcfrominst run.NN
+#
+# Run this from the directory in which foreachsite was run to install the app,
+# i.e. the directory that contains the run.NN directory.
+
+# echo $(more */*out | egrep '^Runn|^\*' | grep matches | wc -l) successful application installs
+
+# more $rundir/*/*out | egrep 'submit.stdout|^instal|^data|^wn|^py'
+
+# more */*out | egrep 'submit.stdout|^instal'
+
+for site in $(find * -type d); do
+ # echo site=$site
+ if grep -q matches $site/*.stdout; then
+ # echo " OK"
+ idir=$(grep '^installing in' $site/*.stdout | awk '{print $3}')
+ echo $site modftdock $idir null null null
+ else
+ : # echo " failed"
+ fi
+done
Property changes on: trunk/bin/grid/maketcfrominst
___________________________________________________________________
Added: svn:executable
+ *
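A usage sketch (run.01 is a hypothetical foreachsite run directory; each output line is a tc.data entry for a site where the install succeeded):

$ ./maketcfrominst run.01 >> tc.data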
Added: trunk/bin/grid/mk_catalog.rb
===================================================================
--- trunk/bin/grid/mk_catalog.rb (rev 0)
+++ trunk/bin/grid/mk_catalog.rb 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,245 @@
+#!/usr/bin/env ruby
+
+require 'erb'
+require 'ostruct'
+
+# starting ports for the templates
+coaster_service = 62100
+worker_service = 61100
+
+hostname="communicado.ci.uchicago.edu";
+hostip="128.135.125.17";
+
+swift_workflow = %q[
+<% ctr = 0
+ sites.each_key do |name|
+ jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+app (external o) worker<%= ctr %>() {
+ worker<%= ctr %> "http://<%= hostip %>:<%= worker_service + ctr %>" "<%= name %>" "/tmp" "14400";
+}
+
+external rups<%= ctr %>[];
+int arr<%= ctr %>[];
+iterate i{
+ arr<%= ctr %>[i] = i;
+} until (i == <%= ((throttle * 100 + 2) * 2.5).to_i %>);
+
+foreach a,i in arr<%= ctr %> {
+ rups<%= ctr %>[i] = worker<%= ctr %>();
+}
+
+<% ctr += 1
+ end %>
+]
+
+slave_workflow = %q[
+int t = 300;
+
+<% ctr = 0
+ sites.each_key do |name|
+ jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+app (external o) sleep<%= ctr %>(int time) {
+ sleep<%= ctr %> time;
+}
+
+external o<%=ctr%>;
+o<%=ctr%> = sleep<%=ctr%>(t);
+
+<% ctr += 1
+ end %>
+
+]
+
+swift_tc = %q[
+<% ctr = 0
+ sites.each_key do |name|
+ jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+<%=name%> worker<%= ctr %> <%=app_dir%>/worker.pl INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="04:00:00"
+<%=name%> sleep<%= ctr %> /bin/sleep INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:05:00"
+<%=name%> sleep /bin/sleep INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:05:00"
+<% ctr += 1
+ end %>
+]
+
+condor_sites = %q[
+<config>
+<% sites.each_key do |name| %>
+<% jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+
+ <pool handle="<%=name%>">
+ <execution provider="condor" url="none"/>
+ <profile namespace="globus" key="jobType">grid</profile>
+ <profile namespace="globus" key="gridResource">gt2 <%=url%>/jobmanager-<%=jm%></profile>
+ <profile namespace="karajan" key="initialScore">200.0</profile>
+ <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+ <% if name =~ /FNAL_FERMIGRID/ %>
+ <profile namespace="globus" key="condor_requirements">GlueHostOperatingSystemRelease =?= "5.3" && GlueSubClusterName =!= GlueClusterName</profile>
+ <% end %>
+ <gridftp url="gsiftp://<%=url%>"/>
+ <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+ </pool>
+<% end %>
+</config>
+]
+
+# GT2 for installing the workers
+gt2_sites = %q[
+<config>
+<% sites.each_key do |name| %>
+<% jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+
+ <pool handle="<%=name%>">
+ <jobmanager universe="vanilla" url="<%=url%>/jobmanager-fork" major="2" />
+ <gridftp url="gsiftp://<%=url%>"/>
+ <workdirectory><%= data_dir %>/swift_scratch</workdirectory>
+ <appdirectory><%= app_dir %></appdirectory>
+ </pool>
+<% end %>
+</config>
+]
+
+coaster_sites = %q[
+<config>
+<% ctr = 0
+ sites.each_key do |name|
+ jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+
+ <pool handle="<%=name%>">
+ <execution provider="coaster-persistent" url="https://<%= hostname %>:<%= coaster_service + ctr %>"
+ jobmanager="local:local" />
+
+ <profile namespace="globus" key="workerManager">passive</profile>
+
+ <profile namespace="karajan" key="initialScore">200.0</profile>
+ <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+
+ <gridftp url="gsiftp://<%=url%>"/>
+ <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+ </pool>
+<% ctr += 1
+ end %>
+</config>
+]
+
+def ress_query(class_ads)
+ cmd = "condor_status -pool engage-submit.renci.org"
+ class_ads[0..-2].each do |class_ad|
+ cmd << " -format \"%s|\" #{class_ad}"
+ end
+ cmd << " -format \"%s\\n\" #{class_ads[-1]}"
+ `#{cmd}`
+end
+
+def ress_parse(app_name)
+ dir_suffix = "/engage/#{app_name}"
+ class_ads = [
+ "GlueSiteUniqueID", "GlueCEInfoHostName", "GlueCEInfoJobManager",
+ "GlueCEInfoGatekeeperPort", "GlueCEInfoApplicationDir", "GlueCEInfoDataDir",
+ "GlueCEInfoTotalCPUs"
+ ]
+ ress_query(class_ads).each_line do |line|
+ line.chomp!
+#puts "ress_query: line is:"
+#puts "$"<<line<<"$"
+#puts "---"
+ set = line.split("|")
+ next if not set.size > 0
+
+ value = OpenStruct.new
+
+ value.jm = set[class_ads.index("GlueCEInfoJobManager")]
+ value.url = set[class_ads.index("GlueCEInfoHostName")]
+ value.throttle = (set[class_ads.index("GlueCEInfoTotalCPUs")].to_f - 2.0) / 100.0
+ name = set[class_ads.index("GlueSiteUniqueID")] + "__" + value.url
+ value.name = set[class_ads.index("GlueSiteUniqueID")]
+
+ value.app_dir = set[class_ads.index("GlueCEInfoApplicationDir")]
+ value.app_dir.sub!(/\/$/, "")
+ value.data_dir = set[class_ads.index("GlueCEInfoDataDir")]
+ value.data_dir.sub!(/\/$/, "")
+
+ value.app_dir += dir_suffix
+ value.data_dir += dir_suffix
+
+ # Hard-wired exceptions
+ value.app_dir = "/osg/app" if name =~ /GridUNESP_CENTRAL/
+ value.data_dir = "/osg/data" if name =~ /GridUNESP_CENTRAL/
+ value.app_dir.sub!(dir_suffix, "/engage-#{app_name}") if name =~ /BNL-ATLAS/
+ value.data_dir.sub!(dir_suffix, "/engage-#{app_name}") if name =~ /BNL-ATLAS/
+
+ yield name, value
+ end
+end
+
+if __FILE__ == $0 then
+ raise "No whitelist file" if !ARGV[0]
+
+ # Blacklist of non-working sites
+ blacklist = []
+ ARGV[1] = "scec" if !ARGV[1]
+ whitelist = IO.readlines(ARGV[0]).map { |line| line.chomp! }
+
+ # Removes duplicate site entries (i.e. multiple GRAM endpoints)
+ sites = {}
+ ress_parse(ARGV[1]) do |name, value|
+ next if blacklist.index(name) and not blacklist.empty?
+ next if not whitelist.index(name) and not whitelist.empty?
+ sites[name] = value if sites[name] == nil
+ end
+
+ condor_out = File.open("condor_osg.xml", "w")
+ gt2_out = File.open("gt2_osg.xml", "w")
+ coaster_out = File.open("coaster_osg.xml", "w")
+
+ tc_out = File.open("tc.data", "w")
+ workflow_out = File.open("worker.swift", "w")
+ slave_out = File.open("slave.swift", "w")
+
+ condor = ERB.new(condor_sites, 0, "%<>")
+ gt2 = ERB.new(gt2_sites, 0, "%<>")
+ coaster = ERB.new(coaster_sites, 0, "%<>")
+
+ tc = ERB.new(swift_tc, 0, "%<>")
+ workflow = ERB.new(swift_workflow, 0, "%<>")
+ slave = ERB.new(slave_workflow, 0, "%<>")
+
+ condor_out.puts condor.result(binding)
+ gt2_out.puts gt2.result(binding)
+ coaster_out.puts coaster.result(binding)
+
+ tc_out.puts tc.result(binding)
+ workflow_out.puts workflow.result(binding)
+ slave_out.puts slave.result(binding)
+
+ condor_out.close
+ gt2_out.close
+ coaster_out.close
+
+ tc_out.close
+ workflow_out.close
+ slave_out.close
+end
Property changes on: trunk/bin/grid/mk_catalog.rb
___________________________________________________________________
Added: svn:executable
+ *
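For reference, the condor_status query that ress_query assembles for the class ads listed above looks roughly like this; the output is one pipe-delimited line per advertised CE:

$ condor_status -pool engage-submit.renci.org \
    -format "%s|" GlueSiteUniqueID \
    -format "%s|" GlueCEInfoHostName \
    -format "%s|" GlueCEInfoJobManager \
    -format "%s|" GlueCEInfoGatekeeperPort \
    -format "%s|" GlueCEInfoApplicationDir \
    -format "%s|" GlueCEInfoDataDir \
    -format "%s\n" GlueCEInfoTotalCPUs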
Added: trunk/bin/grid/mk_cats.rb
===================================================================
--- trunk/bin/grid/mk_cats.rb (rev 0)
+++ trunk/bin/grid/mk_cats.rb 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,149 @@
+#!/usr/bin/env ruby
+
+require 'erb'
+require 'ostruct'
+
+# ports lists for the templates
+
+#coaster_service[] = `cat service-*.sport;`
+#worker_service[] = `cat service-*.wport`;
+
+coaster_service = 12345;
+worker_service = 67890;
+
+hostname="communicado.ci.uchicago.edu";
+hostip="128.135.125.17";
+
+coaster_sites = %q[
+<config>
+<% ctr = 0
+ sites.each_key do |name|
+ jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+
+ <pool handle="<%=name%>">
+ <execution provider="coaster-persistent" url="https://<%= hostname %>:<%= coaster_service + ctr %>"
+ jobmanager="local:local" />
+
+ <profile namespace="globus" key="workerManager">passive</profile>
+
+ <profile namespace="karajan" key="initialScore">200.0</profile>
+ <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+
+ <gridftp url="gsiftp://<%=url%>"/>
+ <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+ </pool>
+<% ctr += 1
+ end %>
+</config>
+]
+
+def OLDress_query(class_ads)
+ cmd = "condor_status -pool engage-submit.renci.org"
+ class_ads[0..-2].each do |class_ad|
+ cmd << " -format \"%s|\" #{class_ad}"
+ end
+ cmd << " -format \"%s\\n\" #{class_ads[-1]}"
+ `#{cmd}`
+end
+
+def ress_query(class_ads)
+ cmd = "./ressfields.sh"
+ class_ads[0..-1].each do |class_ad|
+ cmd << " #{class_ad}"
+ end
+ `#{cmd}`
+end
+
+def ress_parse(app_name)
+ dir_suffix = "/engage/#{app_name}"
+ class_ads = [
+ "GlueSiteUniqueID", "GlueCEInfoHostName", "GlueCEInfoJobManager",
+ "GlueCEInfoGatekeeperPort", "GlueCEInfoApplicationDir", "GlueCEInfoDataDir",
+ "GlueCEInfoTotalCPUs"
+ ]
+ ress_query(class_ads).each_line do |line|
+ line.chomp!
+puts "ress_query: line is:"
+puts "$"<<line<<"$"
+puts "---"
+ set = line.split("|")
+ next if not set.size > 0
+
+ value = OpenStruct.new
+
+ value.jm = set[class_ads.index("GlueCEInfoJobManager")]
+ value.url = set[class_ads.index("GlueCEInfoHostName")]
+ value.throttle = (set[class_ads.index("GlueCEInfoTotalCPUs")].to_f - 2.0) / 100.0
+ name = set[class_ads.index("GlueSiteUniqueID")] + "__" + value.url
+ value.name = set[class_ads.index("GlueSiteUniqueID")]
+
+ value.app_dir = set[class_ads.index("GlueCEInfoApplicationDir")]
+ value.app_dir.sub!(/\/$/, "")
+ value.data_dir = set[class_ads.index("GlueCEInfoDataDir")]
+ value.data_dir.sub!(/\/$/, "")
+
+ value.app_dir += dir_suffix
+ value.data_dir += dir_suffix
+
+ # Hard-wired exceptions
+ value.app_dir = "/osg/app" if name =~ /GridUNESP_CENTRAL/
+ value.data_dir = "/osg/data" if name =~ /GridUNESP_CENTRAL/
+ value.app_dir.sub!(dir_suffix, "/engage-#{app_name}") if name =~ /BNL-ATLAS/
+ value.data_dir.sub!(dir_suffix, "/engage-#{app_name}") if name =~ /BNL-ATLAS/
+
+ yield name, value
+ end
+end
+
+if __FILE__ == $0 then
+ raise "No whitelist file" if !ARGV[0]
+
+ # Blacklist of non-working sites
+ blacklist = []
+ ARGV[1] = "scec" if !ARGV[1]
+ whitelist = IO.readlines(ARGV[0]).map { |line| line.chomp! }
+
+ # Removes duplicate site entries (i.e. multiple GRAM endpoints)
+ sites = {}
+ ress_parse(ARGV[1]) do |name, value|
+ next if blacklist.index(name) and not blacklist.empty?
+ next if not whitelist.index(name) and not whitelist.empty?
+ sites[name] = value if sites[name] == nil
+ end
+
+ # condor_out = File.open("condor_osg.xml", "w")
+ # gt2_out = File.open("gt2_osg.xml", "w")
+ coaster_out = File.open("coaster_osg.xml", "w")
+
+ # tc_out = File.open("tc.data", "w")
+ # workflow_out = File.open("worker.swift", "w")
+ # slave_out = File.open("slave.swift", "w")
+
+ # condor = ERB.new(condor_sites, 0, "%<>")
+ # gt2 = ERB.new(gt2_sites, 0, "%<>")
+ coaster = ERB.new(coaster_sites, 0, "%<>")
+
+ # tc = ERB.new(swift_tc, 0, "%<>")
+ # workflow = ERB.new(swift_workflow, 0, "%<>")
+ # slave = ERB.new(slave_workflow, 0, "%<>")
+
+ # condor_out.puts condor.result(binding)
+ # gt2_out.puts gt2.result(binding)
+ coaster_out.puts coaster.result(binding)
+
+ # tc_out.puts tc.result(binding)
+ # workflow_out.puts workflow.result(binding)
+ # slave_out.puts slave.result(binding)
+
+ # condor_out.close
+ # gt2_out.close
+ coaster_out.close
+
+ # tc_out.close
+ # workflow_out.close
+ # slave_out.close
+end
Property changes on: trunk/bin/grid/mk_cats.rb
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/mk_osg_sitetest.rb
===================================================================
--- trunk/bin/grid/mk_osg_sitetest.rb (rev 0)
+++ trunk/bin/grid/mk_osg_sitetest.rb 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,116 @@
+#!/usr/bin/env ruby
+
+# File: mk_test.rb
+# Date: 2010-10-06
+# Author: Allan Espinosa
+# Email: aespinosa at cs.uchicago.edu
+# Description: A Swift workflow generator to test OSG sites through the Engage
+# VO. Generates the accompanying tc.data and sites.xml as well.
+# Run with "swift -sites.file sites.xml -tc.file tc.data
+# test_osg.swift"
+
+require 'erb'
+require 'ostruct'
+require 'ress'
+
+swift_tc = %q[
+localhost echo /bin/echo INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:05:00"
+<% ctr = 0
+ sites.each_key do |name| %>
+<% jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+<%=name%> cat<%=ctr%> /bin/cat INSTALLED INTEL32::LINUX GLOBUS::maxwalltime="00:01:30"
+<% ctr += 1
+ end %>
+]
+
+swift_workflow = %q[
+type file;
+
+app (file t) echo(string i) {
+ echo i stdout=@filename(t);
+}
+
+<% ctr = 0
+ sites.each_key do |name| %>
+app (file t) cat<%= ctr %>(file input ) {
+ cat<%= ctr %> @filename(input) stdout=@filename(t);
+}
+<% ctr += 1
+ end %>
+
+<% ctr = 0
+ sites.each_key do |name| %>
+file input<%= ctr %><"cat<%= ctr %>.in">;
+input<%= ctr %> = echo("<%= name %>");
+file out<%= ctr %><"cat<%= ctr %>.out">;
+out<%= ctr %> = cat<%= ctr %>(input<%= ctr %>);
+<% ctr += 1
+ end %>
+
+]
+
+condor_sites = %q[
+<config>
+ <pool handle="localhost">
+ <filesystem provider="local" />
+ <execution provider="local" />
+ <workdirectory >/var/tmp</workdirectory>
+ <profile namespace="karajan" key="jobThrottle">0</profile>
+ </pool>
+<% sites.each_key do |name| %>
+<% jm = sites[name].jm
+ url = sites[name].url
+ app_dir = sites[name].app_dir
+ data_dir = sites[name].data_dir
+ throttle = sites[name].throttle %>
+
+ <pool handle="<%=name%>">
+ <execution provider="condor" url="none"/>
+
+ <profile namespace="globus" key="jobType">grid</profile>
+ <profile namespace="globus" key="gridResource">gt2 <%=url%>/jobmanager-<%=jm%></profile>
+
+ <profile namespace="karajan" key="initialScore">20.0</profile>
+ <profile namespace="karajan" key="jobThrottle"><%=throttle%></profile>
+
+ <gridftp url="gsiftp://<%=url%>"/>
+ <workdirectory><%=data_dir%>/swift_scratch</workdirectory>
+ </pool>
+<% end %>
+</config>
+]
+
+# Redlist of non-working sites
+redlist = [ ]
+
+puts("mk_test starting")
+
+# Removes duplicate site entries (i.e. multiple GRAM endpoints)
+sites = {}
+ress_parse do |name, value|
+ next if redlist.index(name)
+ sites[name] = value if sites[name] == nil
+print("site: ")
+puts(name)
+#puts(name,value)
+end
+
+condor_out = File.open("sites.xml", "w")
+tc_out = File.open("tc.data", "w")
+swift_out = File.open("test_osg.swift", "w")
+
+condor = ERB.new(condor_sites, 0, "%<>")
+tc = ERB.new(swift_tc, 3, "%<>")
+swift = ERB.new(swift_workflow, 0, "%<>")
+
+condor_out.puts condor.result(binding)
+tc_out.puts tc.result(binding)
+swift_out.puts swift.result(binding)
+
+condor_out.close
+tc_out.close
+swift_out.close
Property changes on: trunk/bin/grid/mk_osg_sitetest.rb
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/osgcat
===================================================================
--- trunk/bin/grid/osgcat (rev 0)
+++ trunk/bin/grid/osgcat 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,170 @@
+#!/usr/bin/perl
+
+use strict;
+
+use Pod::Usage;
+use Getopt::Long;
+use File::Temp qw/ tempfile tempdir mktemp /;
+
+my $opt_help = 0;
+my $opt_vo = 'engage';
+my $opt_engage_verified = 0;
+my $opt_gt4 = 0;
+my $opt_condorg = 0;
+my $opt_out = '&STDOUT';
+
+Getopt::Long::Configure('bundling');
+GetOptions(
+ "help" => \$opt_help,
+ "vo=s" => \$opt_vo,
+ "engage-verified" => \$opt_engage_verified,
+ "gt4" => \$opt_gt4,
+ "condor-g" => \$opt_condorg,
+ "out=s" => \$opt_out,
+) or pod2usage(1);
+
+if ($opt_help) {
+ pod2usage(1);
+}
+
+if ($opt_engage_verified && $opt_vo ne "engage") {
+ die("You can not specify a vo when using --engage-verified\n");
+}
+
+# make sure condor_status is in the path
+my $out = `which condor_status 2>/dev/null`;
+if ($out eq "") {
+ die("This tool depends on condor_status.\n" .
+ "Please make sure condor_status in your path.\n");
+}
+
+my %ads;
+my %tmp;
+my $cmd = "condor_status -any -long -constraint" .
+ " 'StringlistIMember(\"VO:$opt_vo\";GlueCEAccessControlBaseRule)'" .
+ " -pool osg-ress-1.fnal.gov";
+# if we want the engage verified sites, ignore opt_vo and query against
+# engage central collector
+if ($opt_engage_verified) {
+ $cmd = "condor_status -any -long -constraint" .
+ " 'SiteVerified==TRUE'" .
+ " -pool engage-central.renci.org"
+}
+open(STATUS, "$cmd|");
+while(<STATUS>) {
+ chomp;
+ if ($_ eq "") {
+ if ($tmp{'GlueSiteName'} ne "") {
+ my %copy = %tmp;
+ $ads{$tmp{'GlueSiteName'} . "_" . $tmp{'GlueClusterUniqueID'}} = \%copy;
+ undef %tmp;
+ }
+ }
+ else {
+ my ($key, $value) = split(/ = /, $_, 2);
+ $value =~ s/^"|"$//g; # remove quotes from Condor strings
+ $tmp{$key} = $value;
+ }
+}
+close(STATUS);
+
+# lowercase vo
+my $lc_vo = lc($opt_vo);
+
+open(FH, ">$opt_out") or die("Unable to open $opt_out");
+print FH "<config>\n";
+foreach my $siteid (sort keys %ads) {
+ my $contact = $ads{$siteid}->{'GlueCEInfoContactString'};
+ my $host = $contact;
+ $host =~ s/[:\/].*//;
+ my $jm = $contact;
+ $jm =~ s/.*jobmanager-//;
+ if ($jm eq "pbs") {
+ $jm = "PBS";
+ }
+ elsif ($jm eq "lsf") {
+ $jm = "LSF";
+ }
+ elsif ($jm eq "sge") {
+ $jm = "SGE";
+ }
+ elsif ($jm eq "condor") {
+ $jm = "Condor";
+ }
+ my $workdir = $ads{$siteid}->{'GlueCEInfoDataDir'};
+ print FH "\n";
+ print FH " <!-- $siteid -->\n";
+ print FH " <pool handle=\"$siteid\" >\n";
+ print FH " <gridftp url=\"gsiftp://$host/\" />\n";
+ if ($opt_condorg) {
+ print FH " <execution provider=\"condor\" />\n";
+ print FH " <profile namespace=\"globus\" key=\"jobType\">grid</profile>\n";
+ if($opt_gt4) {
+ die("swift-osg-ress-site-catalog cannot generate Condor-G + GRAM4 sites files");
+ }
+ print FH " <profile namespace=\"globus\" key=\"gridResource\">gt2 $contact</profile>\n";
+ }
+ elsif ($opt_gt4) {
+ print FH " <execution provider=\"gt4\" jobmanager=\"$jm\" url=\"$host:9443\" />\n";
+ }
+ else {
+ print FH " <jobmanager universe=\"vanilla\" url=\"$contact\" major=\"2\" />\n";
+ }
+ print FH " <workdirectory >$workdir/$lc_vo/tmp/$host</workdirectory>\n";
+ print FH " </pool>\n";
+}
+print FH "\n</config>\n";
+close(FH);
+
+exit(0);
+
+__END__
+
+=head1 NAME
+
+swift-osg-ress-site-catalog - converts ReSS data to Swift site catalog
+
+=head1 SYNOPSIS
+
+swift-osg-ress-site-catalog [options]
+
+=head1 OPTIONS
+
+=over 8
+
+=item B<--help>
+
+Show this help message
+
+=item B<--vo=[name]>
+
+Set what VO to query ReSS for
+
+=item B<--engage-verified>
+
+Only retrieve sites verified by the Engagement VO site verification tests.
+This cannot be used together with --vo, as the query will only work for
+sites advertising support for the Engagement VO.
+
+This option means information will be retrieved from the Engagement collector
+instead of the top-level ReSS collector.
+
+=item B<--out=[filename]>
+
+Write to [filename] instead of stdout
+
+=item B<--condor-g>
+
+Generates sites files which will submit jobs using a local Condor-G
+installation rather than through direct GRAM2 submission.
+
+=back
+
+=head1 DESCRIPTION
+
+B<swift-osg-ress-site-catalog> converts ReSS data to Swift site catalog
+
+=cut
+
+
+
Property changes on: trunk/bin/grid/osgcat
___________________________________________________________________
Added: svn:executable
+ *
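A usage sketch for osgcat (normally installed as swift-osg-ress-site-catalog; condor_status must be on PATH, and the output file names are arbitrary):

$ ./osgcat --engage-verified --condor-g --out=osg.xml
$ ./osgcat --vo=engage --out=gt2_engage.xml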
Added: trunk/bin/grid/ress.rb
===================================================================
--- trunk/bin/grid/ress.rb (rev 0)
+++ trunk/bin/grid/ress.rb 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,55 @@
+require 'ostruct'
+
+def ress_query(class_ads)
+ cmd = "condor_status -pool engage-submit.renci.org"
+ class_ads[0..-2].each do |class_ad|
+ cmd << " -format \"%s|\" #{class_ad}"
+ end
+ cmd << " -format \"%s\\n\" #{class_ads[-1]}"
+ `#{cmd}`
+end
+
+def ress_parse
+ dir_suffix = "/engage/swift"
+ class_ads = [
+ "GlueSiteUniqueID", "GlueCEInfoHostName", "GlueCEInfoJobManager",
+ "GlueCEInfoGatekeeperPort", "GlueCEInfoApplicationDir", "GlueCEInfoDataDir",
+ "GlueCEInfoTotalCPUs"
+ ]
+ ress_query(class_ads).each_line do |line|
+ line.chomp!
+ set = line.split("|")
+ next if not set.size > 0
+
+ value = OpenStruct.new
+
+ value.jm = set[class_ads.index("GlueCEInfoJobManager")]
+ value.url = set[class_ads.index("GlueCEInfoHostName")]
+ value.throttle = (set[class_ads.index("GlueCEInfoTotalCPUs")].to_f - 2.0) / 100.0
+ name = set[class_ads.index("GlueSiteUniqueID")] + "__" + value.url
+ value.name = set[class_ads.index("GlueSiteUniqueID")]
+
+ value.app_dir = set[class_ads.index("GlueCEInfoApplicationDir")]
+ value.app_dir.sub!(/\/$/, "")
+ value.data_dir = set[class_ads.index("GlueCEInfoDataDir")]
+ value.data_dir.sub!(/\/$/, "")
+
+ value.app_dir = "/osg/app" if name =~ /GridUNESP_CENTRAL/
+ value.data_dir = "/osg/data" if name =~ /GridUNESP_CENTRAL/
+
+ if name =~ /BNL-ATLAS/
+ value.app_dir += "/engage-scec"
+ value.data_dir += "/engage-scec"
+ #elsif name == "LIGO_UWM_NEMO" or name == "SMU_PHY" or name == "UFlorida-HPC" or name == "RENCI-Engagement" or name == "RENCI-Blueridge"
+ #value.app_dir += "/osg/scec"
+ #value.data_dir += "/osg/scec"
+ else
+ value.app_dir += dir_suffix
+ value.data_dir += dir_suffix
+ end
+
+ yield name, value
+ end
+end
+
+
Property changes on: trunk/bin/grid/ress.rb
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/ressfields
===================================================================
--- trunk/bin/grid/ressfields (rev 0)
+++ trunk/bin/grid/ressfields 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,8 @@
+#! /bin/sh
+for f in $*; do
+ flist=$flist" -format %s| "$f
+done
+
+echo flist: $flist >>ressfields.log
+
+condor_status -pool engage-submit.renci.org $flist -format "\\n" ""
\ No newline at end of file
Property changes on: trunk/bin/grid/ressfields
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/run_workers
===================================================================
--- trunk/bin/grid/run_workers (rev 0)
+++ trunk/bin/grid/run_workers 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,16 @@
+#! /bin/bash
+
+bin=$(cd $(dirname $0); pwd)
+
+work=$(mktemp -d run_workers.run.XXX)
+if [ $? != 0 ]; then
+ echo $0: failed to create work directory
+ exit 1
+fi
+echo $0: working directory is $work
+cd $work
+
+ruby -I $bin $bin/nqueued.rb ../greensites mwildeT1
+
+# To Do:
+# manage a total running worker pool based on demand from swift + slack
Property changes on: trunk/bin/grid/run_workers
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/sites
===================================================================
--- trunk/bin/grid/sites (rev 0)
+++ trunk/bin/grid/sites 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,8 @@
+
+swift-osg-ress-site-catalog --vo=engage --condor-g | # >osg.xml
+
+for s in $(grep gridRes | sed -e 's/^.* //' -e 's/<.*//' ); do
+ ( sname=$(echo $s | sed -e 's,/.*,,')
+ echo site: $sname
+ )
+done
Property changes on: trunk/bin/grid/sites
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/start-ranger-service
===================================================================
--- trunk/bin/grid/start-ranger-service (rev 0)
+++ trunk/bin/grid/start-ranger-service 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,70 @@
+#! /bin/bash
+
+# FIXME: make these commandline keyword arguments, eg --nodes=
+
+NODES=${1:-1}
+WALLTIME=${2:-00:10:00}
+PROJECT=${3:-TG-DBS080004N}
+QUEUE=${4:-development}
+REMOTE_USER=${5:-tg455797}
+
+STARTSERVICE=true
+HOST=tg-login.ranger.tacc.teragrid.org
+BIN=$(cd $(dirname $0); pwd)
+
+echo NODES=$NODES WALLTIME=$WALLTIME PROJECT=$PROJECT REMOTE_USER=$REMOTE_USER
+LOGLEVEL=INFO # INFO, DEBUG, TRACE for increasing detail
+
+CORESPERNODE=16
+
+THROTTLE=$(echo "scale=2; ($NODES*$CORESPERNODE)/100 -.01"|bc)
+
+echo THROTTLE=$THROTTLE
+
+# This lets the user run this script to add another job full of workers to an existing coaster service.
+# It must be started in the same directory where start-swift-service created the service.wports file.
+
+if [ $STARTSERVICE = true ]; then
+ start-swift-service 1 &
+ sleep 5
+ SPORT=$(cat service.sports)
+ cat >sites.pecos.xml <<EOF
+
+ <config>
+ <pool handle="localhost">
+ <execution provider="coaster-persistent" url="http://localhost:$SPORT" jobmanager="local:local"/>
+ <profile namespace="globus" key="workerManager">passive</profile>
+ <profile namespace="globus" key="jobsPerNode">$CORESPERNODE</profile>
+ <profile key="jobThrottle" namespace="karajan">$THROTTLE</profile>
+ <profile namespace="karajan" key="initialScore">10000</profile>
+ <!-- <filesystem provider="local" url="none" /> -->
+ <profile namespace="swift" key="stagingMethod">proxy</profile>
+ <workdirectory>/tmp/wilde</workdirectory>
+ </pool>
+ </config>
+EOF
+fi
+
+WPORT=$(cat service.wports)
+SERVICE_URL=http://$(hostname -f):$WPORT
+echo swift service started - SPORT=$(cat service.sports) WPORT=$WPORT SERVICE_URL=$SERVICE_URL
+
+# FIXME: scp the right worker.pl, worker.sh and .sub files to the dest system (Ranger)
+
+rdir=swift_gridtools
+ssh $REMOTE_USER@$HOST mkdir -p $rdir
+
+if [ $? != 0 ]; then
+ echo $0: unable to create remote directory $rdir
+ exit 1
+fi
+
+echo Created remote dir
+
+scp $BIN/{worker.pl,workers.ranger.sh,workers.ranger.sub} $REMOTE_USER@$HOST:$rdir
+
+echo Copied grid tools to remote dir
+
+ssh $REMOTE_USER@$HOST qsub -A $PROJECT -N runworkers -pe 16way $(($NODES * 16)) -l h_rt=$WALLTIME -q $QUEUE -v SERVICE_URL=$SERVICE_URL,WORKER_LOGLEVEL=$LOGLEVEL $rdir/workers.ranger.sub
+
+echo Submitted remote worker launching script
Property changes on: trunk/bin/grid/start-ranger-service
___________________________________________________________________
Added: svn:executable
+ *
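For reference, the THROTTLE arithmetic above, worked for the defaults (the assumption here is that Swift treats a jobThrottle of t as allowing roughly t*100+1 concurrent jobs per site):

$ NODES=1; CORESPERNODE=16
$ echo "scale=2; ($NODES*$CORESPERNODE)/100 - .01" | bc
.15

so the generated sites.pecos.xml allows about 16 concurrent jobs, one per requested core.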
Added: trunk/bin/grid/start-ranger-service~
===================================================================
--- trunk/bin/grid/start-ranger-service~ (rev 0)
+++ trunk/bin/grid/start-ranger-service~ 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,72 @@
+#! /bin/bash
+
+# FIXME: make these commandline keyword arguments, eg --nodes=
+
+NODES=${1:-1}
+WALLTIME=${2:-00:10:00}
+PROJECT=${3:-TG-DBS080004N}
+QUEUE=${4:-development}
+REMOTE_USER=${5:-tg455797}
+
+STARTSERVICE=true
+HOST=tg-login.ranger.tacc.teragrid.org
+BIN=$(cd $(dirname $0); pwd)
+
+echo NODES=$NODES WALLTIME=$WALLTIME PROJECT=$PROJECT REMOTE_USER=$REMOTE_USER
+LOGLEVEL=INFO # INFO, DEBUG, TRACE for increasing detail
+
+CORESPERNODE=16
+
+THROTTLE=$(echo "scale=2; ($NODES*$CORESPERNODE)/100 -.01"|bc)
+
+echo THROTTLE=$THROTTLE
+
+exit
+
+# This lets user run this script to add another job full of workers to an existing coaster service
+# Must be started in the same directory where start-swift-service created the service.wports file.
+
+if [ $STARTSERVICE = true ]; then
+ start-swift-service 1 &
+ sleep 5
+ SPORT=$(cat service.sports)
+ cat >sites.pecos.xml <<EOF
+
+ <config>
+ <pool handle="localhost">
+ <execution provider="coaster-persistent" url="http://localhost:$SPORT" jobmanager="local:local"/>
+ <profile namespace="globus" key="workerManager">passive</profile>
+ <profile namespace="globus" key="jobsPerNode">$CORESPERNODE</profile>
+ <profile key="jobThrottle" namespace="karajan">$THROTTLE</profile>
+ <profile namespace="karajan" key="initialScore">10000</profile>
+ <!-- <filesystem provider="local" url="none" /> -->
+ <profile namespace="swift" key="stagingMethod">proxy</profile>
+ <workdirectory>/tmp/wilde</workdirectory>
+ </pool>
+ </config>
+EOF
+fi
+
+WPORT=$(cat service.wports)
+SERVICE_URL=http://$(hostname -f):$WPORT
+echo swift service started - SPORT=$(cat service.sports) WPORT=$WPORT SERVICE_URL=$SERVICE_URL
+
+# FIXME: scp the right worker.pl, worker.sh and .sub files to the dest system (Ranger)
+
+rdir=swift_gridtools
+ssh $REMOTE_USER@$HOST mkdir -p $rdir
+
+if [ $? != 0 ]; then
+ echo $0: unable to create remote directory $rdir
+ exit 1
+fi
+
+echo Created remote dir
+
+scp $BIN/{worker.pl,workers.ranger.sh,workers.ranger.sub} $REMOTE_USER@$HOST:$rdir
+
+echo Copied grid tools to remote dir
+
+ssh $REMOTE_USER@$HOST qsub -A $PROJECT -N runworkers -pe 16way $(($NODES * 16)) -l h_rt=$WALLTIME -q $QUEUE -v SERVICE_URL=$SERVICE_URL,WORKER_LOGLEVEL=$LOGLEVEL $rdir/workers.ranger.sub
+
+echo Submitted remote worker launching script
Property changes on: trunk/bin/grid/start-ranger-service~
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/start-swift-service
===================================================================
--- trunk/bin/grid/start-swift-service (rev 0)
+++ trunk/bin/grid/start-swift-service 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+NSERVICES=$1
+SERVICE=coaster-service # found via PATH
+
+ontrap() # FIXME: Not needed?
+{
+ echo '====>' in ontrap
+ trap - 1 2 3 15
+ echo start_service: trapping exit or signal
+ kill $(cat service-*.pid)
+}
+
+# trap ontrap 1 2 3 15 # FIXME: Not needed?
+
+rm -f service.sports service.wports
+for i in `seq -w 0 $((NSERVICES - 1))`; do
+ rm -f service-$i.{sport,wport,pid,log}
+ $SERVICE -nosec -passive -portfile service-$i.sport -localportfile service-$i.wport &> service-$i.log &
+ echo $! >service-$i.pid
+ sleep 3
+ if [ -s service-$i.sport ]; then
+ echo $(cat service-$i.sport) >> service.sports
+ else
+ echo service-$i.sport does not exist or is empty. exiting.
+ exit 1
+ fi
+ if [ -s service-$i.wport ]; then
+ echo $(cat service-$i.wport) >> service.wports
+ else
+ echo service-$i.wport does not exist or is empty. exiting.
+ exit 1
+ fi
+done
+
+wait
Property changes on: trunk/bin/grid/start-swift-service
___________________________________________________________________
Added: svn:executable
+ *
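A usage sketch: start two services and inspect the port files the loop above writes, one line per service (the sleep gives the services time to write their port files):

$ ./start-swift-service 2 &
$ sleep 10
$ cat service.sports    # client-side service ports, used in the sites file url
$ cat service.wports    # worker ports, passed to worker.pl as part of the service URL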
Added: trunk/bin/grid/swift-workers
===================================================================
--- trunk/bin/grid/swift-workers (rev 0)
+++ trunk/bin/grid/swift-workers 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,222 @@
+#!/usr/bin/env ruby
+
+require 'mk_catalog'
+require 'etc'
+
+class Site
+ attr_accessor :grid_resource, :data_dir, :app_dir, :name, :port
+ attr_reader :submit_file
+
+# executable = <%= @app_dir %>/worker.pl # FIXME (below)
+
+# transfer_executable = True
+
+# executable = /home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/worker.pl
+# arguments = http://128.135.125.17:<%= port %> <%= name %> /tmp 14400
+
+ def gen_submit(count = 1)
+ job = %q[
+ universe = grid
+ stream_output = False
+ stream_error = False
+ transfer_executable = false
+ periodic_remove = JobStatus == 5
+ notification = Never
+
+ globus_rsl = (maxwalltime=240)
+ grid_resource = <%= @grid_resource %>
+ executable = /bin/sleep
+ arguments = 300
+ log = condor.log
+
+ <% count.times { %>queue
+ <% } %>
+ ]
+
+ ERB.new(job.gsub(/^\s+/, ""), 0, "%<>", "@submit_file").result(binding)
+ end
+
+ def submit_job(count)
+# puts "Submitting #{@name} #{count} jobs"
+ output = ""
+#return output
+ submitfile = gen_submit(count)
+ IO.popen("condor_submit", "w+") do |submit|
+ submit.puts submitfile
+ submit.close_write
+ output = submit.read
+ end
+ output
+ end
+
+ def queued
+ jobs = `condor_q #{$username} -const 'GridResource == \"#{@grid_resource}\" && JobStatus == 1' -format \"%s \" GlobalJobId`
+ jobs.split(" ").size
+ end
+
+ def running
+ jobs = `condor_q #{$username} -const 'GridResource == \"#{@grid_resource}\" && JobStatus == 2' -format \"%s \" GlobalJobId`
+ jobs.split(" ").size
+ end
+
+end
+
+=begin
+# For reference:
+JobStatus in job ClassAds
+
+0 Unexpanded U
+1 Idle I
+2 Running R
+3 Removed X
+4 Completed C
+5 Held H
+6 Submission_err E
+=end
+
+if __FILE__ == $0
+ raise "No whitelist file" if !ARGV[0]
+
+ start_port = 61100 # FIXME
+ ctr = 0
+ threads = []
+ ARGV[1] = "scec" if !ARGV[1]
+ whitelist = IO.readlines(ARGV[0]).map { |line| line.chomp! }
+ $username = Etc.getlogin
+
+ puts "Username = #{$username}"
+
+ minSiteJobs = 2
+ paddedDemand = 0
+ swiftDemand = 0
+ totalCores = 0
+ totalRunning = 0
+
+ ress_parse(ARGV[1]) do |name, value|
+ next if not whitelist.index(name) and not whitelist.empty?
+ totalCores += (value.throttle * 100 + 2).to_i
+ end
+ puts "totalCores for green sites = #{totalCores}"
+
+ demandThread = Thread.new("monitor-demand") do |t|
+ puts "starting demand thread"
+ while true do
+ puts "in demand thread"
+ # swiftDemand = IO.read("swiftDemand") # Replace this with sensor of Swift demand
+ swiftDemand = 15
+ paddedDemand = (swiftDemand * 1.2).to_i
+ totalRunning = `condor_q #{$username} -const 'JobStatus == 2' -format \"%s \" GlobalJobId`.split(" ").size
+ puts "*** demandThread: swiftDemand=#{swiftDemand} paddedDemand=#{paddedDemand} totalRunning=#{totalRunning}"
+ sleep 60
+ end
+ end
+
+ ress_parse(ARGV[1]) do |name, value|
+ next if not whitelist.index(name) and not whitelist.empty?
+ site = Site.new
+ site.name = name
+ site.grid_resource = "gt2 #{value.url}/jobmanager-#{value.jm}"
+ site.app_dir = value.app_dir
+ site.data_dir = value.data_dir
+ site.port = start_port + ctr
+
+ # local per-site attributes:
+
+ cores = (value.throttle * 100 + 2).to_i
+ siteFraction = cores.to_f / totalCores.to_f
+ siteTargetRunning = [ (swiftDemand.to_f * siteFraction), minSiteJobs ].max
+ siteTargetQueued = [ (swiftDemand.to_f * siteFraction), minSiteJobs ].max
+
+ printf "site: %5d cores %2d%% %s\n", cores, siteFraction * 100, name
+ targetQueued = 3
+
+ site.gen_submit
+
+ threads << Thread.new(name) do |job|
+ trip=0
+ while true do
+ if ( (swiftDemand) > totalRunning ) then
+ # demands > running: enforce N-queued algorithm
+ queued = site.queued
+ running = site.running
+ printf "trip %d site %s running %d queued %d\n", trip, name,running,queued
+ if (running+queued) == 0 then
+ newJobs = [ (paddedDemand * siteFraction).to_i, minSiteJobs ].max
+ printf "trip %d site %s empty - submitting %d (%d%% of demand %d)\n",
+ trip, name, newJobs, siteFraction * 100, paddedDemand
+ site.submit_job(newJobs)
+ elsif queued == 0 then
+ toRun = [ running * 1.2, [(paddedDemand * siteFraction).to_i, minSiteJobs ].max ].max
+ printf "trip %d site %s queued %d target %d has drained queue - submitting %d\n",
+ trip, name, queued, targetQueued, toRun
+ site.submit_job(toRun)
+ elsif queued < targetQueued
+ printf "trip %d site %s queued %d below target %d - submitting %d\n",
+ trip, name, queued, targetQueued, targetQueued-queued
+ site.submit_job(targetQueued - queued)
+ end
+ trip += 1
+ sleep 60
+ # puts "#{name}: #{total}"
+ end
+ end
+ end
+
+ ctr += 1
+ end
+end
+threads.each { |job| job.join }
+puts "All threads completed."
+
+# TODO:
+#
+# tag jobs for each run uniquely, and track them as a unique factory instance
+#
+
+=begin
+
+"Keep N Queued" Algorithm
+
+Goal:
+- monitor a running swift script to track its changing demand for cores
+- increase the # of running workers to meet the demand
+- let workers that are idle time out when supply is greater than demand
+
+Initially:
+- set a constant demand
+- determine #cores at each site
+
+initialPressure = 1.2 # increase demand
+initialDemand = 50 # initial demand prior to first poll of Swift, to prime the worker pool ahead of Swift demand
+
+- set a constant target queued for each site based on ncores
+- set a target #running
+
+THREAD 0:
+ demand = initialDemand
+ for each site
+ site.need = (site.cores/totalcores) * demand
+ sleep delay
+
+
+ while swiftScriptIsRunning
+ get demand
+ get #running
+
+
+
+THREAD for each site
+ while swiftScriptIsRunning
+ get site.running
+ get set.queued
+ need = demand - running
+ if need > 0
+ if running+queued = 0
+
+keep queued on each site:
+ max( expectation, 50% of observation )
+
+ totalc=1000
+ sitec = 200 20% d=100 ex=20 q=20
+ r=50 q=25
+=end
\ No newline at end of file
Property changes on: trunk/bin/grid/swift-workers
___________________________________________________________________
Added: svn:executable
+ *
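For reference, the per-site polling in Site#queued and Site#running boils down to condor_q calls of this shape (a sketch; the grid resource string is hypothetical):

$ condor_q $USER -const 'GridResource == "gt2 osg-ce.example.org/jobmanager-pbs" && JobStatus == 1' \
    -format "%s " GlobalJobId | wc -w     # idle (queued) pilots at one site
$ condor_q $USER -const 'JobStatus == 2' -format "%s " GlobalJobId | wc -w     # total running pilots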
Added: trunk/bin/grid/worker.sh
===================================================================
--- trunk/bin/grid/worker.sh (rev 0)
+++ trunk/bin/grid/worker.sh 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,14 @@
+#! /bin/bash
+
+SERVICEURL=$1
+NWORKERS=$2
+LOGLEVEL=$3
+
+# WORKER_LOGGING_LEVEL=$LOGLEVEL ./worker.pl http://128.135.125.17:$PORT swork01 ./workerlogs
+
+LOGDIR=/tmp/$USER/workerlogs
+
+mkdir -p $LOGDIR
+for worker in $(seq -w 0 $(($NWORKERS-1))); do
+ echo WORKER_LOGGING_LEVEL=DEBUG $HOME/worker.pl $SERVICEURL swork${worker} $LOGDIR # >& /dev/null &
+done
Property changes on: trunk/bin/grid/worker.sh
___________________________________________________________________
Added: svn:executable
+ *
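A usage sketch of worker.sh (note that, as committed, the loop only echoes the worker.pl launch commands rather than executing them):

$ ./worker.sh http://128.135.125.17:61100 8 INFO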
Added: trunk/bin/grid/workers.pbs.sub
===================================================================
--- trunk/bin/grid/workers.pbs.sub (rev 0)
+++ trunk/bin/grid/workers.pbs.sub 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,36 @@
+#!/bin/bash
+#$ -N SGEtest
+#$ -pe 16way 256
+#$ -l h_rt=00:05:00
+#$ -o $HOME/swift/lab/t.stdout
+#$ -e $HOME/swift/lab/t.stderr
+#$ -v WORKER_LOGGING_LEVEL=NONE
+#$ -q development
+#$ -A TG-DBS080004N
+#$ -V
+#$ -S /bin/bash
+
+echo PE_HOSTFILE:
+echo
+cat $PE_HOSTFILE
+echo
+
+#cd / && NODES=`cat $PE_HOSTFILE | awk '{ for(i=0;i<$2;i++){print $1} }'`
+cd / && NODES=`cat $PE_HOSTFILE | awk '{print $1}'`
+ECF=/home/mwilde/.globus/scripts/t.exitcode
+INDEX=0
+for NODE in $NODES; do
+echo launch on node $NODE
+# echo "N" >$ECF.$INDEX
+# ssh $NODE /bin/bash -c \" "sleep 300; echo \\\$? > $ECF.$INDEX " \" &
+# qrsh -nostdin -l hostname=$NODE hostname -f 2>&1 &
+ ssh $NODE hostname -f &
+ rc=$?
+ if [ $rc != 0 ]; then
+ echo ssh failed for $NODE
+ fi
+ # sleep .33
+ INDEX=$((INDEX + 1))
+done
+
+wait
Added: trunk/bin/grid/workers.ranger.sh
===================================================================
--- trunk/bin/grid/workers.ranger.sh (rev 0)
+++ trunk/bin/grid/workers.ranger.sh 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,18 @@
+#! /bin/bash
+
+SERVICEURL=$1
+NWORKERS=$2
+LOGLEVEL=$3
+
+# WORKER_LOGGING_LEVEL=$LOGLEVEL ./worker.pl http://128.135.125.17:$PORT swork01 ./workerlogs
+
+LOGDIR=/tmp/$USER/workerlogs
+mkdir -p $LOGDIR
+cd $LOGDIR
+
+for worker in $(seq -w 0 $(($NWORKERS-1))); do
+ WORKER_LOGGING_LEVEL=$LOGLEVEL $HOME/swift_gridtools/worker.pl $SERVICEURL swork${worker} $LOGDIR >& /dev/null &
+done
+wait
+ls -lt $LOGDIR/
+tail $LOGDIR/*
Property changes on: trunk/bin/grid/workers.ranger.sh
___________________________________________________________________
Added: svn:executable
+ *
Added: trunk/bin/grid/workers.ranger.sub
===================================================================
--- trunk/bin/grid/workers.ranger.sub (rev 0)
+++ trunk/bin/grid/workers.ranger.sub 2011-07-06 22:41:37 UTC (rev 4785)
@@ -0,0 +1,38 @@
+#!/bin/bash
+
+#$ -N runworkers
+#$ -pe 16way 16
+#$ -l h_rt=00:10:00
+#$ -o $HOME/
+#$ -e $HOME/
+#$ -q development
+#$ -V
+#$ -S /bin/bash
+
+# Must provide on commandline:
+
+#-- #$ -A TG-DBS080004N
+
+#export SERVICE_URL=http://missingServiceURL
+#export WORKER_LOGLEVEL=TRACE
+
+echo PE_HOSTFILE:
+echo
+cat $PE_HOSTFILE
+echo
+
+rdir=$HOME/swift_gridtools
+NODES=`cat $PE_HOSTFILE | awk '{print $1}'`
+
+INDEX=0
+for NODE in $NODES; do
+ ssh $NODE /bin/bash -c \" $rdir/workers.ranger.sh $SERVICE_URL 16 $WORKER_LOGLEVEL \" &
+ INDEX=$((INDEX + 1))
+done
+wait
+
+
+
+
+
+