[Swift-devel] Shutting down BG/P jobs after swift script completes
Michael Wilde
wilde at mcs.anl.gov
Tue Nov 17 23:08:00 CST 2009
Mihael, it seems that jobs linger, and new ones start, after a Swift
script completes (on Surveyor with coasters).
Info below.
- Mike
I saw this in qstat:
sur$ qstat
JobID User WallTime Nodes State Location
==================================================
137824 wilde 00:29:00 64 queued None
sur$ qstat
JobID User WallTime Nodes State Location
============================================================
137824 wilde 00:29:00 64 running ANL-R00-M0-N12-64
After the script completed, I saw this:
sur$ qstat
JobID User WallTime Nodes State Location
=============================================================
137824 wilde 00:29:00 64 running ANL-R00-M0-N12-64
137825 wilde 00:10:00 1 starting ANL-R00-M0-N14-64
sur$
sur$ qstat
JobID User WallTime Nodes State Location
============================================================
137824 wilde 00:29:00 64 running ANL-R00-M0-N12-64
137825 wilde 00:10:00 1 running ANL-R00-M0-N14-64
---
For this script activity:
sur$ run.itfixex1.sh
Running from host with compute-node reachable address of 172.17.3.16
Running in /home/wilde/protests/run.itfix.49
protlib2 home is /home/wilde/protlib2
Swift svn swift-r3190 cog-r2605
RunID: 20091117-2257-3hsazpy8
Progress:
Progress: Checking status:1
Progress: Submitting:3 Submitted:1 Finished successfully:1
Progress: Submitted:4 Finished successfully:1
Progress: Submitted:4 Finished successfully:1
Progress: Submitted:4 Finished successfully:1
Progress: Submitted:4 Finished successfully:1
Progress: Submitted:3 Active:1 Finished successfully:1
Progress: Active:4 Finished successfully:1
Progress: Active:4 Finished successfully:1
Progress: Active:3 Checking status:1 Finished successfully:1
Progress: Checking status:1 Finished successfully:5
Progress: Active:4 Finished successfully:6
Progress: Active:3 Checking status:1 Finished successfully:6
Progress: Submitting:1 Finished successfully:10
Progress: Active:1 Finished successfully:10
Progress: Checking status:1 Finished successfully:10
Final status: Finished successfully:11
Cleaning up...
Shutting down service at https://172.17.3.16:50002
Got channel MetaChannel: 177867418 -> null
+ Done
sur$
---
With these settings:
cat >tc <<EOF # ensure that whitespace here is TABS!!!
null PSim $p2home/bin/PSim.sh null null null
surveyor PSim $p2home/bin/PSim.sh null null null
localhost ItFixInit $p2home/bin/ItFixInit.sh null null null
localhost RevisePData $p2home/bin/RevisePData.sh null null null
EOF
cat >sites.xml <<EOF
<config>
<pool handle="localhost">
<filesystem provider="local"/>
<execution provider="local"/>
<workdirectory>$rundir</workdirectory>
<profile namespace="karajan" key="jobThrottle">0.01</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
</pool>
<pool handle="surveyor">
<filesystem provider="local"/>
<execution provider="coaster" jobmanager="local:cobalt"/>
<profile namespace="globus" key="slots">1</profile>
<profile namespace="globus" key="nodeGranularity">64</profile>
<profile namespace="globus" key="workersPerNode">4</profile>
<profile namespace="globus" key="maxNodes">64</profile>
<profile namespace="globus" key="project">JGI-Pilot</profile>
<profile namespace="globus" key="kernelprofile">zeptoos</profile>
<profile namespace="globus" key="maxtime">1200</profile>
<profile namespace="globus" key="alcfbgpnat">true</profile>
<profile namespace="karajan" key="jobThrottle">2.55</profile>
<profile namespace="karajan" key="initialScore">100000</profile>
<workdirectory>$rundir</workdirectory>
</pool>
</config>
EOF
# Put this back in for performance
# <scratch>/scratch</scratch>
# Copy in swift script and mappers
cp $p2home/swift/{psim.itfixex1.swift,swift.properties,Protein.map,ItFixProtein.map,ItFixProtSim.map,plist2} .
swiftdir=$(dirname $(dirname $(which swift)))
cp $swiftdir/etc/swift.properties .
cat >>$HOME/.swift/swift.properties <<EOF
# Over-ridden properties:
execution.retries=0
sitedir.keep=true
status.mode=provider
wrapperlog.always.transfer=true
EOF
# execute
swift -config swift.properties -tc.file tc -sites.file sites.xml \
    psim.itfixex1.swift
exit
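
To make the lingering jobs easier to spot, a check like the one below could run right after swift returns. This is only a sketch: it assumes qstat prints the column layout shown above (JobID first, User second), and the function name leftover_jobs is made up for illustration. It only lists job IDs; whether to cancel them is left to the caller.

```shell
# leftover_jobs USER
# Reads qstat output on stdin and prints the JobID of every job owned by USER.
# Assumes the layout shown in this message: JobID in column 1, User in column 2.
# The header and the "=====" separator are skipped because their first field
# is not numeric.
leftover_jobs() {
    awk -v user="$1" '$1 ~ /^[0-9]+$/ && $2 == user { print $1 }'
}

# Only query the scheduler if qstat is actually available on this host.
if command -v qstat >/dev/null 2>&1; then
    qstat | leftover_jobs "${USER:-$(id -un)}"
fi
```

Run against the last qstat listing above with user wilde, this would print 137824 and 137825, one per line.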