[Swift-devel] Shutting down BG/P jobs after swift script completes

Michael Wilde wilde at mcs.anl.gov
Tue Nov 17 23:08:00 CST 2009


Mihael, it seems that existing jobs linger, and new jobs start, after a Swift
script completes (on Surveyor with coasters).

Info below.

- Mike


I saw this in qstat:

sur$ qstat
JobID   User   WallTime  Nodes  State   Location
==================================================
137824  wilde  00:29:00  64     queued  None
sur$ qstat
JobID   User   WallTime  Nodes  State    Location
============================================================
137824  wilde  00:29:00  64     running  ANL-R00-M0-N12-64

After the script completed, I saw this:


sur$ qstat
JobID   User   WallTime  Nodes  State     Location
=============================================================
137824  wilde  00:29:00  64     running   ANL-R00-M0-N12-64
137825  wilde  00:10:00  1      starting  ANL-R00-M0-N14-64
sur$

sur$ qstat
JobID   User   WallTime  Nodes  State    Location
============================================================
137824  wilde  00:29:00  64     running  ANL-R00-M0-N12-64
137825  wilde  00:10:00  1      running  ANL-R00-M0-N14-64
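
To check for leftover jobs like these without eyeballing qstat, something along the following lines might help. This is only a rough sketch: it assumes plain qstat output in the column order shown above (JobID, User, WallTime, Nodes, State, Location), and lingering_jobs is a made-up helper name, not part of Cobalt or Swift.

```shell
# lingering_jobs USER
# Reads qstat-style text on stdin and prints the JobID of every job owned
# by USER that is still queued, starting, or running.
# Assumption: columns are JobID User WallTime Nodes State Location,
# as in the qstat output pasted above.
lingering_jobs() {
    user="$1"
    awk -v user="$user" '
        # Skip header, separator, and prompt lines: only rows whose first
        # field is a numeric job ID are considered.
        $1 ~ /^[0-9]+$/ && $2 == user && $5 ~ /^(queued|starting|running)$/ {
            print $1
        }
    '
}

# Example use after the swift run has finished:
#   qstat | lingering_jobs "$USER"
```

Any IDs it prints after "Final status" has been reported are jobs that outlived the script; they could then be inspected or passed to qdel by hand.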


---

This was during the following script run:

sur$ run.itfixex1.sh
Running from host with compute-node reachable address of 172.17.3.16
Running in /home/wilde/protests/run.itfix.49
protlib2 home is /home/wilde/protlib2
Swift svn swift-r3190 cog-r2605

RunID: 20091117-2257-3hsazpy8
Progress:
Progress:  Checking status:1
Progress:  Submitting:3  Submitted:1  Finished successfully:1
Progress:  Submitted:4  Finished successfully:1
Progress:  Submitted:4  Finished successfully:1
Progress:  Submitted:4  Finished successfully:1
Progress:  Submitted:4  Finished successfully:1
Progress:  Submitted:3  Active:1  Finished successfully:1
Progress:  Active:4  Finished successfully:1
Progress:  Active:4  Finished successfully:1
Progress:  Active:3  Checking status:1  Finished successfully:1
Progress:  Checking status:1  Finished successfully:5
Progress:  Active:4  Finished successfully:6
Progress:  Active:3  Checking status:1  Finished successfully:6
Progress:  Submitting:1  Finished successfully:10
Progress:  Active:1  Finished successfully:10
Progress:  Checking status:1  Finished successfully:10
Final status:  Finished successfully:11
Cleaning up...
Shutting down service at https://172.17.3.16:50002
Got channel MetaChannel: 177867418 -> null
+ Done
sur$

---

With these settings:

cat >tc <<EOF # ensure that whitespace here is TABS!!!
null		PSim		$p2home/bin/PSim.sh	null	null	null
surveyor	PSim		$p2home/bin/PSim.sh	null	null	null
localhost	ItFixInit	$p2home/bin/ItFixInit.sh	null	null	null
localhost	RevisePData	$p2home/bin/RevisePData.sh	null	null	null
EOF

cat >sites.xml <<EOF

<config>
   <pool handle="localhost">
     <filesystem provider="local"/>
     <execution provider="local"/>
     <workdirectory>$rundir</workdirectory>
     <profile namespace="karajan" key="jobThrottle">0.01</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
   </pool>

   <pool handle="surveyor">
     <filesystem provider="local"/>
     <execution provider="coaster" jobmanager="local:cobalt"/>
     <profile namespace="globus" key="slots">1</profile>
     <profile namespace="globus" key="nodeGranularity">64</profile>
     <profile namespace="globus" key="workersPerNode">4</profile>
     <profile namespace="globus" key="maxNodes">64</profile>
     <profile namespace="globus" key="project">JGI-Pilot</profile>
     <profile namespace="globus" key="kernelprofile">zeptoos</profile>
     <profile namespace="globus" key="maxtime">1200</profile>
     <profile namespace="globus" key="alcfbgpnat">true</profile>
     <profile namespace="karajan" key="jobThrottle">2.55</profile>
     <profile namespace="karajan" key="initialScore">100000</profile>
     <workdirectory>$rundir</workdirectory>
   </pool>

</config>

EOF
#    Put this back in for performance
#    <scratch>/scratch</scratch>

# Copy in swift script and mappers

cp $p2home/swift/{psim.itfixex1.swift,swift.properties,Protein.map,ItFixProtein.map,ItFixProtSim.map,plist2} .

swiftdir=$(dirname $(dirname $(which swift)))
cp $swiftdir/etc/swift.properties .

cat >>$HOME/.swift/swift.properties <<EOF

# Over-ridden properties:

execution.retries=0
sitedir.keep=true
status.mode=provider
wrapperlog.always.transfer=true
EOF

# execute

swift -config swift.properties -tc.file tc -sites.file sites.xml psim.itfixex1.swift

exit




