[Swift-user] Swift loops with no explanation when no pending jobs will fit into any possible coaster block
Michael Wilde
wilde at mcs.anl.gov
Thu Mar 31 17:43:24 CDT 2011
I want to point this out to users: if you run a script using coasters as your job execution provider, and you see Swift just saying something like this, even though you know your coasters are running:
RunID: 20110331-1702-3kfa6xa3
Progress:
Progress: Initializing site shared directory:1
Progress: Stage in:1
Progress: Submitted:1
Progress: Submitted:1
then the problem is that your app's maxwalltime (likely from tc.data or a default) is larger than the maxtime (after adjustments) of your coaster blocks.
- Mike
----- Forwarded Message -----
From: bugzilla-daemon at mcs.anl.gov
To: swift-devel at ci.uchicago.edu
Sent: Thursday, March 31, 2011 5:34:54 PM
Subject: [Swift-devel] [Bug 287] New: Swift loops with no explanation when no pending jobs will fit into any possible coaster block
https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=287
Summary: Swift loops with no explanation when no pending jobs
will fit into any possible coaster block
Product: Swift
Version: 0.93
Platform: All
OS/Version: All
Status: NEW
Severity: major
Priority: P1
Component: SwiftScript language
AssignedTo: hategan at mcs.anl.gov
ReportedBy: wilde at mcs.anl.gov
CC: hategan at mcs.anl.gov
Example:
tc entry is:
localhost cat /bin/cat null null GLOBUS::maxwalltime="00:05:00"
sites pool is:
<pool handle="localhost">
<execution provider="coaster" url="" jobmanager="local:local"/>
<profile namespace="globus" key="workersPerNode">1</profile>
<profile namespace="globus" key="slots">1</profile>
<profile namespace="globus" key="nodeGranularity">1</profile>
<profile namespace="globus" key="maxNodes">1</profile>
<profile namespace="globus" key="maxtime">120</profile>
<profile namespace="globus" key="lowoverallocation">100</profile>
<profile namespace="globus" key="highoverallocation">100</profile>
<profile namespace="karajan" key="jobThrottle">0.00</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<filesystem provider="local" url="none"/>
<workdirectory>/home/wilde/swiftwork</workdirectory>
</pool>
cat app declares need for 5 mins walltime
only possible coaster slot is 2 mins walltime
so Swift just loops with a job in the queue that never gets run:
RunID: 20110331-1702-3kfa6xa3
Progress:
Progress: Initializing site shared directory:1
Progress: Stage in:1
Progress: Submitted:1
Progress: Submitted:1
The user never gets an error like "No coaster slots exist with sufficient time
remaining to run your job."
I think the coaster block times out for inactivity, another one starts, and
nothing gets run, and the user is left in the dark as to why.
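
The mismatch in the example can be checked by hand before running a script. The following is only a minimal sketch in Python, not part of Swift: the function names are hypothetical, it assumes maxtime is expressed in seconds (consistent with the "2 mins" slot described above), and it ignores the adjustments coasters apply to maxtime, so passing this check is necessary but not sufficient.

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert an "HH:MM:SS" string, as used for maxwalltime in the tc entry, to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_fits_block(app_maxwalltime: str, block_maxtime_seconds: int) -> bool:
    """Return True if the requested walltime is no longer than the longest possible block.

    Hypothetical helper: it does not model the adjustments coasters make to
    maxtime, so a True result does not guarantee the job will be scheduled.
    """
    return walltime_to_seconds(app_maxwalltime) <= block_maxtime_seconds


app_maxwalltime = "00:05:00"  # GLOBUS::maxwalltime from the tc entry above
block_maxtime = 120           # maxtime profile from the sites pool above (assumed seconds)

if not job_fits_block(app_maxwalltime, block_maxtime):
    print(
        f"app needs {walltime_to_seconds(app_maxwalltime)}s "
        f"but the longest coaster block is {block_maxtime}s"
    )
```

With the values from this report the check fails (300 seconds requested against a 120-second block), which is the condition under which Swift keeps reporting "Submitted:1".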
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory