[Swift-user] Swift loops with no explanation when no pending jobs will fit into any possible coaster block

Michael Wilde wilde at mcs.anl.gov
Thu Mar 31 17:43:24 CDT 2011


I want to point this out to users: if you run a script using coasters as your job execution provider, and Swift just keeps printing progress lines like these, even though you know your coasters are running:

RunID: 20110331-1702-3kfa6xa3
Progress:
Progress:  Initializing site shared directory:1
Progress:  Stage in:1
Progress:  Submitted:1
Progress:  Submitted:1

then the problem is that your app maxwalltime (likely from tc.data or a default) is larger than the maxtime (after adjustments) of your coaster blocks.
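
As a rough illustration of that comparison, here is a minimal Python sketch using the values from the forwarded bug report below (maxwalltime="00:05:00" and maxtime 120). The function names are made up for this example and are not part of Swift. It assumes maxwalltime is written as hh:mm:ss and maxtime is in seconds, and it ignores whatever adjustments Swift applies to maxtime.

```python
def walltime_to_seconds(walltime):
    """Convert an "hh:mm:ss" string, as used in the tc entry, to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_fits_block(app_maxwalltime, block_maxtime_seconds):
    """Return True if the app's requested walltime fits in one coaster block.

    Hypothetical helper: compares the raw values only and does not model
    any adjustment Swift may make to the block's maxtime.
    """
    return walltime_to_seconds(app_maxwalltime) <= block_maxtime_seconds


if __name__ == "__main__":
    app_maxwalltime = "00:05:00"   # from the tc entry in the bug report
    block_maxtime = 120            # from the sites pool in the bug report
    if not job_fits_block(app_maxwalltime, block_maxtime):
        print("app maxwalltime exceeds coaster block maxtime; job will never run")
```

If this reports a mismatch, either lower the app's maxwalltime or raise the pool's maxtime so that the job fits.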

- Mike

----- Forwarded Message -----
From: bugzilla-daemon at mcs.anl.gov
To: swift-devel at ci.uchicago.edu
Sent: Thursday, March 31, 2011 5:34:54 PM
Subject: [Swift-devel] [Bug 287] New: Swift loops with no explanation when no pending jobs will fit into any possible coaster block

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=287

           Summary: Swift loops with no explanation when no pending jobs
                    will fit into any possible coaster block
           Product: Swift
           Version: 0.93
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P1
         Component: SwiftScript language
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: wilde at mcs.anl.gov
                CC: hategan at mcs.anl.gov


Example:

tc entry is:

localhost cat /bin/cat null null  GLOBUS::maxwalltime="00:05:00"

sites pool is:

  <pool handle="localhost">
    <execution provider="coaster" url="" jobmanager="local:local"/>

    <profile namespace="globus" key="workersPerNode">1</profile>
    <profile namespace="globus" key="slots">1</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="globus" key="maxNodes">1</profile>

    <profile namespace="globus" key="maxtime">120</profile>
    <profile namespace="globus" key="lowoverallocation">100</profile>
    <profile namespace="globus" key="highoverallocation">100</profile>

    <profile namespace="karajan" key="jobThrottle">0.00</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>

    <filesystem provider="local" url="none"/>
    <workdirectory>/home/wilde/swiftwork</workdirectory>
  </pool>

The cat app declares that it needs 5 minutes of walltime (maxwalltime="00:05:00").

The only possible coaster slot has 2 minutes of walltime (maxtime is 120).

So Swift just loops with a job in the queue that never gets run:


RunID: 20110331-1702-3kfa6xa3
Progress:
Progress:  Initializing site shared directory:1
Progress:  Stage in:1
Progress:  Submitted:1
Progress:  Submitted:1

The user never gets an error like "No coaster slots exist with sufficient time
remaining to run your job."

I think the coaster block times out for inactivity, another one starts, and
nothing gets run, and the user is left in the dark as to why.
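
Until Swift reports this itself, a user could run a pre-flight check along these lines. This is only a sketch, not Swift code: it reads a trimmed copy of the pool fragment and the tc entry from the example above, the helper names are hypothetical, and it assumes maxtime is in seconds and maxwalltime is in hh:mm:ss form. A real sites file wraps pools in an outer element and may hold several pools, which this does not handle.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the pool fragment from the example above.
SITES_XML = """
<pool handle="localhost">
  <execution provider="coaster" url="" jobmanager="local:local"/>
  <profile namespace="globus" key="maxtime">120</profile>
</pool>
"""

# The tc entry from the example above.
TC_LINE = 'localhost cat /bin/cat null null  GLOBUS::maxwalltime="00:05:00"'


def block_maxtime_seconds(sites_xml):
    """Return the globus maxtime profile value of a single <pool>, or None."""
    pool = ET.fromstring(sites_xml)
    for profile in pool.findall("profile"):
        if profile.get("namespace") == "globus" and profile.get("key") == "maxtime":
            return int(profile.text)
    return None


def app_maxwalltime_seconds(tc_line):
    """Return the maxwalltime of a tc entry in seconds, or None if absent."""
    match = re.search(r'maxwalltime="(\d+):(\d+):(\d+)"', tc_line)
    if match is None:
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def check(sites_xml, tc_line):
    """Return a warning string if the app cannot fit in a block, else None."""
    maxtime = block_maxtime_seconds(sites_xml)
    walltime = app_maxwalltime_seconds(tc_line)
    if maxtime is not None and walltime is not None and walltime > maxtime:
        return ("No coaster slots exist with sufficient time remaining to run "
                "your job: maxwalltime %ds > maxtime %ds" % (walltime, maxtime))
    return None


if __name__ == "__main__":
    warning = check(SITES_XML, TC_LINE)
    if warning:
        print(warning)
```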

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



