[Swift-devel] Standard Swift coaster behavior doesn't work well for sporadic jobs

wilde at mcs.anl.gov
Mon Oct 11 15:12:14 CDT 2010


Mihael, Justin: does the following sound like a likely coaster issue?

When I use the standard Swift coaster code (not passive or persistent mode), I see this: one job runs at the start, and then there is a delay before the next job that is long enough for the coaster worker to time out. At that point the coaster scheduler no longer thinks there is a valid block into which the next job can fit, and Swift just hangs. The job stays in the submitted state but is never assigned to a block.
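To make the first symptom concrete, here is a toy Python model of it. It is not the coaster scheduler and does not reproduce its real logic; `Block`, `ToyScheduler`, `idle_timeout`, and `replace_dead_blocks` are hypothetical names. It assumes, as a guess at the cause rather than a diagnosis, that the scheduler only looks at existing blocks and never requests a replacement once the first block's worker has timed out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical worker block that shuts down after idle_timeout seconds without work."""
    idle_timeout: float
    last_used: float

    def alive(self, now: float) -> bool:
        return now - self.last_used < self.idle_timeout


@dataclass
class ToyScheduler:
    """Toy model of the reported symptom, not the real coaster scheduler.

    With replace_dead_blocks=False, a block is requested only when none has
    ever existed, so a job arriving after the first block timed out is never
    assigned anywhere.
    """
    idle_timeout: float
    replace_dead_blocks: bool
    blocks: List[Block] = field(default_factory=list)

    def submit(self, now: float) -> str:
        live = [b for b in self.blocks if b.alive(now)]
        if not live and (self.replace_dead_blocks or not self.blocks):
            # Request a fresh block for this job.
            block = Block(idle_timeout=self.idle_timeout, last_used=now)
            self.blocks.append(block)
            live = [block]
        if not live:
            # No live block and none requested: the job stays queued forever.
            return "submitted"
        live[0].last_used = now
        return "active"


buggy = ToyScheduler(idle_timeout=60.0, replace_dead_blocks=False)
print(buggy.submit(now=0.0))    # active: first job gets a new block
print(buggy.submit(now=600.0))  # submitted: block timed out, no replacement

fixed = ToyScheduler(idle_timeout=60.0, replace_dead_blocks=True)
print(fixed.submit(now=0.0))    # active
print(fixed.submit(now=600.0))  # active: a replacement block is requested
```

In this model the only difference between hanging and not hanging is whether a dead block triggers a new block request; whether the real coaster code has an equivalent gap is exactly the open question.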

The behavior seems similar to what you see when you try to run a job that doesn't fit into any of the blocks defined by the coaster parameters in sites.xml: in that case, too, Swift just hangs.
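A minimal sketch of the fit check involved in the second symptom, assuming a job fits a block only when its declared walltime is no longer than the block's remaining time. `place_job` and `max_block_time` are hypothetical; `max_block_time` stands in for whatever block size the coaster settings in sites.xml permit. The sketch only illustrates that a job no block can ever hold could be rejected with an error up front instead of being left queued.

```python
from typing import List, Union


def place_job(job_walltime: float, block_remaining: List[float],
              max_block_time: float) -> Union[int, str]:
    """Hypothetical placement rule: use an existing block, request a new one, or fail.

    job_walltime:    walltime the job declares, in seconds
    block_remaining: remaining walltime of each existing block, in seconds
    max_block_time:  largest block walltime the site configuration allows
    """
    for index, remaining in enumerate(block_remaining):
        if job_walltime <= remaining:
            return index  # fits in an existing block
    if job_walltime <= max_block_time:
        return "new"  # fits in a freshly requested block
    # No existing or future block can hold this job: report it now
    # instead of leaving the job queued indefinitely.
    raise ValueError(
        f"job walltime {job_walltime}s exceeds max block time {max_block_time}s"
    )


print(place_job(300.0, [100.0, 900.0], 3600.0))   # 1
print(place_job(1200.0, [100.0, 900.0], 3600.0))  # new
```

Calling `place_job(7200.0, [], 3600.0)` raises `ValueError`, which is the fail-fast behavior that a silent hang lacks.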

Both of these situations (whether or not they are indeed due to the same algorithmic issue) seem to be problems that we need to address. For the first case, which is my immediate and more important problem, a log is in ~wilde/swift-rserver-hangs, along with the Swift script and the tc, sites, and properties (cf) files.

Thanks,

Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



