[Swift-user] [Swift] Cancelling Job Issues
wilde at mcs.anl.gov
Fri Jun 10 08:18:15 CDT 2011
Wei, is this happening with the coaster provider? If so, it's possible that coasters is cancelling workers whose remaining walltime is not large enough to execute any ready job (each of which has either an explicit maxwalltime or a default of, I think, 10 minutes). I don't *think* it's cancelling jobs; I suspect it's cancelling the *worker* job (i.e., the "pilot" job).
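To make the suspected behaviour concrete, here is a minimal Python sketch of the decision described above. It is a hypothetical model, not coasters' actual code. The function names `effective_walltime` and `should_cancel_worker` are invented for illustration. It assumes a 10-minute default maxwalltime, taken only from the recollection above. It also assumes a worker is cancelled only when at least one job is ready and none of those jobs fits in the worker's remaining walltime.

```python
from typing import Optional, Sequence

# Assumption: default walltime for a job with no explicit maxwalltime.
# The message above only recalls this as "I think 10 minutes".
DEFAULT_MAXWALLTIME_S = 10 * 60


def effective_walltime(maxwalltime_s: Optional[int]) -> int:
    """Walltime a ready job is assumed to need, falling back to the default."""
    return DEFAULT_MAXWALLTIME_S if maxwalltime_s is None else maxwalltime_s


def should_cancel_worker(remaining_s: int,
                         ready_jobs: Sequence[Optional[int]]) -> bool:
    """Hypothetical heuristic: cancel the worker (pilot) job when jobs are
    ready but none of them fits in the worker's remaining walltime."""
    needed = [effective_walltime(w) for w in ready_jobs]
    if not needed:
        # Nothing is waiting, so this sketch makes no cancellation decision.
        return False
    return all(n > remaining_s for n in needed)


if __name__ == "__main__":
    # 5 minutes left: one job uses the 10-minute default, one asks for 15.
    print(should_cancel_worker(300, [None, 900]))   # True
    # 20 minutes left: the default-walltime job still fits.
    print(should_cancel_worker(1200, [None, 900]))  # False
```

Under this model, the cancelled item is the worker, not an application job. Any job still running on that worker would then appear to be "cancelled" from the user's point of view.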
Please send us a pointer to your script, configuration files, and log files. Also please be sure to save stdout and stderr to a file for each run.
----- Original Message -----
In several recent runs using SWIFT, I've noticed a frequent (though intermittent) problem: the SWIFT scheduler cancels the last active job.
I'm using PADS and SWIFT-0.92.1; the previous stable release also has this problem.
I submit hundreds to thousands of jobs at a time. The jobs finish fine until the very last step, the point at which no jobs remain in the "Selecting Site", "Submitted", or "Stage In" states. At that point, SWIFT's status line on the screen looks like:
"Active: 24 Stage Out: xxx Finished Successfully: xxx Failed But Can Retry: xxx"
At this step, the last job is cancelled even though there are still Active jobs remaining.
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory