[Swift-user] Question on coaster job time calculations

wilde at mcs.anl.gov
Wed May 5 14:07:48 CDT 2010


I tried to work around the general problem of a script that runs many sub-hour jobs followed by many multi-hour jobs, where jobs of each size need to go into specific queues.

What I did was mark each app in tc.data with an appropriate maxwalltime and create 4 pool entries in sites.xml for the same coaster site, with maxtime of 1, 2, 3, and 4 hours. Each app was associated in tc.data with one of these pools: all the short jobs (the first 305 jobs of the script) were marked as being on site pbs1 (for 1-hour jobs), and the longer jobs, given a walltime of 2h:55m, were sent to pbs4.
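To make the intent of that setup concrete, here is a minimal Python sketch of the expectation: a job should only be placed in a block from the pool it is assigned to, and only if its maxwalltime fits within that pool's maxtime. This is a model of the expected behavior only, not the coaster scheduler's actual logic. The pool names pbs2 and pbs3 and the 30-minute walltime for a short job are assumptions for illustration; only pbs1, pbs4, and the 2h:55m walltime come from the description above.

```python
from datetime import timedelta

# Pool name -> maxtime. pbs1 and pbs4 are named in the post;
# pbs2 and pbs3 are assumed names for the 2- and 3-hour pools.
POOL_MAXTIME = {
    "pbs1": timedelta(hours=1),
    "pbs2": timedelta(hours=2),
    "pbs3": timedelta(hours=3),
    "pbs4": timedelta(hours=4),
}


def fits(walltime, pool):
    """Return True if a job with this maxwalltime fits within the pool's maxtime."""
    return walltime <= POOL_MAXTIME[pool]


short_job = timedelta(minutes=30)           # assumed walltime for a sub-hour job
long_job = timedelta(hours=2, minutes=55)   # walltime given to the longer jobs

print(fits(short_job, "pbs1"))  # short job on the 1-hour pool
print(fits(long_job, "pbs4"))   # long job on the 4-hour pool
print(fits(long_job, "pbs1"))   # long job on the 1-hour pool (the suspected mismatch)
```

Under this model the long jobs fit pbs4 but can never fit a pbs1 block, which is why a scheduler that considers only pbs1 would idle.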

The first stage of the script ran fine, and jobs were sent to the fast queue in 1-hour blocks, just as desired.

However, the second stage behaved as if coasters could not find a suitable block allocation for its jobs, and the script then failed to progress: no PBS jobs were queued, and the log showed that the coaster scheduler was idling. I think this is the same behavior that would occur if the remaining jobs all had a maxwalltime greater than the maxtime of the pool.

Mihael, can you check the log, and see if this is a config bug on my part or a code bug?

It looks a bit like coasters is trying to fit my 2h:55m jobs into a 60-minute coaster slot of pbs1, unaware that the 4-hour slots of pbs4 should be started?
 
The logs, scripts, and config files are in:

/home/wilde/protests/T0517/run.raptorloops.5732

Thanks,

Mike

 
----- wilde at mcs.anl.gov wrote:

> In a Swift run with coasters, I set the sites element "maxTime" to 3
> hours to accommodate the longest jobs that the script runs.  But the
> script starts by running a single pre-processing job that is set to
> maxWallTime 30 mins.
> 
> I would have expected this single job to get placed in a coaster PBS
> job set to a wallTime closer to 30 mins, but the PBS walltime was set
> to 90 mins (causing the job to wait in the short queue rather than
> start right away in the fast queue; I have the sites "queue" element
> set to "route" which selects the best queue based on PBS walltime).
> 
> Why is this?
> 
> All the config files are on the CI net in:
> 
>   /home/wilde/protests/T0517/run.raptorloops.2260
> 
> log is in:
> 
>   RaptorLoops-20100505-0801-1dtuj463.log
> 
> Mike
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



