[Swift-devel] Imbalanced scheduling with coasters and multiple sites

Mihael Hategan hategan at mcs.anl.gov
Tue Apr 7 00:39:14 CDT 2009


On Tue, 2009-04-07 at 00:33 -0500, Michael Wilde wrote:
> 
> On 4/7/09 12:26 AM, Mihael Hategan wrote:
> > On Tue, 2009-04-07 at 00:15 -0500, Michael Wilde wrote:
> >> Note on below: I used 2hr30min as the time to match Glen's time, for the 
> >> runs in which he first saw the "imbalance".
> >>
> >> In my first tests, I had used 5 min for coasterWorkerMaxwalltime and 
> >> specified no site or tc maxwalltime. I thought that would work, based on 
> >> our earlier lengthy exchanges on this topic. But apparently coasters was 
> >> calculating some default max walltime for "cat" and it gave me an error 
> >> about insufficient time.
> > 
> > Right. Previously it would just loop starting workers and then not using
> > them because they didn't have enough time. The default walltime is 10
> > minutes.
> 
> That makes sense then. The error I got was:
> 
> 2009-04-06 20:52:35,397-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION 
> jobid=cat-e3agg19j - Application exception: Job cannot be run with the 
> given max walltime worker constraint
> 
> The other few anomalies I saw I will ignore unless they happen again, as 
> I was using the bad 3/31 revision. These were things like starting a new 
> service with some strange default max time ("01:41:00" or 101 minutes) 

Not strange. 101 = 10 * 10 + 1 or DEFAULT_MAXWALLTIME *
OVERALLOCATION_FACTOR + RESERVE.
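
To spell out the arithmetic, here is a minimal sketch in Python. It is not the actual coaster code. The constant names come from the formula above, the values are inferred from 101 = 10 * 10 + 1, the unit is assumed to be minutes, and the two helper functions are made up for illustration.

```python
# Values inferred from the arithmetic in this message (101 = 10 * 10 + 1).
# Units are assumed to be minutes; this is not copied from the coaster source.
DEFAULT_MAXWALLTIME = 10      # default job walltime when none is specified
OVERALLOCATION_FACTOR = 10    # multiplier applied to the job walltime
RESERVE = 1                   # extra time added on top


def worker_walltime_minutes(job_walltime=DEFAULT_MAXWALLTIME):
    """Hypothetical helper: walltime requested for a newly started service."""
    return job_walltime * OVERALLOCATION_FACTOR + RESERVE


def format_hhmmss(minutes):
    """Hypothetical helper: render whole minutes as HH:MM:SS."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


print(worker_walltime_minutes())                 # 101
print(format_hhmmss(worker_walltime_minutes()))  # 01:41:00
```

With no maxwalltime given for the job, this reproduces the "01:41:00" seen in the log.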

> after the initial services were started with the correct time, and some 
> strange error retry behavior.
> 
> Bear with me - these things are very difficult and tedious to report.

No problem. I'm glad you're exercising the code.
