[Swift-user] Coasters - idle time exceeded

Michael Wilde wilde at mcs.anl.gov
Wed Nov 10 19:47:24 CST 2010


Hi Matthew,

Could you send your swift .log file to us at swift-devel, as well as your sites.xml file, tc.data, and swift.properties (if you have changed them)?

We'll also want to look at $HOME/.globus/coasters.log and any other coaster worker log files (from this run) that might be under .globus/coasters (although the latter are probably not there, as *I think* coasters doesn't write worker logs if there are more than some threshold of total workers).

We may need to reproduce this scenario here to debug it.

Mihael may have better suggestions on how to proceed.

- Mike


----- Original Message -----
> Good afternoon,
> 
> While running using Coasters, I occasionally get messages like this:
> 
> Idle time exceeded at /home/username/.globus/coasters/cscript....pl
> line 627.
> 
> Then things go horribly wrong and the processing usually doesn't
> complete.
> 
> At first I thought this was in cases where my workflow had a long tail
> and many workers were left idle as some long running tasks finished up
> -- a symptom of my "let's try this 512-task workflow with 64-128 cores
> and see what happens!" experimentation phase. I got around it by just
> requesting fewer nodes from PBS in my Coasters configuration. But now
> it's popping up on smaller workflows. The susceptible workflows seem
> to be preloaded with less than one node's worth of tasks on the first
> round of dependencies.
> 
> Is there a way that I can increase the idle time limit? Ideally, I'd
> like the coasters to wait for the entire PBS job walltime.
> 
> Matthew
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



