[Swift-user] Coasters - idle time exceeded
Mihael Hategan
hategan at mcs.anl.gov
Wed Nov 10 21:06:51 CST 2010
There is a way to increase that limit. The parameter also appears to be
accepted as a command-line argument, though I don't see it being passed
that way.
In any event, look for "my $IDLETIMEOUT" in
provider-coaster/resources/worker.pl and change the default there (4 *
60) to whatever you want (I suggest a very large number). Then
recompile and re-run.
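For illustration only, here is a minimal Python sketch (not the Perl of worker.pl) of what a worker-side idle timeout of this kind does. The class and method names (Worker, task_received, check_idle, IdleTimeoutExceeded) are hypothetical, and treating 4 * 60 as seconds is an assumption, not something confirmed from the coaster source.

```python
import time

# Hypothetical stand-in for worker.pl's "my $IDLETIMEOUT" default of 4 * 60.
# Assumption: the value is in seconds.
IDLE_TIMEOUT = 4 * 60


class IdleTimeoutExceeded(Exception):
    """Raised when the worker has gone too long without receiving work."""


class Worker:
    """Hypothetical worker that gives up after sitting idle too long."""

    def __init__(self, idle_timeout=IDLE_TIMEOUT, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock  # injectable clock so the logic can be tested
        self.last_activity = clock()

    def task_received(self):
        # Any incoming work resets the idle timer.
        self.last_activity = self.clock()

    def check_idle(self):
        # Meant to be called periodically from the worker's main loop.
        idle = self.clock() - self.last_activity
        if idle > self.idle_timeout:
            raise IdleTimeoutExceeded("Idle time exceeded")
```

With the default, a worker that receives no task for just over 240 time units dies, even if its PBS job still has walltime left. Setting the timeout to a very large value (e.g. float("inf") in this sketch) effectively disables the check.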
The idle time was used in a previous version of the coasters (when there
was no block allocation) as a mechanism to clean up unused workers. This
is now done by the coaster service itself.
The problem with letting the workers do this is that they have no
knowledge that they are part of a block. In the previous version, the
service would notice a worker dying immediately, because the worker's
job ended. That is not the case with the current block scheme, in which
workers run as part of multi-node jobs, so one worker exiting does not
end the job.
The advantage of letting the workers do this is that it is simple
algorithmically.
So given the above, I'd be in favor of getting rid of this idle timeout.
The only remaining concern is preventing workers from running after the
coaster service has died. However, the heartbeat mechanism should take
care of that.
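The proposal above amounts to replacing an idle-based exit with a liveness check on the service. A minimal Python sketch, assuming a hypothetical HeartbeatMonitor class and a grace period chosen purely for illustration (neither comes from the coaster code): the worker keeps running through arbitrarily long idle stretches and only stops once heartbeats from the service stop arriving.

```python
import time

# Hypothetical grace period; the actual coaster heartbeat interval is not
# stated in this thread.
HEARTBEAT_GRACE = 10 * 60


class HeartbeatMonitor:
    """Sketch: a worker stays alive while the service keeps sending
    heartbeats, no matter how long the worker itself has been idle."""

    def __init__(self, grace=HEARTBEAT_GRACE, clock=time.monotonic):
        self.grace = grace
        self.clock = clock  # injectable clock so the logic can be tested
        self.last_heartbeat = clock()

    def heartbeat_received(self):
        # Called whenever a heartbeat message arrives from the service.
        self.last_heartbeat = self.clock()

    def service_alive(self):
        # The worker's main loop would shut down once this returns False.
        return self.clock() - self.last_heartbeat <= self.grace
```

The design choice is that idleness alone never terminates a worker; only silence from the service does, which matches the concern that workers should not outlive a dead coaster service.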
Opinions?
Mihael
On Wed, 2010-11-10 at 15:41 -0700, Matthew Woitaszek wrote:
> Good afternoon,
>
> While running using Coasters, I occasionally get messages like this:
>
> Idle time exceeded at /home/username/.globus/coasters/cscript....pl line 627.
>
> Then things go horribly wrong and the processing usually doesn't complete.
>
> At first I thought this was in cases where my workflow had a long tail
> and many workers were left idle as some long running tasks finished up
> -- a symptom of my "let's try this 512-task workflow with 64-128 cores
> and see what happens!" experimentation phase. I got around it by just
> requesting fewer nodes from PBS in my Coasters configuration. But now
> it's popping up on smaller workflows. The susceptible workflows seem
> to be preloaded with less than one node's worth of tasks on the first
> round of dependencies.
>
> Is there a way that I can increase the idle time limit? Ideally, I'd
> like the coasters to wait for the entire PBS job walltime.
>
> Matthew
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user