[Swift-devel] Re: worker.pl IDLETIMEOUT

Michael Wilde wilde at mcs.anl.gov
Fri Dec 10 17:16:23 CST 2010


Since your pilot jobs are scripts that launch worker.pl, you could put a timer in those scripts to kill worker.pl and exit cleanly.
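
A rough sketch of such a timer, written in Python for illustration only. The function name run_with_deadline, the grace period, and the choice to report 0 on a forced shutdown are assumptions made for this example, not anything taken from worker.pl or from the actual pilot scripts:

```python
import subprocess


def run_with_deadline(cmd, limit_seconds, grace_seconds=10):
    """Run cmd and stop it if it is still running after limit_seconds.

    Returns 0 when the deadline forced the shutdown (so the batch system
    sees a clean exit), otherwise the child's own exit status.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the child finishes on its own before the deadline.
        return proc.wait(timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # Deadline reached: ask politely first (SIGTERM on POSIX).
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # The child ignored the request, so force it down.
            proc.kill()
            proc.wait()
        return 0
```

A pilot script would call something like run_with_deadline(["perl", "worker.pl", ...], limit) with the limit set a little below the Condor job time, then exit with the returned value. The worker.pl arguments are left elided here because they depend on the site setup.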

If you set maxtime in the pool entry to be somewhat less than the Condor jobtime setting for the pilot job, will Swift, even in the case of persistent coasters, (a) not start a job whose maxwalltime is greater than the maxtime remaining, and (b) shut down workers when no queued job has fit into the remaining time of the worker for some idle timeout period? (I.e., I thought the reason IDLETIMEOUT could be removed from the worker was that the client (or the service) has similar logic.)
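
To make (a) and (b) concrete, here is a small Python sketch of the two checks being asked about. It is not taken from the Swift or coaster code; the function names, the parameters, and the use of seconds as the unit are all assumptions for illustration:

```python
def can_start(job_maxwalltime, worker_remaining):
    """(a) A job may start only if its maxwalltime fits in the time the
    worker has left (maxtime minus the time already used)."""
    return job_maxwalltime <= worker_remaining


def should_shut_down(queued_walltimes, worker_remaining,
                     idle_seconds, idle_timeout):
    """(b) Shut the worker down when no queued job fits in its remaining
    time and it has been idle for at least idle_timeout."""
    any_fits = any(can_start(w, worker_remaining) for w in queued_walltimes)
    return (not any_fits) and idle_seconds >= idle_timeout
```

For example, with 300 seconds left on a worker, a queued job with a maxwalltime of 600 would not be started, and once the worker had sat idle past the timeout it would be shut down instead of running into the Condor limit.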

- Mike


----- Original Message -----
> Looking at the worker.pl I use, yes, there are no more idle timeout
> cases. This will leave pilot jobs failing when they exceed the
> maxwalltime. This is another explanation for the large number of job
> failures in OSG as well.
> 
> Before the changes, I simply changed the IDLE timeout to exit cleanly
> (exit 0 instead of die)
> 
> -Allan
> 
> 2010/12/10 Michael Wilde <wilde at mcs.anl.gov>:
> > I added that idle timeout arg to worker.pl I think. But in recent
> > changes I think Mihael removed the idle timeout entirely. Are you
> > using a recent trunk version with those changes? That seemed to work
> > best for me in my latest tests using passive persistent coaster
> > servers.
> >
> >
> >
> > ----- Original Message -----
> >> The idle timeout having a non-zero exit code generated a lot of "JOB
> >> FAILED" stats in OSG. This skews their usage report in a weird
> >> fashion. I made some modifications before, but my upgrade to the
> >> latest trunk code somehow broke them.
> >>
> >> 2010/10/12 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> >> > Poking at worker.pl, I see that it accepts a third argument for
> >> > idle time. Is this in seconds?
> >> >
> >> > Also, I'm using Swift to drive a number of passive workers. The
> >> > worker jobs fail due to this timeout. I may have to modify things
> >> > to suit this kind of setup.
> >> >
> >> > Thanks,
> >> > -Allan
> >> >
> >>
> >>
> >> --
> >> Allan M. Espinosa <http://amespinosa.wordpress.com>
> >> PhD student, Computer Science
> >> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> >
> >
> 
> 
> 
> --
> Allan M. Espinosa <http://amespinosa.wordpress.com>
> PhD student, Computer Science
> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



