[Swift-devel] Bringing back the coaster worker timeout feature?

Michael Wilde wilde at mcs.anl.gov
Tue Aug 9 14:31:54 CDT 2011


Related to the idea of adding a worker option for jobsPerNode:

I'd like to propose/discuss adding back the option for workers to time out when they have been idle for some settable period.

This would be useful in configurations like the ones we're running for OSG and TeraGrid. Because the script and the worker factory are only loosely coupled, and because of queuing delays, we may at times have more workers running than the Swift script has demand for.
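
To make the proposal concrete, here is a minimal sketch of the idle-timeout bookkeeping, written in Python purely for illustration. It is not the actual coaster worker code, and every name in it (IdleTimeout, idle_limit_s, job_started, job_finished, should_exit) is hypothetical. It assumes a worker may run several jobs at once, so the idle clock only starts when the last running job finishes.

```python
import time


class IdleTimeout:
    """Tracks how long a worker has been idle (hypothetical helper)."""

    def __init__(self, idle_limit_s, clock=time.monotonic):
        self.idle_limit_s = idle_limit_s  # the settable idle period, in seconds
        self.clock = clock                # injectable so the logic can be tested
        self.active_jobs = 0
        self.idle_since = clock()         # a freshly started worker counts as idle

    def job_started(self):
        # Any running job means the worker is not idle.
        self.active_jobs += 1
        self.idle_since = None

    def job_finished(self):
        # The idle period begins only when no jobs remain on this worker.
        self.active_jobs -= 1
        if self.active_jobs == 0:
            self.idle_since = self.clock()

    def should_exit(self):
        # True once the worker has been idle for at least idle_limit_s.
        if self.idle_since is None:
            return False
        return self.clock() - self.idle_since >= self.idle_limit_s
```

A worker's main loop could call should_exit() each time it polls for work and shut down cleanly when it returns True. How the limit would actually be passed to the worker (command-line option, environment, or via the service) is left open here.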

- Mike

----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Tuesday, August 9, 2011 2:16:50 PM
> Subject: Re: [Swift-devel] Persistent coasters running one job per worker
> Ah!
> 
> If the workers connect before the client does, then jobsPerNode does not
> make it to the coaster service.
> 
> I'll think about this. In the meantime, you could have the workers
> started after the client sends its first job to the service.
> 
> I'm thinking that maybe jobsPerNode should be a setting that the workers
> themselves could be started with.
> 
> On Tue, 2011-08-09 at 14:09 -0500, Ketan Maheshwari wrote:
> > I do not see any recent log in ~/.globus/coasters. The stdout/err of
> > the coaster service run is in the attached service.log and the
> > coaster.log is in the attached swift.log.
> >
> >
> >
> >
> > On Tue, Aug 9, 2011 at 1:59 PM, Mihael Hategan <hategan at mcs.anl.gov>
> > wrote:
> >         but but but I checked this, and it worked fine...
> >
> >         Can you also post the coasters log (on the machine the coaster
> >         service is on, in ~/.globus/coasters)?
> >
> >
> >         On Tue, 2011-08-09 at 13:47 -0500, Ketan Maheshwari wrote:
> >         > Mihael,
> >         >
> >         >
> >         > I was discussing this with Justin and we thought you could help:
> >         >
> >         >
> >         > I am observing that persistent coasters are running one job per
> >         > worker as opposed to the number specified in jobspernode (I also
> >         > tried nodegranularity) on sites.xml.
> >         >
> >         >
> >         > Attaching the log, and the sites.xml for the run. Swift is 0.93
> >         > (Swift svn swift-r4968 cog-r3225).
> >         >
> >         >
> >         > The script is Mike's catsnsleep that sleeps for 20s with n=10.
> >         >
> >         > --
> >         > Ketan
> >         >
> >         >
> >         >
> >
> >
> >
> >
> >
> >
> > --
> > Ketan
> >
> >
> >
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



