[Swift-devel] Standard Swift coaster behavior doesnt work well for sporadic jobs

jon.monette at gmail.com jon.monette at gmail.com
Wed Oct 20 19:00:55 CDT 2010


Ok. I can try to put together a script that does it. But I think it just needs to be a script in which there is a long enough gap between two jobs submitted to a site that all the workers time out.
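
To make the scenario concrete, here is a toy Python model of it. This is only a sketch: it is not Swift or coaster code, and every name in it (ToySite, IDLE_TIMEOUT, request_new_block) is hypothetical, as is the 60-second timeout. It just shows two jobs separated by a gap, with and without the "start a new block when no worker is alive" behavior.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration; real coaster worker timeouts are configured elsewhere.
IDLE_TIMEOUT = 60.0


@dataclass
class ToySite:
    """Hypothetical stand-in for a site with at most one worker block."""
    request_new_block: bool             # True models the desired "automatic" behavior
    blocks_started: int = 0
    idle_since: Optional[float] = None  # None means no live worker

    def submit(self, now: float) -> str:
        # A worker that has been idle longer than the timeout exits.
        if self.idle_since is not None and now - self.idle_since > IDLE_TIMEOUT:
            self.idle_since = None
        if self.idle_since is None:
            # The very first job always gets a block; later ones only if allowed.
            if self.blocks_started == 0 or self.request_new_block:
                self.blocks_started += 1
            else:
                return "submitted"      # job never reaches a worker: the hang
        self.idle_since = now           # job is treated as instantaneous
        return "completed"


def run(request_new_block: bool, gap: float) -> list:
    """Submit one job at t=0 and a second one `gap` seconds later."""
    site = ToySite(request_new_block)
    return [site.submit(0.0), site.submit(gap)]


if __name__ == "__main__":
    print(run(False, 10.0))    # short gap, worker still alive
    print(run(False, 300.0))   # long gap, no new block requested
    print(run(True, 300.0))    # long gap, new block requested
```

In this model the second job only stays in "submitted" when the gap exceeds the timeout and no new block is requested, which is the pattern a real reproducer would need to trigger.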

-----Original Message-----
From: Mihael Hategan <hategan at mcs.anl.gov>
Date: Wed, 20 Oct 2010 16:58:09 
To: Jonathan Monette<jon.monette at gmail.com>
Cc: Michael Wilde<wilde at mcs.anl.gov>; Justin Wozniak<wozniak at mcs.anl.gov>; Swift Devel<swift-devel at ci.uchicago.edu>
Subject: Re: [Swift-devel] Standard Swift coaster behavior doesnt work well
 for sporadic jobs

I haven't fixed it.

Is there a way to reproduce this nicely?

Mihael

On Wed, 2010-10-20 at 10:39 -0500, Jonathan Monette wrote:
> Has this problem been fixed?  I am still experiencing hanging in my 
> scripts and it seems that the jobs are submitted but never executed.  I 
> see that the stage out is finished, the jobs are submitted, and then the 
> coaster heartbeat in the logfile.
> 
> On 10/11/10 8:10 PM, Michael Wilde wrote:
> > ----- "Jonathan Monette"<jon.monette at gmail.com>  wrote:
> >
> >> Wouldn't coasters just re-submit jobs if there are no workers
> >> available to process them?
> > That's certainly the desired behavior for the default "automatic" mode, but it doesn't appear to be working that way - unless I've broken it with a local mod.
> >
> > - Mike
> >
> >
> >> My Montage stuff is under the assumption that coasters will submit
> >> more workers if they all time out.  This may be why my stuff was
> >> hanging before.  Not entirely sure, since I am working on another
> >> problem.
> >
> >> On 10/11/2010 03:12 PM, wilde at mcs.anl.gov wrote:
> >>> Mihael, Justin, does the following sound like a likely coaster
> >>> issue:
> >>>
> >>> When using the standard Swift coaster code (not passive or
> >>> persistent), if I have a job that runs at the start, and then there
> >>> is a long delay before the next job, such that the coaster worker
> >>> times out, then the coaster scheduler doesn't think that there is a
> >>> valid block into which the job can fit, and Swift just hangs, with
> >>> the job in submitted state but never getting assigned to a block.
> >>>
> >>> The behavior seems similar to what you see when you try to run a
> >>> job that doesn't fit into any block that you have defined using the
> >>> coasters sites.xml parameters: in that case, too, Swift just hangs.
> >>>
> >>> Both of these situations (whether or not they are indeed due to the
> >>> same algorithmic issue) seem to be problems that we need to address.
> >>> In the first case (which is my immediate and more important problem)
> >>> you can see a log for the problem in ~wilde/swift-rserver-hangs,
> >>> along with the swift script, tc, sites, and properties file (cf).
> >>>
> >>> Thanks,
> >>>
> >>> Mike
> >>>
> >> -- 
> >> Jon
> >>
> >> Computers are incredibly fast, accurate, and stupid. Human beings are
> >> incredibly slow, inaccurate, and brilliant. Together they are powerful
> >> beyond imagination.
> >> - Albert Einstein
> 



