[Swift-devel] Standard Swift coaster behavior doesn't work well for sporadic jobs
Mihael Hategan
hategan at mcs.anl.gov
Wed Oct 20 18:58:09 CDT 2010
I haven't fixed it.
Is there a way to reproduce this nicely?
Mihael
On Wed, 2010-10-20 at 10:39 -0500, Jonathan Monette wrote:
> Has this problem been fixed? I am still experiencing hangs in my
> scripts, and it seems that the jobs are submitted but never executed.
> In the logfile I see that the stage out finishes and the jobs are
> submitted, followed by the coaster heartbeat.
>
> On 10/11/10 8:10 PM, Michael Wilde wrote:
> > ----- "Jonathan Monette"<jon.monette at gmail.com> wrote:
> >
> >> Wouldn't coasters just re-submit jobs if there are no workers
> >> available to process them?
> > That's certainly the desired behavior for the default "automatic" mode, but it doesn't appear to be working that way, unless I've broken it with a local mod.
> >
> > - Mike
> >
> >
> >> My Montage stuff is under the assumption that coasters will submit
> >> more workers if they all time out. This may be why my stuff was
> >> hanging before. I am not entirely sure, since I am working on
> >> another problem.
> >
> >> On 10/11/2010 03:12 PM, wilde at mcs.anl.gov wrote:
> >>> Mihael, Justin, does the following sound like a likely coaster
> >>> issue?
> >>>
> >>> When using the standard Swift coaster code (not passive or
> >>> persistent), if I have a job that runs at the start, and then there
> >>> is a long delay before the next job, such that the coaster worker
> >>> times out, then the coaster scheduler doesn't think that there is a
> >>> valid block into which the job can fit, and Swift just hangs, with
> >>> the job in submitted state but never getting assigned to a block.
> >>>
> >>> The behavior seems similar to what you see when you try to run a
> >>> job that doesn't fit into any block that you have defined using the
> >>> coasters sites.xml parameters: in that case, too, Swift just hangs.
> >>>
> >>> Both of these situations (whether or not they are indeed due to the
> >>> same algorithmic issue) seem to be problems that we need to address.
> >>> In the first case (which is my immediate and more important problem)
> >>> you can see a log for the problem in ~wilde/swift-rserver-hangs,
> >>> along with the swift script, tc, sites, and properties file (cf).
> >>>
> >>> Thanks,
> >>>
> >>> Mike
> >>>
> >> --
> >> Jon
> >>
> >> Computers are incredibly fast, accurate, and stupid. Human beings are
> >> incredibly slow, inaccurate, and brilliant. Together they are powerful
> >> beyond imagination.
> >> - Albert Einstein
>