[Swift-devel] Re: Broken pipe on persistent coasters (was Re: Next steps on making the ExTENCI SCEC workflow run reliably)

Mihael Hategan hategan at mcs.anl.gov
Thu May 12 12:17:26 CDT 2011


On Wed, 2011-05-11 at 18:59 -0700, Mihael Hategan wrote:

> > 
> > >>
> > >> However, as inefficient as this may be, the service should notify the
> > >> client that the jobs that were running on a dying worker have failed,
> > >> and those jobs should be restarted by swift. Is that not happening?
> > >>
> > 
> > In this case, it hasn't (yet)
> 
> Ok. That's a bug, and I think it's a major bug in your case. Please file
> a bug report on it and I'll get to it as soon as I can.
> 

I see what's happening. In automatic mode, exceeding the walltime causes
the task to fail, and that failure is the channel through which the
service finds out that something went wrong.

In passive mode, you manage the task, and the service currently has no
means of learning that a worker has failed.
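
To make the difference concrete, here is a toy Python model. It is not
the coaster service code, and every name in it (ToyService, worker_dies,
on_task_failed) is made up for illustration. It only assumes what is
described above: in automatic mode the service sees the task failure, and
in passive mode nothing reports the failure to it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Toy stand-in for the service; hypothetical, not the real API."""
    mode: str  # "automatic" or "passive"
    jobs: dict = field(default_factory=dict)  # job id -> (worker id, state)

    def run_job(self, job_id, worker_id):
        self.jobs[job_id] = (worker_id, "running")

    def on_task_failed(self, worker_id):
        # Mark every job that was running on the failed worker as failed,
        # so the client can be notified and restart it.
        for job_id, (wid, state) in list(self.jobs.items()):
            if wid == worker_id and state == "running":
                self.jobs[job_id] = (wid, "failed")

    def worker_dies(self, worker_id):
        # Automatic mode: the service manages the task, so the task
        # failure (e.g. walltime exceeded) reaches it.
        if self.mode == "automatic":
            self.on_task_failed(worker_id)
        # Passive mode: the user manages the task, so in this model
        # nothing tells the service and its jobs stay "running".

    def state(self, job_id):
        return self.jobs[job_id][1]


auto = ToyService("automatic")
auto.run_job("j1", "w1")
auto.worker_dies("w1")

passive = ToyService("passive")
passive.run_job("j1", "w1")
passive.worker_dies("w1")

print(auto.state("j1"), passive.state("j1"))
```

In the passive case the job is never marked failed, so it would never be
restarted, which matches the behavior reported above.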