[Swift-devel] Re: Broken pipe on persistent coasters (was Re: Next steps on making the ExTENCI SCEC workflow run reliably)

Allan Espinosa aespinosa at cs.uchicago.edu
Wed May 11 20:02:44 CDT 2011


2011/5/11 Allan Espinosa <aespinosa at cs.uchicago.edu>:

> 2011/5/11 Mihael Hategan <hategan at mcs.anl.gov>:
>> On Wed, 2011-05-11 at 16:42 -0500, Allan Espinosa wrote:
>>> Right. Workers die because they exceed the maximum walltime.  Does the
>>> coaster service expect the workers to die cleanly (passive ones)?
>>
>> Hmm. They aren't expected to die. Which may be a problem.
>>
>> We (as in I) need to change that. Passive workers should advertise their
>> walltime to the service and the service should take that into account so
>> that jobs don't get sent to workers who don't have enough time left.
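A minimal sketch of the walltime-aware dispatch described above, written in Python for illustration only. It is not the coaster service's actual code. The names `Worker`, `pick_worker`, and the 60-second safety `margin` are hypothetical. It assumes each passive worker reports its maximum walltime when it registers, and that the service knows roughly how long each job will run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    """A passive worker as the service might track it (hypothetical model)."""
    worker_id: str
    start_time: float    # seconds since epoch when the worker started
    max_walltime: float  # walltime the worker advertised at registration, in seconds

    def remaining(self, now: float) -> float:
        """Seconds left before the batch system kills this worker."""
        return self.start_time + self.max_walltime - now


def pick_worker(workers: List[Worker], job_walltime: float, now: float,
                margin: float = 60.0) -> Optional[Worker]:
    """Return a worker with enough time left for the job, or None.

    Among eligible workers, prefer the one with the least remaining time
    (best fit), so that longer-lived workers stay free for longer jobs.
    """
    eligible = [w for w in workers if w.remaining(now) >= job_walltime + margin]
    if not eligible:
        return None  # no worker can finish the job; wait for a new one
    return min(eligible, key=lambda w: w.remaining(now))
```

With two workers advertising one and two hours, a 30-minute job submitted 50 minutes in skips the first worker (only 10 minutes left) and goes to the second.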

I remember that previous versions of worker.pl had an idle timeout
parameter.

>>
>> However, as inefficient as this may be, the service should notify the
>> client that the jobs that were running on a dying worker have failed,
>> and those jobs should be restarted by swift. Is that not happening?
>>

In this case, it hasn't (yet)
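A minimal sketch of the expected failure path: the service reports every job that was running on a dying worker as failed, and the client resubmits it. This is Python for illustration only, not the coaster service's or Swift's real interfaces. `CoasterServiceSketch`, `ClientSketch`, `worker_lost`, `on_failed`, and the retry limit are all hypothetical names and assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class CoasterServiceSketch:
    """Hypothetical model of a service that reports jobs lost with a worker."""

    def __init__(self, notify_failed: Callable[[str, str], None]):
        self.notify_failed = notify_failed  # callback to the client (e.g. Swift)
        self.running: Dict[str, Set[str]] = defaultdict(set)  # worker id -> job ids

    def start(self, worker_id: str, job_id: str) -> None:
        self.running[worker_id].add(job_id)

    def complete(self, worker_id: str, job_id: str) -> None:
        self.running[worker_id].discard(job_id)

    def worker_lost(self, worker_id: str) -> None:
        """Call when a worker's connection breaks or it exceeds its walltime."""
        for job_id in sorted(self.running.pop(worker_id, set())):
            self.notify_failed(job_id, f"worker {worker_id} died")


class ClientSketch:
    """Hypothetical client that resubmits failed jobs up to a retry limit."""

    def __init__(self, max_retries: int = 2):
        self.max_retries = max_retries
        self.attempts: Dict[str, int] = defaultdict(int)
        self.resubmitted: List[str] = []
        self.given_up: List[str] = []

    def on_failed(self, job_id: str, reason: str) -> None:
        self.attempts[job_id] += 1
        if self.attempts[job_id] <= self.max_retries:
            self.resubmitted.append(job_id)  # a real client would requeue here
        else:
            self.given_up.append(job_id)
```

The key point is that `worker_lost` fails every job still assigned to that worker. If that notification is never sent, the client keeps waiting on jobs that can no longer finish.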


