[Swift-devel] Re: started angle-1000 using ci-san data and ext mapper

Michael Wilde wilde at mcs.anl.gov
Wed Nov 7 08:32:24 CST 2007


In IM Ben said:

--
Ben Clifford
it's possible to change swift to retry jobs more than 3 times.
i did that with andrew with it up at 10
sometimes jobs were running 5 times or so
it doesn't fix broken nodes but it increases chances
of workflow completion.
--

Sounds good, will try. With this kind of cluster problem, there's little 
else we can do from outside the cluster.
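
As a rough back-of-the-envelope check on why a higher retry limit should help, here is a small sketch using the execute2-level counts Ben quotes below (713 worked, 416 failed). It makes two assumptions that are not from the thread: each attempt fails independently at that observed rate (which will not hold for a job that keeps landing on the same bad node), and the workflow has 1000 jobs (a guess based only on the name angle-1000). The limits 3, 5 and 10 are treated as total attempts per job, and the function names are made up for illustration.

```python
# Per-attempt failure rate estimated from the counts quoted in this thread.
worked, failed = 713, 416
p_fail = failed / (worked + failed)


def p_job_completes(max_attempts, p=p_fail):
    """Chance that one job succeeds within max_attempts tries.

    Assumes attempts fail independently with probability p.
    """
    return 1.0 - p ** max_attempts


def p_workflow_completes(n_jobs, max_attempts, p=p_fail):
    """Chance that every one of n_jobs independent jobs eventually succeeds."""
    return p_job_completes(max_attempts, p) ** n_jobs


if __name__ == "__main__":
    n_jobs = 1000  # assumed job count, not stated in the thread
    for attempts in (3, 5, 10):
        print(
            f"attempts={attempts:2d}  "
            f"per-job={p_job_completes(attempts):.5f}  "
            f"workflow={p_workflow_completes(n_jobs, attempts):.3g}"
        )
```

Under those assumptions, the per-job success chance is already high at 3 attempts, but the chance that all jobs finish only becomes comfortable at the higher limit, which matches Ben's point that retries raise the odds of completion without fixing the broken node.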


On 11/7/07 8:13 AM, Ben Clifford wrote:
> 
> On Tue, 6 Nov 2007, Michael Wilde wrote:
> 
>> I've gotten about 10 failures so far from PBS aborts; looks like a node is bad
>> again (sent mail).
> 
> 713 attempts to run jobs worked, 416 failed.  (that's at the execute2 
> level)
> 
> Looks like a combination of file transfer failures and job execution 
> failures.
> 


