[Swift-devel] Re: started angle-1000 using ci-san data and ext mapper
Michael Wilde
wilde at mcs.anl.gov
Wed Nov 7 08:32:24 CST 2007
In IM Ben said:
--
Ben Clifford
it's possible to change swift to retry jobs more than 3 times.
i did that with andrew with it up at 10
sometimes jobs were running 5 times or so
it doesn't fix broken nodes but it increases chances
of workflow completion.
--
Sounds good, will try. With this kind of cluster problem, there's little
else we can do from outside the cluster.
On 11/7/07 8:13 AM, Ben Clifford wrote:
>
> On Tue, 6 Nov 2007, Michael Wilde wrote:
>
>> I've gotten about 10 failures so far from PBS aborts; looks like a node is bad
>> again (sent mail).
>
> 713 attempts to run jobs worked, 416 failed. (that's at the execute2
> level)
>
> Looks like a combination of file transfer failures and job execution
> failures.
>
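As a rough illustration of why a higher retry limit raises the chance of workflow completion, the attempt counts quoted above (713 worked, 416 failed) can be turned into a per-attempt failure rate and compared at limits of 3 and 10. This is a minimal sketch under loud assumptions: the function names are hypothetical, the limit is treated as the total number of attempts per job, and attempts are assumed to fail independently. That last assumption is optimistic when a bad node keeps receiving jobs, so the numbers are at best a lower bound on the real failure risk.

```python
def attempt_failure_rate(succeeded: int, failed: int) -> float:
    """Fraction of job attempts that failed (hypothetical helper)."""
    total = succeeded + failed
    if total == 0:
        raise ValueError("no attempts recorded")
    return failed / total


def prob_job_never_succeeds(p_fail: float, attempts: int) -> float:
    """Chance that every one of `attempts` tries fails.

    Assumes each attempt fails independently with probability p_fail,
    which will not hold if failures cluster on one broken node.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return p_fail ** attempts


# Counts from the thread: 713 attempts worked, 416 failed.
p = attempt_failure_rate(713, 416)
print(f"per-attempt failure rate: {p:.3f}")

# Compare the two limits mentioned in the thread (3 and 10).
for attempts in (3, 10):
    q = prob_job_never_succeeds(p, attempts)
    print(f"limit {attempts:2d}: chance a job never succeeds = {q:.2e}")
```

Under these assumptions a limit of 3 still leaves a noticeable share of jobs failing outright, while a limit of 10 makes that outcome rare; neither repairs the underlying node problem.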