[Swift-devel] Re: scheduler changes to deal with fast-failing sites
lixi at uchicago.edu
Wed Jun 25 10:44:10 CDT 2008
>I'd be interested if anyone (especially Xi) tries this in a real life
>multi-site situation.
Unfortunately, the workflow with 501 jobs just failed with:
"No status file was found. Check the shared filesystem on
CIT_CMS_T2"
In fact, this is the most frequent error I have encountered
so far, and I have been thinking for a long time about how
to avoid it. I checked the remote directory with the df
command and tried making directories, transferring files,
etc. All of these operations succeeded when run outside of
Swift. So I still wonder how to avoid the error, or whether
we could think of adapting Swift to sites such as
CIT_CMS_T2, MIT_CMS, and so on.
>The scoring of well-performing sites is basically the same. Instead of a
>base of 2 jobs, with more being added according to tscore * jobThrottle,
>instead a base of 1 job is used. This should not cause much change in
>behaviour for well-performing sites.
>
>However, the score can now go below 1 for poorly performing sites. In that
>case, a delay is enforced between submissions to a particular site. The
>length of that delay increases exponentially as the site score decreases.
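To make sure I read the quoted scheme correctly, here is a rough
sketch of it in Python. The function names, the 1-second unit delay,
and the exact growth formula are my own assumptions for illustration;
the message above only says that the base is 1 job, that more jobs are
added according to tscore * jobThrottle, and that the delay grows
exponentially once the score drops below 1.

```python
import math


def allowed_jobs(tscore: float, job_throttle: float, base: int = 1) -> int:
    """Jobs allowed on a site: a base of 1 plus tscore * jobThrottle.

    Clamping the extra jobs at zero is an assumption, not something
    stated in the original message.
    """
    return base + max(0, math.floor(tscore * job_throttle))


def submit_delay(score: float, unit_delay: float = 1.0) -> float:
    """Seconds to wait between submissions to one site.

    No delay while score >= 1. Below 1 the delay grows exponentially
    as the score decreases. The formula 2 ** (1 - score) - 1 is a
    hypothetical choice that starts at zero when score == 1.
    """
    if score >= 1.0:
        return 0.0
    return unit_delay * (2.0 ** (1.0 - score) - 1.0)


if __name__ == "__main__":
    for s in (2.0, 1.0, 0.0, -1.0, -2.0, -3.0):
        print(f"score={s:5.1f}  delay={submit_delay(s):5.1f}s")
```

With these assumed numbers a site at score 0 waits 1 second between
submissions and a site at score -3 waits 15 seconds, so a fast-failing
site is slowed down but still gets an occasional job.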
In addition to such improvements, and to filtering out some
sites and giving initial scores, which I have already done,
I have been thinking about other methods these days. At
present Swift relies only on "scores" to judge the
performance of sites, and those scores are in turn the only
metric for site selection. Could we give sites different
states, for example candidate, frozen, etc.? "Candidate"
just means that a site may be selected based on its
scores/Tscores. If a site fails, we could designate it as
"frozen", at least for the current job, so that no more
retries are eaten up on it. A frozen site could be unfrozen
once certain conditions are met, such as after some amount
of time, for other new jobs. Of course, these are just some
simple ideas I am thinking about now. I am going to work out
a more detailed and feasible process. Any suggestions are
warmly welcome.
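
To make the idea a bit more concrete, here is a minimal sketch in
Python of what I have in mind. It is not Swift code; the class and
function names, the time-based thaw, and the per-job exclusion set are
all hypothetical and only illustrate the candidate/frozen distinction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Set


class SiteState(Enum):
    CANDIDATE = "candidate"
    FROZEN = "frozen"


@dataclass
class Site:
    name: str
    score: float = 1.0
    frozen_until: float = 0.0  # time at which the site becomes a candidate again

    def state(self, now: float) -> SiteState:
        """A site is frozen until its thaw time has passed."""
        return SiteState.FROZEN if now < self.frozen_until else SiteState.CANDIDATE

    def freeze(self, now: float, duration: float) -> None:
        """Mark the site frozen for `duration` seconds (assumed unfreeze rule)."""
        self.frozen_until = now + duration


def pick_site(sites: Iterable[Site], now: float,
              exclude: Optional[Set[str]] = None) -> Optional[Site]:
    """Pick the best-scoring candidate site for one job.

    `exclude` holds names of sites that already failed for this job, so
    its retries are not spent on them again. Returns None if no site is
    currently eligible.
    """
    exclude = exclude or set()
    candidates = [s for s in sites
                  if s.state(now) is SiteState.CANDIDATE and s.name not in exclude]
    return max(candidates, key=lambda s: s.score) if candidates else None
```

For example, if CIT_CMS_T2 fails at time 0 and is frozen for 60
seconds, jobs submitted in that window go to the next best candidate,
and CIT_CMS_T2 is considered again afterwards. Scores would still
decide among the candidates, so this sits on top of the existing
scoring rather than replacing it.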
Thanks,
Xi