[Swift-devel] Re: scheduler changes to deal with fast-failing sites
lixi at uchicago.edu
Fri Jun 27 10:15:46 CDT 2008
>A. Site selection - that has had a number of changes recently that are
>designed to help the multi-site case (replication which appeared a month
>or so ago; and some scoring behaviour changes which went in a day or so
>ago). Prior to these changes, running on a large number of sites was
>almost guaranteed to fail. The new behaviour looks like it should be much
>more successful, though I have yet to hear of anyone (eg Xi) trying it on
>OSG yet.
As far as multi-site runs are concerned, I have already tried the new
changes. These are the aspects I have tested so far:
1. With my own calibration results, the generated sites file filters
out sites that are "bad" in terms of GRAM and GridFTP. In my
experiments, this clearly increased the success rate of the whole
workflow. However, it could not guarantee that every workflow ran to
completion, because some sites produce a shared file system error
such as the following:
Application exception: No status file was found. Check the shared filesystem on hostname
I do not know how to check for this error in advance.
2. With the replication option enabled, I often encountered the
"Multiple mappings pointing to the same file" error, which led to the
failure of the whole workflow. I believe I have already reported this
error, but I have not received any message saying that it has been
resolved, so I disabled the "replication" option in my recent
experiments.
3. I am still testing the latest changes to the scoring behaviour.
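The filtering described in item 1 can be sketched roughly as follows. This is only an illustration, not the actual calibration tooling: the SiteProbe record, its field names, and the site names are all hypothetical, and it assumes each site has already been probed once for GRAM and once for GridFTP.

```python
from dataclasses import dataclass


@dataclass
class SiteProbe:
    """Hypothetical calibration record for one site (not a Swift type)."""
    name: str
    gram_ok: bool      # assumed: a test job submitted through GRAM succeeded
    gridftp_ok: bool   # assumed: a test transfer through GridFTP succeeded


def usable_sites(probes):
    """Keep only the sites that passed both the GRAM and GridFTP probes."""
    return [p.name for p in probes if p.gram_ok and p.gridftp_ok]


# Made-up example data; real results would come from the calibration run.
probes = [
    SiteProbe("siteA", gram_ok=True, gridftp_ok=True),
    SiteProbe("siteB", gram_ok=True, gridftp_ok=False),
    SiteProbe("siteC", gram_ok=False, gridftp_ok=True),
]

print(usable_sites(probes))
```

Only the sites returned here would be written to the sites file. A filter of this kind cannot catch the shared file system error mentioned in item 1, since neither probe exercises the shared file system.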
Xi