[Swift-devel] Re: scheduler changes to deal with fast-failing sites

lixi at uchicago.edu
Fri Jun 27 10:15:46 CDT 2008


>A. Site selection - that has had a number of changes recently that are
>designed to help the multi-site case (replication which appeared a month
>or so ago; and some scoring behaviour changes which went in a day or so
>ago). Prior to these changes, running on a large number of sites was
>almost guaranteed to fail. The new behaviour looks like it should be much
>more successful, though I have yet to hear of anyone (eg Xi) trying it on
>OSG yet.

As far as multi-site runs are concerned, I've already tried the new
changes. These are the aspects I've tested so far:
1. Using my own calibration results, I generate a sites file that
filters out some "bad" sites in terms of GRAM and GridFTP. In my
experiments, this clearly increased the success rate of the whole
workflow. However, it cannot guarantee a completely successful run of
every workflow, because some sites produce a shared filesystem error
like this:
Application exception: No status file was found. Check the shared filesystem on hostname
I don't know how to check for this error in advance.

2. With the replication option enabled, I often encountered the
"Multiple mappings pointing to the same file" error, which led to the
failure of the whole workflow. I think I've already reported that
error, but I haven't since received any notice that the problem has
been resolved. So I disabled the "replication" option in recent
experiments.

3. I am still testing the latest changes to the scoring behaviour.

Xi
