[Swift-devel] How does swift know if a task is successful

Michael Wilde wilde at mcs.anl.gov
Wed Mar 18 09:29:51 CDT 2009


I just reviewed the thread and think I understand the issue now.

To reiterate (for my benefit):

The status.mode=provider property works, but it does not do everything 
Zhao needs here: Swift still insists that all the expected app output 
files be placed in the work directory (and then copies them back to 
their mapped destination directory).

Zhao is experimenting with a "pull model" in which the compute nodes 
pull their input files from wherever the previous job left them, rather 
than having the Swift engine push the input data to the shared work 
directory.

So, Ben, I think your solution below *might* work.

Zhao, Allan, and I should document the data flow changes that we're 
testing, to help us all discuss this.

On 3/18/09 9:13 AM, Ben Clifford wrote:
> So if Swift could remove the dependency between staging out and starting 
> subsequent jobs (a subset of what has been talked about before), would you 
> still need to hack out the stageout code?
> 
>> To solve this problem, we built a P2P data network on BGP over the torus 
>> network. The basic logic is that if a wrapper.sh finds a piece of 
>> intermediate data, it registers this data as (name, rank of the CN) in 
>> a Centralized Hash Table (CHT). The next time a job needs this data, it 
>> first looks the data up in the CHT, gets the rank of the remote node, 
>> converts the rank to an IP address, and fetches the data directly.
> 
> When we talked in December, I think this bit was done with posix 
> filesystem access. But it sounds like you are doing something different 
> now.
> 
> I've looked at abstracting that worker<->site shared filesystem code in 
> the past (and have some patches floating round in half-written state) - 
> can you send me your modified wrapper.sh so I can see how you do things?
> 
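
To help the discussion, here is a minimal sketch of the register/lookup 
flow Zhao describes in the quoted text above. It is not Zhao's actual 
wrapper.sh code, which is not shown in this thread. The class name, the 
function names, and especially the rank-to-IP formula are assumptions 
made up for illustration, and the CHT is modelled as an in-memory 
dictionary rather than a networked service. The fetch step is left out 
because the transfer mechanism is not specified here.

```python
from typing import Dict, Optional


class CentralizedHashTable:
    """Hypothetical stand-in for the CHT: maps a data name to the rank
    of the compute node (CN) that holds it."""

    def __init__(self) -> None:
        self._owner: Dict[str, int] = {}

    def register(self, name: str, rank: int) -> None:
        # Called when a wrapper finds a piece of intermediate data.
        self._owner[name] = rank

    def lookup(self, name: str) -> Optional[int]:
        # Returns the rank of the node holding the data, or None if unknown.
        return self._owner.get(name)


def rank_to_ip(rank: int) -> str:
    # Assumed mapping for illustration only; the real rank-to-IP
    # conversion is not given in this thread.
    return f"10.0.{rank // 256}.{rank % 256}"


def locate(cht: CentralizedHashTable, name: str) -> Optional[str]:
    """Return the address a job would pull `name` from, or None if the
    data was never registered (the caller would then need a fallback,
    such as the shared work directory)."""
    rank = cht.lookup(name)
    if rank is None:
        return None
    return rank_to_ip(rank)


if __name__ == "__main__":
    cht = CentralizedHashTable()
    cht.register("intermediate.dat", 300)   # producer side
    print(locate(cht, "intermediate.dat"))  # consumer side
    print(locate(cht, "missing.dat"))
```

The point of the sketch is that the consumer only needs the name of the 
data to find it; nothing has to be staged out through the Swift engine 
before the next job can start.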


