[Swift-devel] several alternatives to design the data management system for Swift on SuperComputers

Mihael Hategan hategan at mcs.anl.gov
Mon Dec 1 16:35:19 CST 2008


On Mon, 2008-12-01 at 16:15 -0600, Zhao Zhang wrote:

> Desired Data Flow: the 1st stage of computation knows its output data will
> be used as the input for the next stage, so the data is not copied back to
> GPFS; the 2nd stage task then arrives and consumes this data.

This assumes a sequential workflow (t1 -> t2 ->... -> tn). For anything
more complex, this becomes a nasty scheduling problem. For example:

(t1, t2) -> t3

Which task's output, t1's or t2's, should not be copied back?
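
To make the fan-in case concrete, here is a minimal Python sketch. The
function name, the `deps` and `placement` dictionaries, and the node labels
are illustrative assumptions, not part of Swift or Falkon. Given a dependency
graph and a node assignment, it lists the outputs that must be staged to
shared storage because a consumer runs on a different node.

```python
def outputs_to_copy(deps, placement):
    """Return the producers whose output must leave its node.

    deps: maps each task to the list of tasks whose output it consumes.
    placement: maps each task to the node it runs on (hypothetical labels).
    """
    must_copy = set()
    for consumer, producers in deps.items():
        for producer in producers:
            # Output can stay local only if producer and consumer share a node.
            if placement[producer] != placement[consumer]:
                must_copy.add(producer)
    return sorted(must_copy)


# (t1, t2) -> t3, with t1 and t2 running in parallel on different nodes.
deps = {"t3": ["t1", "t2"]}
print(outputs_to_copy(deps, {"t1": "n1", "t2": "n2", "t3": "n1"}))  # ['t2']
print(outputs_to_copy(deps, {"t1": "n1", "t2": "n2", "t3": "n2"}))  # ['t1']
```

Whichever node t3 lands on, one of the two inputs has to move, and which one
is only known once t3 has been placed.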

> 
> Key Issue: the 2nd stage task has no idea where the 1st stage output
> data is.

I beg to disagree. Swift provides the mechanism to record where data is.
The key issue is that queuing systems don't allow control over the exact
nodes that tasks go to.

Another key issue is that you may not even want to do so, because that
node may be better used running a different task (scheduling problem
again).
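
A rough Python sketch of that trade-off, under stated assumptions: the
`locations` catalog and the `idle_nodes` list are hypothetical stand-ins, and
nothing here is an actual Swift or queuing-system interface. The scheduler
cannot pick arbitrary nodes; it only sees the nodes it is offered, so data
locality is a preference rather than a guarantee.

```python
def choose_node(input_file, locations, idle_nodes):
    """Pick a node for a task, preferring one that already holds its input.

    locations: maps file name -> node that holds it (a hypothetical catalog).
    idle_nodes: nodes the queuing system happens to offer right now, in order.
    Returns (node, needs_transfer).
    """
    if not idle_nodes:
        raise ValueError("no idle nodes offered")
    holder = locations.get(input_file)
    if holder in idle_nodes:
        return holder, False
    # The data holder is busy or unknown: run elsewhere and move the data.
    return idle_nodes[0], True


locations = {"stage1.out": "n1"}
print(choose_node("stage1.out", locations, ["n2", "n1"]))  # ('n1', False)
print(choose_node("stage1.out", locations, ["n2", "n3"]))  # ('n2', True)
```

Even when the holder is on offer, always taking it is not obviously right:
that node might be better spent on another task, which this sketch does not
try to model.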

> 
> Design Alternatives:
> 1. Data aware task scheduling:
>     Both Swift and Falkon need to be data aware. Swift should know where
>     the output of the 1st stage is, which means which pset, or say which
>     Falkon service. And the Falkon service should know which CN has the
>     data for the 2nd stage computation.
> 
> 2. Swift batches jobs vertically
>     Before sending out any jobs, Swift knows that the 2 stage jobs have a
>     data dependency, and thus sends them out as 1 batched job to each
>     worker.
> 
> 3. Collective IO
>    Build a shared file system which can be accessed by all CNs. Instead
>    of writing output data to GPFS, workers copy intermediate output data
>    to this shared ram-disk, and retrieve the data from IFS.

That seems awfully close to implementing a distributed filesystem, which
I think is a fairly bad idea. If you're trying to avoid GPFS contention,
then avoid it by carefully sticking your data in different directories.
And do keep in mind that most operating systems cache filesystem data in
memory, so a read after write of a reasonably small file will be very
fast with any filesystem.
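
One way to read "different directories" is to spread files over a fixed set
of subdirectories chosen by hashing the file name. The Python sketch below
assumes that reading; `sharded_path`, the `fanout` parameter, and the example
root are hypothetical, and whether this actually relieves contention depends
on how the filesystem locks directories.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, filename, fanout=256):
    """Map a file name to a subdirectory chosen by hashing the name."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Same name always maps to the same bucket, so readers can recompute it.
    bucket = int(digest, 16) % fanout
    return PurePosixPath(root) / f"{bucket:03d}" / filename


# Hypothetical root; each task output lands in one of `fanout` buckets.
for name in ("t1.out", "t2.out", "t3.out"):
    print(sharded_path("/gpfs/scratch/run1", name))
```

Because the mapping is deterministic, a later stage can locate an earlier
stage's output from the file name alone, without a separate lookup.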




