<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Hi,<br>

I can see option (1) working as long as there is 1 Swift client and 1

Falkon service. For example, our current deployment on the BG/P would

not work, as we have 1 Swift to <b>many </b>Falkon services. Now,

even the 1-1 Swift-Falkon ratio won't work today, as the Falkon

provider is not data-aware yet... but could be updated, maybe a few

days of coding and testing; the harder part (IMO) will be making sure

that the Swift data management doesn't interfere with the Falkon data

management, and vice versa. <br>

<br>

Options (3) and (4) we have discussed before. The trick with these is

making things general and transparent enough that it works, and works

well. Getting a Torus aggregate throughput to exceed 8GB/s shouldn't be

that hard, with probably a fraction of the machine (several racks). Any

word on the latest numbers of the improved GPFS, which is supposed to

upgrade the number of servers from 8 or 16 up to 100+? With linear

scalability, that would mean 80GB/s, the peak of the SAN throughputs I

saw a while back in some slides from an ALCF talk. For us to get 80GB/s

using CIO, we'd need 2MB/s per node. I bet we can easily achieve that,

but it would probably be at the larger scales of 10s of racks. I recall

getting 100MB/s+ per node, right?  This would give us a theoretical

upper bound of 4000GB/s, so in theory, there is plenty of room between

80GB/s and 4000GB/s. I bet in practice, we'd only get a small fraction

of that 4000GB/s, but it would be interesting to see how much we can really

get without thinking of the network topology, and also how far we can

get if we do take the network topology into consideration. <br>
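<br>
As a rough sanity check on the figures above, here is a small Python sketch of the arithmetic. The node count is not a measured machine size; it is just the value implied by 80GB/s at 2MB/s per node, and decimal units (1GB = 1000MB) are an assumption. <br>
<pre>
```python
# Sanity check of the bandwidth figures in this message (decimal units assumed).
MB_PER_GB = 1000

GPFS_TARGET_GB_S = 80      # projected GPFS peak quoted above
NEEDED_PER_NODE_MB_S = 2   # per-node rate quoted above for matching that peak
SEEN_PER_NODE_MB_S = 100   # per-node rate recalled above

# Assumption: the node count is the one implied by the two figures above,
# not a measured machine size.
implied_nodes = GPFS_TARGET_GB_S * MB_PER_GB // NEEDED_PER_NODE_MB_S


def aggregate_gb_s(per_node_mb_s, nodes):
    """Aggregate throughput in GB/s if every node sustains per_node_mb_s."""
    return per_node_mb_s * nodes / MB_PER_GB


upper_bound_gb_s = aggregate_gb_s(SEEN_PER_NODE_MB_S, implied_nodes)
headroom = upper_bound_gb_s / GPFS_TARGET_GB_S

print("implied nodes:", implied_nodes)
print("theoretical upper bound (GB/s):", upper_bound_gb_s)
print("headroom over GPFS target:", headroom)
```
</pre>
The headroom value is the ratio between the theoretical per-node aggregate and the projected GPFS peak; it says nothing about what the torus would actually deliver under contention. <br>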

<br>

Option (2), I haven't thought of before, but it only works if an output

file is needed as input by only 1 job. What do you do if you have 1

output file needed as input by N jobs?  Do you replicate the first job N

times, just so you can get the output file in N locations?  Or do you

group the jobs in 1+N jobs, where the N jobs execute in serial order on

1 processor/node? This might be worth investigating, but I think you'll

be restricting the natural parallelism, or repeating work just to avoid

data management.<br>
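<br>
To make the fan-out concern concrete, here is a minimal Python sketch that splits producers into those that could be batched with their single consumer and those whose output is needed by several jobs. The function name, the job names, and the dictionary layout are all hypothetical; this is not how Swift represents dependencies, and the sketch only looks at fan-out, not at jobs that read from several producers. <br>
<pre>
```python
from collections import defaultdict


def classify_outputs(jobs):
    """Split producers by how many jobs read their output.

    jobs maps a job name to the list of job names whose output it reads.
    Returns (batchable, fan_out):
      batchable maps a producer to its single consumer,
      fan_out maps a producer to the sorted list of its consumers.
    """
    consumers = defaultdict(list)
    for job, inputs in jobs.items():
        for producer in inputs:
            consumers[producer].append(job)

    batchable = {}
    fan_out = {}
    for producer, users in consumers.items():
        if len(users) == 1:
            batchable[producer] = users[0]
        else:
            fan_out[producer] = sorted(users)
    return batchable, fan_out


# Hypothetical two-stage workflow: one simple chain and one 1-to-3 fan-out.
jobs = {
    "stage1_a": [],
    "stage2_a": ["stage1_a"],
    "stage1_c": [],
    "use1": ["stage1_c"],
    "use2": ["stage1_c"],
    "use3": ["stage1_c"],
}

batchable, fan_out = classify_outputs(jobs)
print("batchable:", batchable)
print("fan-out:", fan_out)
```
</pre>
Only the producers in the first group fit option (2) without either re-running the producer once per consumer or serializing all of its consumers on one node. <br>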

<br>

Ioan<br>

<br>

Zhao Zhang wrote:

<blockquote cite="mid:4934621A.5090109@uchicago.edu" type="cite">Hi,

All

  <br>

  <br>

The following alternatives are a summary of a talk between Mike and

Zhao. We are trying

  <br>

to optimize the data IO performance for swift on supercomputers,

including BGP, Ranger,

  <br>

and possibly Jaguar. We are trying to eliminate all unnecessary data IO

during stages of computation.

  <br>

  <br>

Scenario 1: Say a computation has 2 stages, the 2nd stage would take

the output from the 1st stage

  <br>

as the input data.

  <br>

  <br>

Data Flow in current swift system: 1st stage will write the output data

to GPFS, where swift knows this

  <br>

output data is the input for the 2nd stage. Swift then sends the task to a

worker on a CN.

  <br>

  <br>

Desired Data Flow: 1st stage of computation knows the output data will

be used as the input for the next

  <br>

stage, so the data is not copied back to GPFS; the 2nd stage

task then arrives and consumes this data.

  <br>

  <br>

Key Issue: the 2nd stage task has no idea of where the 1st stage output

data is.

  <br>

  <br>

Design Alternatives:

  <br>

1. Data aware task scheduling:

  <br>

   Both swift and falkon need to be data aware. Swift should know where

the output of 1st stage is, which

  <br>

   means, which pset, or say which falkon service.

  <br>

   And the falkon service should know which CN has the data for the 2nd

stage computation.

  <br>

  <br>

2. Swift batches jobs vertically

  <br>

   Before sending out any jobs, swift knows those 2 stage jobs have a data

dependency, and thus sends them out as 1 batched

  <br>

   job to each worker.

  <br>

  <br>

3. Collective IO

  <br>

  Build a shared file system which could be accessed by all CNs. Instead

of writing output data to GPFS, workers

  <br>

  copy intermediate output data to this shared ram-disk, and retrieve

the data from IFS.

  <br>

  <br>

  Several Concerns:

  <br>

  a) reliability of torus network --- we need to test more about this.

  <br>

  b) performance of torus network --- could this really perform

better than GPFS? If not, at what scale

  <br>

      could torus perform better than GPFS?

  <br>

  <br>

4. Half-Collective IO

  <br>

  All workers write data to IFS, and the data will be periodically

copied back to GPFS. In this case, we only

  <br>

  optimize the output phase, and leave the input phase as is.

  <br>

  <br>

Any other ideas? Thanks so much.

  <br>

  <br>

best wishes

  <br>

zhangzhao

  <br>


</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

===================================================

Ioan Raicu

Ph.D. Candidate

===================================================

Distributed Systems Laboratory

Computer Science Department

University of Chicago

1100 E. 58th Street, Ryerson Hall

Chicago, IL 60637

===================================================

Email: <a class="moz-txt-link-abbreviated" href="mailto:iraicu@cs.uchicago.edu">iraicu@cs.uchicago.edu</a>

Web:   <a class="moz-txt-link-freetext" href="http://www.cs.uchicago.edu/~iraicu">http://www.cs.uchicago.edu/~iraicu</a>

<a class="moz-txt-link-freetext" href="http://dev.globus.org/wiki/Incubator/Falkon">http://dev.globus.org/wiki/Incubator/Falkon</a>

<a class="moz-txt-link-freetext" href="http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page">http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page</a>

===================================================

===================================================

</pre>

</body>

</html>