[Swift-devel] several alternatives to design the data management system for Swift on SuperComputers

Mihael Hategan hategan at mcs.anl.gov
Tue Dec 2 10:24:39 CST 2008


On Mon, 2008-12-01 at 23:24 -0600, Zhao Zhang wrote:
> Hi, Mihael
> 
> I think the attached graph could answer your question.

Not really. Is there a test with 8192 pre-created directories?

> 
> All the tests were run on 2 racks (8K cores) with 8192 jobs. Each file
> created by the test is 1 KB.
> 
> 1_DIR_5_FILE: all 8192 cores write 5 files each to a single directory on
> GPFS. In this test, only 31 jobs returned successfully within 300 seconds.
> 32_DIR_5_FILE: all 8192 cores write 5 files each to the directory unique
> to their IO node on GPFS. 8192 jobs took 91.026 seconds.
> 1000_DIR_5_FILE: all 8192 cores write 5 files each to 1000
> hierarchical directories on GPFS. 8192 jobs took 81.555 seconds.
> 32_DIR_1_FILE: by batching the 5 output files, each core writes one
> tarball to the directory unique to its IO node on GPFS. 8192 jobs took
> 23.616 seconds.
> CIO_5_FILE: with CIO, each core writes 5 files to IFS. 8192 jobs took
> 12.007 seconds.
> 
> 
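
For readers following the thread, the batching step in 32_DIR_1_FILE can be sketched roughly as below. This is only an illustration, not the code used in the tests: the function names, the file names (out_N.dat, job_0.tar) and the use of an uncompressed tar are all assumptions. The point is that five small writes to the shared file system become one.

```python
import os
import tarfile
import tempfile


def bundle_outputs(file_paths, tar_path):
    """Pack several small output files into one uncompressed tar archive."""
    with tarfile.open(tar_path, "w") as tar:
        for path in file_paths:
            # Store only the base name so the archive does not depend on
            # the local scratch directory layout.
            tar.add(path, arcname=os.path.basename(path))
    return tar_path


def list_bundle(tar_path):
    """Return the sorted member names of a tar archive."""
    with tarfile.open(tar_path, "r") as tar:
        return sorted(tar.getnames())


def demo():
    """Create five 1 KB files (hypothetical names) and bundle them."""
    with tempfile.TemporaryDirectory() as work_dir:
        paths = []
        for i in range(5):
            path = os.path.join(work_dir, f"out_{i}.dat")
            with open(path, "wb") as f:
                f.write(b"x" * 1024)
            paths.append(path)
        tar_path = bundle_outputs(paths, os.path.join(work_dir, "job_0.tar"))
        return list_bundle(tar_path)
```

Calling demo() returns the five member names, so a single archive stands in for the five separate files a job would otherwise create.
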
> So we can tell that 32_DIR_5_FILE is not much slower than
> 1000_DIR_5_FILE. In this test case each task writes 5 files, whereas
> in the real CIO case each IO node writes one tarball at a time, so the
> performance of the two should be even closer.
> 
> So, in CIO we use a unique directory for each IO node (keep in mind, each
> IO node has 256 workers).
> For the GPFS test case in the paper, we use the fixed number of 10x1000
> hierarchical directories for output.
> 
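
To make the two layouts concrete, here is a minimal sketch of how a job could be mapped to its output directory. It rests on several assumptions that are not stated in the thread: that core ranks are assigned contiguously to IO nodes, that "10x1000" means 10 top-level directories with 1000 subdirectories each, and that the root path /gpfs/output and the directory naming are hypothetical. With 8192 cores and 256 workers per IO node, the first mapping yields 8192 / 256 = 32 directories.

```python
WORKERS_PER_IO_NODE = 256  # figure quoted in the message above


def io_node_dir(core_rank, root="/gpfs/output"):
    """One directory per IO node (root path is a hypothetical example).

    Assumes ranks 0..255 share IO node 0, 256..511 share IO node 1, etc.
    """
    io_node = core_rank // WORKERS_PER_IO_NODE
    return f"{root}/ionode_{io_node:03d}"


def hierarchical_dir(job_id, top=10, leaf=1000, root="/gpfs/output"):
    """Two-level layout: `top` directories, each holding `leaf` subdirectories.

    This is an assumed reading of "10x1000 hierarchical directories".
    """
    slot = job_id % (top * leaf)
    return f"{root}/{slot // leaf:02d}/{slot % leaf:04d}"


def count_io_node_dirs(num_cores):
    """Number of distinct per-IO-node directories used by `num_cores` cores."""
    return len({io_node_dir(rank) for rank in range(num_cores)})
```

Under these assumptions count_io_node_dirs(8192) gives 32, which matches the 32_DIR test names, while hierarchical_dir spreads 8192 jobs over separate leaf directories.
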
> Does the above clear up the question?
> 
> best wishes
> zhangzhao
> 
> Mihael Hategan wrote:
> > On Mon, 2008-12-01 at 21:43 -0600, Ian Foster wrote:
> >   
> >> Dear All:
> >>
> >> b) "Collective I/O": improving performance between intermediate file
> >> system and GPFS by aggregating many small operations into fewer large
> >> operations.
> >>
> >>     
> >
> > This is a part that I'm having trouble understanding.
> >
> > The paper mentions distributing data to different directories (in 6.2.),
> > but not whether the experiment was done with that or not.
> > Are the measurements taken with applications writing data to the same
> > directory or a different directory for each application/node or was the
> > whole thing done with Swift?
> >
> >
> >
> >   



