[Swift-devel] several alternatives to design the data management system for Swift on SuperComputers

Zhao Zhang zhaozhang at uchicago.edu
Tue Dec 2 10:26:31 CST 2008



Mihael Hategan wrote:
> On Mon, 2008-12-01 at 23:24 -0600, Zhao Zhang wrote:
>   
>> Hi, Mihael
>>
>> I think the attached graph could answer your question.
>>     
>
> Not really. Is there a test with 8192 pre-created directories?
>   
Nope. Why do you think there are 8192 pre-created directories for the 2-rack 
test? The case is not one unique directory per worker, but one directory per 
IO node for the CIO test.

zhao
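
To make the layouts discussed in this thread concrete, here is a rough sketch of the three schemes (per-IO-node directories, the fixed 10x1000 hierarchical layout, and tarball batching). The function names, path formats, and `out/` prefix are illustrative assumptions, not the actual CIO code:

```python
import tarfile

WORKERS_PER_IO_NODE = 256  # from the thread: each IO node serves 256 workers

def io_node_dir(worker_rank):
    # One output directory per IO node (the 32_DIR cases): all 256
    # workers behind the same IO node share one directory.
    return "out/ion-%04d" % (worker_rank // WORKERS_PER_IO_NODE)

def hierarchical_dir(job_id):
    # A fixed 10x1000 hierarchical layout (the GPFS case in the paper):
    # outputs are spread over 10 top-level x 1000 second-level directories.
    # The exact hashing of job id to directory is an assumption here.
    return "out/d%d/d%03d" % (job_id % 10, (job_id // 10) % 1000)

def batch_outputs(tar_path, files):
    # The 32_DIR_1_FILE idea: batch the 5 small output files into one
    # tarball, so GPFS sees one large write instead of five small ones.
    with tarfile.open(tar_path, "w") as tar:
        for f in files:
            tar.add(f)
```

With 8192 workers, `io_node_dir` maps them onto 32 directories (2 racks x 16 IO nodes), which matches the 32_DIR test names above.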
>   
>> All the tests were run on 2 racks, 8k cores, with 8192 jobs. Each file 
>> created by the test is 1KB.
>>
>> 1_DIR_5_FILE means all 8192 cores are writing 5 files to 1 dir on 
>> GPFS; in this test, only 31 jobs returned successfully within 300 seconds.
>> 32_DIR_5_FILE: all 8192 cores are writing 5 files to the directory 
>> unique to each IO node on GPFS. 8192 jobs took 91.026 seconds.
>> 1000_DIR_5_FILE: all 8192 cores are writing 5 files to 1000 
>> hierarchical directories on GPFS. 8192 jobs took 81.555 seconds.
>> 32_DIR_1_FILE: by batching the 5 output files, each core is writing one 
>> tarball to the directory unique to each IO node on GPFS. 8192 jobs took 
>> 23.616 seconds.
>> CIO_5_FILE: with CIO, each core writes 5 files to IFS. 8192 jobs took 
>> 12.007 seconds.
>>
>>
>> From this we can tell that 32_DIR_5_FILE doesn't slow down performance 
>> much compared with 1000_DIR_5_FILE. And in this test case each task 
>> writes 5 files, while in the real CIO case each IO node will write one 
>> tarball at a time, so the performance of the two should be even closer.
>>
>> So, in CIO we use a unique directory for each IO node (keep in mind, 
>> each IO node has 256 workers).
>> For the GPFS test case in the paper, we use a fixed set of 10x1000 
>> hierarchical directories for output.
>>
>> Does the above answer your question?
>>
>> best wishes
>> zhangzhao
>>
>> Mihael Hategan wrote:
>>     
>>> On Mon, 2008-12-01 at 21:43 -0600, Ian Foster wrote:
>>>   
>>>       
>>>> Dear All:
>>>>
>>>> b) "Collective I/O": improving performance between intermediate file
>>>> system and GPFS by aggregating many small operations into fewer large
>>>> operations.
>>>>
>>>>     
>>>>         
>>> This is a part that I'm having trouble understanding.
>>>
>>> The paper mentions distributing data to different directories (in 6.2.),
>>> but not whether the experiment was done with that or not.
>>> Are the measurements taken with applications writing data to the same
>>> directory or a different directory for each application/node or was the
>>> whole thing done with Swift?
>>>
>>>
>>>
>>>   
>>>       
>
>
>   


