[Swift-devel] IO overheads of swift wrapper scripts on BlueGene/P

Michael Wilde wilde at mcs.anl.gov
Sat Oct 17 12:29:38 CDT 2009


Remember that any situation in which multiple IONs modify the same file
or directory (i.e., by creating files or directories in the same parent
directory) will cause severe contention and performance degradation on
any GPFS filesystem.

In addition to creating many directories, you need to ensure that no
single file or directory is ever likely to be written to from multiple
client nodes (e.g., IONs on the BG/P) concurrently.
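
A minimal sketch of such a layout, in Python, assuming each writer can be
given a stable node identifier (for example the ION's hostname). The names
precreate_dirs, output_path, node_id and buckets_per_node are hypothetical
and not part of Swift or its wrapper scripts. The directory tree is created
once, from a single host, before the run, so that the IONs never create
entries in a shared parent directory; each node then writes only under its
own subtree.

```python
import zlib
from pathlib import Path


def bucket_dir(root, node_id, bucket):
    # One subtree per writer node, split further into numbered buckets.
    return Path(root) / f"node-{node_id}" / f"{bucket:02d}"


def precreate_dirs(root, node_ids, buckets_per_node=16):
    # Run this once from a single host before the jobs start, so that
    # no two client nodes create directories in the same parent.
    for node_id in node_ids:
        for bucket in range(buckets_per_node):
            bucket_dir(root, node_id, bucket).mkdir(parents=True, exist_ok=True)


def output_path(root, node_id, filename, buckets_per_node=16):
    # Deterministically map a file name to one bucket under the writer's
    # own subtree; crc32 is used only to spread names evenly.
    bucket = zlib.crc32(filename.encode("utf-8")) % buckets_per_node
    return bucket_dir(root, node_id, bucket) / filename
```

With this scheme, a job on node "ion03" would write its result to
output_path("output", "ion03", "seq0001.out"), and no other node ever
touches that directory. The bucket count is a tuning knob, not a
recommendation.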

Have you done that in this workload, Allan?

- Mike


On 10/17/09 2:59 AM, Allan Espinosa wrote:
> I was using 1000 files (or was it 3000?) per directory. It looks like
> I need to lower my ratio...
> 
> -Allan
> 
> 2009/10/16 Mihael Hategan <hategan at mcs.anl.gov>:
>> On Fri, 2009-10-16 at 21:07 -0500, Allan Espinosa wrote:
>>> Progress  2009-10-16 18:00:33.756364000-0500  COPYING_OUTPUTS
>>> Progress  2009-10-16 18:08:19.970449000-0500  RM_JOBDIR
>> Grr. 8 minutes spent COPYING_OUTPUTS.
>>
>> What would be useful is to aggregate all the access that happened on
>> that FS from all the relevant jobs, to see the exact thing that causes
>> contention. I strongly suspect it's
>> home/espinosa/workflows/jgi_blastp/test3.4.7_3cpn.32ifs.192cpu/output/
>>
>> Pretty much all the outputs seem to go to that directory.
>>
>> I'm afraid however that the information in the logs is insufficient.
>> Strace with relevant options (for fs calls only) may be useful if you
>> want to try.
>>
>> Alternatively, you could try to spread your output over multiple
>> directories and see what the difference is.
>>
>> Also, it may be interesting to see how the delay depends on the number
>> of contending processes. That would tell us how many processes we can
>> allow to compete for a shared resource without causing too much
>> trouble.
>>
>> Mihael
>>
>>
>>
> 
> 
> 
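
The time spent in a wrapper phase can be read directly off the Progress
lines quoted above. The following is a minimal Python sketch, assuming each
line has the form "Progress <date> <time><utc-offset> <STATE>" and that a
line marks the moment its state begins; the function names are hypothetical.
The timestamps carry nanoseconds, which Python's %f does not accept, so the
fraction is truncated to microseconds.

```python
from datetime import datetime

LOG_LINES = [
    "Progress  2009-10-16 18:00:33.756364000-0500  COPYING_OUTPUTS",
    "Progress  2009-10-16 18:08:19.970449000-0500  RM_JOBDIR",
]


def parse_progress(line):
    # Fields: "Progress", date, time with fraction and UTC offset, state.
    _, date_part, time_part, state = line.split()
    clock, offset = time_part[:-5], time_part[-5:]
    whole, frac = clock.split(".")
    # Truncate nanoseconds to the six digits that %f can parse.
    stamp = f"{date_part} {whole}.{frac[:6]}{offset}"
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f%z"), state


def phase_durations(lines):
    # Duration of each state = start of the next state minus its own start.
    events = [parse_progress(line) for line in lines]
    return [
        (state, (next_time - time).total_seconds())
        for (time, state), (next_time, _) in zip(events, events[1:])
    ]


for state, seconds in phase_durations(LOG_LINES):
    print(f"{state}: {seconds:.1f} s")
```

For the two lines above this gives about 466.2 seconds in COPYING_OUTPUTS,
i.e. 7 minutes 46 seconds, consistent with the "8 minutes" noted in the
thread.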


