[Swift-devel] hardlinks instead of copies on local file systems

Michael Wilde wilde at mcs.anl.gov
Tue Apr 15 00:27:25 CDT 2008


Excellent! Hope to try later in the week.

- Mike


On 4/14/08 6:21 PM, Ben Clifford wrote:
> I hacked up a quick provider which uses Unix hard links instead of copying 
> to transfer files. This is a dirty hack to see whether it gives any 
> performance improvement over copying, and it lacks error handling. Most 
> notably, Swift will fail in strange ways when: i) an output file already 
> exists (other providers tend to overwrite) and ii) the input data 
> file is on a different file system from the site shared working 
> directory (so hard links cannot work).
> 
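The two failure cases above could be handled roughly as follows. This is only a minimal sketch in Python, not the code in the provider tarball: the helper name link_or_copy is hypothetical, and it assumes that overwriting an existing destination and falling back to a plain copy across file systems is the wanted behaviour.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link src to dst, falling back to a copy across file systems.

    Hypothetical helper for illustration; not part of provider-ln.
    Returns "link" or "copy" to report which path was taken.
    """
    if os.path.lexists(dst):
        # Already the same file: nothing to do, and unlinking would be wrong
        # if src and dst are the same path.
        if os.path.exists(dst) and os.path.samefile(src, dst):
            return "link"
        # Mimic the other providers: replace an existing destination.
        os.unlink(dst)
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        # EXDEV: src and dst live on different file systems, so a hard
        # link cannot work. Anything else is a real error.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)
        return "copy"
```

Calling link_or_copy("input.dat", "/var/tmp/run/input.dat") would then give a second name for the same inode when both paths are on one file system, and an ordinary copy otherwise. Note that the unlink-then-link sequence is not atomic, so this sketch is not safe against concurrent writers to the same destination.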
> To try this out:
> 
> i) untar http://www.ci.uchicago.edu/~benc/provider-ln-20080414.tar.gz into 
> cog/modules/
> 
> ii) edit cog/modules/vdsk/dependencies.xml to include a new target 
> provider-ln (like the existing karajan, provider-localscheduler and 
> provider-dcache targets).
> 
> iii) run "ant redist" in vdsk/
> 
> iv) set your sites file to refer to provider-ln, like this:
> 
>   <pool handle="localhost">
>     <filesystem  provider="ln" />
>     <execution provider="local" />
>     <workdirectory >/var/tmp</workdirectory>
>   </pool>
> 
> v) fire!
> 
> I've tested this on my laptop. I haven't tested it on GPFS.
> 
> I deliberately use hard links rather than symlinks here:
> 
>  i) When hard linking, the new link is a first-class reference to the 
> file, just like the original. Deleting the original link does not delete 
> the file. This is important for stageout: the output file needs to stay 
> on the file system, not be deleted with the site working directory.
> 
>  ii) A symlink is resolved through the original path, so it requires 
> access to the original directory, whilst a hard link goes straight to the 
> inode without indirecting via the original directory. This is probably 
> important for GPFS scalability: it means there is one less filesystem 
> object to interact with when opening the file.
> 
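The difference between the two kinds of link can be seen with a small experiment. This is a sketch in Python, assuming a POSIX file system that supports both hard links and symlinks; the function name compare_links and the file names are made up for illustration.

```python
import os
import tempfile


def compare_links(workdir):
    """Link one file both ways, delete the original name, report what survives."""
    original = os.path.join(workdir, "original.dat")
    hard = os.path.join(workdir, "hard.dat")
    soft = os.path.join(workdir, "soft.dat")

    with open(original, "w") as f:
        f.write("output data")

    os.link(original, hard)     # second directory entry for the same inode
    os.symlink(original, soft)  # separate object that only stores the path

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    links_before = os.stat(hard).st_nlink

    os.unlink(original)         # removes one name, not the data

    return {
        "same_inode": same_inode,
        "links_before": links_before,
        "links_after": os.stat(hard).st_nlink,
        "hard_readable": os.path.exists(hard),
        # The symlink itself still exists but no longer resolves.
        "soft_dangling": os.path.islink(soft) and not os.path.exists(soft),
    }
```

Run against a scratch directory, for example with tempfile.TemporaryDirectory() as d: print(compare_links(d)), the hard link should remain readable after the original name is removed, while the symlink is left dangling. That matches point i): a stageout done with symlinks would lose the output once the working directory is cleaned up.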


