[Swift-devel] hardlinks instead of copies on local file systems

Ben Clifford benc at hawaga.org.uk
Mon Apr 14 18:21:17 CDT 2008


I hacked up a quick provider that transfers files by creating unix hard 
links instead of copying them. This is a dirty hack to see whether it gives 
any performance improvement over copying, and it lacks error handling. Most 
notably, Swift will fail in strange ways when: i) an output file already 
exists (other providers tend to overwrite) and ii) the input data file is 
on a different file system from the site shared working directory (so hard 
links cannot work).
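
For illustration only, here is a minimal sketch of how those two failure cases could be handled. It is written in Python rather than as part of the provider, and the function name link_or_copy is hypothetical (it is not in the tarball). It assumes a unix-like system: an existing destination is removed first to mimic the overwrite behaviour of other providers, and a cross-file-system link attempt (EXDEV) falls back to an ordinary copy.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hypothetical helper: hard-link src to dst, overwriting dst if present.

    Falls back to a plain copy when src and dst are on different file
    systems, where hard links cannot work.
    """
    if os.path.lexists(dst):
        # If dst is already a link to the same file there is nothing to do.
        if os.path.exists(dst) and os.path.samefile(src, dst):
            return "linked"
        # Mimic the "overwrite" behaviour of the other providers.
        os.remove(dst)
    try:
        os.link(src, dst)
        return "linked"
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        # Different file system: fall back to copying the data.
        shutil.copy2(src, dst)
        return "copied"
```

The remove-then-link sequence is not atomic, so a real provider would probably want to link to a temporary name and rename it over the destination instead.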

To try this out:

i) untar http://www.ci.uchicago.edu/~benc/provider-ln-20080414.tar.gz into 
cog/modules/

ii) edit cog/modules/vdsk/dependencies.xml to include a new target 
provider-ln (like the existing karajan, provider-localscheduler and 
provider-dcache targets).

iii) run "ant redist" in cog/modules/vdsk/

iv) set your sites file to refer to provider-ln, like this:

  <pool handle="localhost">
    <filesystem provider="ln" />
    <execution provider="local" />
    <workdirectory>/var/tmp</workdirectory>
  </pool>

v) fire!

I've tested this on my laptop. I haven't tested it on GPFS.

I deliberately use hard links rather than symlinks here:

 i) When hard linking, the new link is a first-class reference to the 
file, just like the original. Deleting the original link does not delete 
the file. This is important for stageout: the output file needs to stay 
on the file system, not be deleted with the site working directory.

 ii) Symlinks require access to the original directory, whilst hard links 
go straight to the inode without indirecting via the original directory. 
This is probably important for GPFS scalability: it means there is one 
less file system object to interact with when opening the file.
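
The stageout point can be checked with a small standalone sketch. This is plain Python, not Swift or provider code, and the directory and file names are made up for the demonstration. It assumes a unix-like system where both link types are permitted: an output file is created in a stand-in working directory, exposed through both a hard link and a symlink, and then the working directory is deleted.

```python
import os
import shutil
import tempfile


def stageout_demo():
    """Compare a hard link and a symlink after the work directory is removed."""
    with tempfile.TemporaryDirectory() as root:
        workdir = os.path.join(root, "workdir")  # stand-in for the site work dir
        os.mkdir(workdir)
        out = os.path.join(workdir, "out.dat")
        with open(out, "w") as f:
            f.write("result")

        final_hard = os.path.join(root, "final-hard.dat")
        final_sym = os.path.join(root, "final-sym.dat")
        os.link(out, final_hard)     # second name for the same inode
        os.symlink(out, final_sym)   # separate object that stores a path

        st = os.stat(out)
        same_inode = st.st_ino == os.stat(final_hard).st_ino
        link_count = st.st_nlink     # two names now refer to the file

        shutil.rmtree(workdir)       # clean up the working directory

        return {
            "same_inode": same_inode,
            "link_count": link_count,
            # The data is still reachable through the hard link...
            "hard_survives": os.path.exists(final_hard),
            # ...but the symlink dangles, so following it fails.
            "sym_survives": os.path.exists(final_sym),
        }
```

On a typical local unix file system this should report that the hard link shares the inode and survives the cleanup, while the symlink is left dangling.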
