[Swift-devel] hardlinks instead of copies on local file systems
Ben Clifford
benc at hawaga.org.uk
Mon Apr 14 18:21:17 CDT 2008
I hacked up a quick provider which uses Unix hard links instead of copying
in order to transfer files. This is a dirty hack to see if it gives any
performance improvement over copying, and it lacks error handling. Most
notably, Swift will fail in strange ways when: i) an output file already
exists (other providers tend to overwrite) and ii) the input data
file is on a different file system from the site shared working directory
(so hard links cannot work).
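For illustration only, here is a minimal Python sketch of how those two failure cases could be handled. It is not the code in the provider (which lacks this handling); the function name link_or_copy and the fall-back-to-copy policy are assumptions made up for this example.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst, falling back to a copy across file systems.

    Hypothetical helper: returns "link" or "copy" to say what was done.
    """
    # If dst is already a link to the same file there is nothing to do
    # (and unlinking it could destroy the data when src and dst are one path).
    if os.path.exists(dst) and os.path.samefile(src, dst):
        return "link"
    # Case i): mimic other providers by overwriting an existing destination.
    try:
        os.unlink(dst)
    except FileNotFoundError:
        pass
    try:
        os.link(src, dst)
        return "link"
    except OSError as e:
        # Case ii): EXDEV means src and dst are on different file systems,
        # so a hard link cannot work; anything else is a real error.
        if e.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)
        return "copy"
```

Falling back to a copy loses the performance benefit for that file, but keeps the transfer working when the input lives on another file system.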
To try this out:
i) untar http://www.ci.uchicago.edu/~benc/provider-ln-20080414.tar.gz into
cog/modules/
ii) edit cog/modules/vdsk/dependencies.xml to include a new target
provider-ln (like the existing karajan, provider-localscheduler and
provider-dcache targets).
iii) run ant redist in vdsk/
iv) set your sites file to refer to provider-ln, like this:
<pool handle="localhost">
  <filesystem provider="ln" />
  <execution provider="local" />
  <workdirectory>/var/tmp</workdirectory>
</pool>
v) fire!
I've tested this on my laptop. I haven't tested it on GPFS.
I deliberately use hard links rather than symlinks here:
i) When hard linking, the new link is a first-class reference to the
file, just like the original. Deleting the original link does not delete
the file. This is important for stageout: the output file needs to stay
on the file system, not be deleted with the site working directory.
ii) Symlinks require access to the original directory, whilst hard links
go straight to the inode without indirecting via the original directory.
This is probably important for GPFS scalability: it means there is one
less filesystem object to interact with when opening the file.
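
Point i) can be checked with a small Python sketch. This is an illustration, not part of the provider; the helper name survives_delete is made up for the example, and it assumes a Unix file system that supports both hard links and symlinks.

```python
import os
import tempfile


def survives_delete(make_link):
    """Create a file, link to it with make_link, delete the original name,
    and report whether the data is still readable through the link."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        link = os.path.join(d, "link")
        with open(original, "w") as f:
            f.write("output data")
        make_link(original, link)
        # Simulate the site working directory entry being cleaned up.
        os.unlink(original)
        try:
            with open(link) as f:
                return f.read() == "output data"
        except FileNotFoundError:
            # A symlink now points at a name that no longer exists.
            return False


print("hard link survives:", survives_delete(os.link))
print("symlink survives:  ", survives_delete(os.symlink))
```

The hard link keeps the data reachable after the original name is removed, whereas the symlink is left dangling, which is the stageout behaviour described in i).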