[Swift-devel] hardlinks instead of copies on local file systems
Michael Wilde
wilde at mcs.anl.gov
Tue Apr 15 00:27:25 CDT 2008
Excellent! Hope to try later in the week.
- Mike
On 4/14/08 6:21 PM, Ben Clifford wrote:
> I hacked up a quick provider which uses Unix hard links instead of copying
> in order to transfer files. This is a dirty hack to see if it has any
> performance improvement over copying, and it lacks error handling. Most
> notably, Swift will fail in strange ways when: i) an output file already
> exists (other providers tend to overwrite) and ii) the input data
> file is on a different file system from the site shared working directory
> (so hard links cannot work).
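
For reference, the two failure cases above can be sketched in Python. This is not the provider-ln code; the helper name link_or_copy and its overwrite and fallback behaviour are assumptions made for illustration only.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst, handling the two failure cases named above.

    Hypothetical helper for illustration; not part of provider-ln.
    Returns "linked" or "copied".
    """
    try:
        os.link(src, dst)
        return "linked"
    except FileExistsError:
        # Case i: the destination already exists.
        if os.path.samefile(src, dst):
            return "linked"  # dst is already a link to the same file
        # Assumption: mimic the overwrite behaviour of other providers.
        os.remove(dst)
        return link_or_copy(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Case ii: src and dst are on different file systems, so a hard
        # link cannot be created. Fall back to an ordinary copy.
        shutil.copy2(src, dst)
        return "copied"
```

Falling back to a copy keeps the transfer working across file systems, at the cost of losing the speed-up in exactly that case.
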
>
> To try this out:
>
> i) untar http://www.ci.uchicago.edu/~benc/provider-ln-20080414.tar.gz into
> cog/modules/
>
> ii) edit cog/modules/vdsk/dependencies.xml to include a new target
> provider-ln (like the existing karajan, provider-localscheduler and
> provider-dcache targets).
>
> iii) run ant redist in cog/modules/vdsk/
>
> iv) set your sites file to refer to provider-ln, like this:
>
> <pool handle="localhost">
>   <filesystem provider="ln" />
>   <execution provider="local" />
>   <workdirectory>/var/tmp</workdirectory>
> </pool>
>
> v) fire!
>
> I've tested this on my laptop. I haven't tested it on GPFS.
>
> I deliberately use hard links rather than symlinks here:
>
> i) When hard linking, the new link is a first-class reference to the
> file, just like the original. Deleting the original link does not delete
> the file. This is important for stageout: the output file needs to stay
> on the file system, not be deleted with the site working directory.
>
> ii) Symlinks require access to the original directory, whilst hard links
> go straight to the inode without indirecting via the original directory.
> This is probably important for GPFS scalability: it means there is one
> less file system object to interact with when opening the file.
>
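
The difference between the two link types described above can be checked with a small Python sketch on a Unix-like system. The function name compare_links and the file names are made up for illustration; the "workdir" directory only stands in for a site working directory.

```python
import os
import shutil


def compare_links(base):
    """Create a file in a scratch work directory, link to it both ways,
    delete the work directory, and report what survives.

    All names here are illustrative, not taken from Swift.
    """
    workdir = os.path.join(base, "workdir")
    os.mkdir(workdir)
    original = os.path.join(workdir, "output.dat")
    with open(original, "w") as f:
        f.write("result")

    hard = os.path.join(base, "hard.dat")
    soft = os.path.join(base, "soft.dat")
    os.link(original, hard)      # a second name for the same inode
    os.symlink(original, soft)   # a separate object that stores a path

    same_inode = os.stat(hard).st_ino == os.stat(original).st_ino
    link_count = os.stat(original).st_nlink

    shutil.rmtree(workdir)       # simulate removing the work directory

    return {
        "same_inode": same_inode,
        "link_count": link_count,
        "hard_survives": os.path.exists(hard),
        # os.path.exists follows symlinks, so a dangling one reports False.
        "soft_survives": os.path.exists(soft),
    }
```

Called on an empty scratch directory, this should show the hard link sharing the original's inode and remaining readable after the work directory is gone, while the symlink is left dangling.
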