[Swift-devel] discussion for swift output file path

Mihael Hategan hategan at mcs.anl.gov
Thu Nov 6 18:59:54 CST 2008


I spoke about this with Zhao a few days ago. Regardless of the local
configuration (which is likely to happen on a fast, non-distributed FS),
it may make sense to employ, for application/shared data, a similar
scheme to that used for info files.
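
As a rough illustration only (not the scheme Swift actually uses for info
files), one option is to bucket each shared file by a hash of its name, along
the lines of the filename hash suggested in the quoted message below. The
function name, the "shared" base, the two-level depth, and the choice of
SHA-256 are all assumptions made for this sketch:

```python
import hashlib
import posixpath


def sharded_shared_path(filename, levels=2, base="shared"):
    """Return a hashed location for a submit-side relative path.

    Hypothetical helper: one hex character of the digest is used per
    directory level, so files spread over up to 16**levels buckets
    instead of landing in a single directory.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    shards = list(digest[:levels])
    # The submit-side structure (e.g. "a/b") is kept below the buckets.
    return posixpath.join(base, *shards, filename)


if __name__ == "__main__":
    print(sharded_shared_path("a/b"))
```

The mapping is deterministic, so the submit side and wrapper.sh could both
compute the same location from the file name alone.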

It may also make sense to coalesce all the trees into one, such that
data, info, and temporary job dirs are created in the same directory
(e.g. (info/a/0, job/a/0) becomes (a/0/info, a/0/job)), since it will
reduce the overall number of operations.
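
To make the renaming concrete, here is a small sketch of the path mapping
only; the helper name is made up, and it assumes paths of the form
<kind>/<bucket...> as in the example above:

```python
import posixpath


def coalesce(path):
    """Move the leading tree name (info, job, ...) to the end of the path.

    Hypothetical helper for illustration: "info/a/0" becomes "a/0/info".
    """
    kind, *bucket = path.split("/")
    return "/".join(bucket + [kind])


if __name__ == "__main__":
    for old in ("info/a/0", "job/a/0"):
        new = coalesce(old)
        # Both results share the parent "a/0" after coalescing.
        print(old, "->", new, "(parent:", posixpath.dirname(new) + ")")
```

After the mapping, the info and job entries sit under the same parent
directory (a/0) rather than under two separate trees.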

On Fri, 2008-11-07 at 00:51 +0000, Ben Clifford wrote:
> The wrapper does not necessarily put all output files in one directory.
> 
> It puts them in a directory structure reflecting the submit-side directory 
> structure. For example, if you map a file "a/b", it will end up in the 
> shared directory as shared/a/b
> 
> This is exposed to the user, though, rather than being hidden as with the 
> other uses of subdirectories.
> 
> There are a few places throughout the source code where it's assumed that 
> the local path and the remote path are basically the same (modulo base 
> directory). I think it's probably fairly straightforward to make it use 
> different names, though. Look for lines in vdl-int.k that look like this:
> 
>     task:transfer(srcprovider=provider, srchost=srchost, srcfile=filename,
>                   srcdir=srcdir, desthost=host, destdir=destdir)
>     )
> 
> 
> You can put a different destdir in there, based on (for example) some hash 
> of the filename. Similarly, hash the filenames when they are passed as 
> inputs to wrapper.sh in the vdl:execute line in vdl-int.k.
> 
> On Thu, 6 Nov 2008, Zhao Zhang wrote:
> 
> > Hi, All
> > 
> > I am working on integrating the Collective IO system and swift on BGP. Before
> > that, in order to put swift into production work,
> > we need to change the output file path. For now, wrapper.sh copies all
> > output files to jobdir/shared/; on BGP, all output files will
> > be written to one directory, which I am sure will trigger the GPFS locking
> > mechanism and thus introduce unacceptable latency.
> > 
> > So the easiest way to fix this is to make a hierarchical directory in shared/,
> > as we already did in info/ and jobs/. Several changes we need
> > (place that needs to change, then difficulty):
> > 1. vdl-int.k: create a hierarchical directory in shared/ (straightforward)
> > 2. wrapper.sh: copy files from the local ramdisk to GPFS using dd instead
> >    of cp (straightforward)
> > 3. somewhere in swift: make swift know where the data is, i.e. the path of
> >    the output file in jobdir/shared/ (unknown)
> > 
> > Any comments will be appreciated.
> > 
> > best wishes
> > zhangzhao
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > 
> > 



