Datastore architecture / design document

Thu Dec 19 20:10:58 CST 2002

Wow, this is great. Is there a possibility the python is already in cvs?

--Ivan, the hopeful :-)

> -----Original Message-----
> From: owner-ag-dev at mcs.anl.gov
> [mailto:owner-ag-dev at mcs.anl.gov] On Behalf Of Robert Olson
> Sent: Thursday, December 19, 2002 1:39 PM
> To: ag-dev at mcs.anl.gov
> Subject: Datastore architecture / design document
>
>
>
> User operations that result in interactions with the data store:
>
> (A) User enters venue. His client fills up with a list of
> files and directories available in the venue.
>
> (B) User doubleclicks on a file. The file is downloaded and
> the appropriate application is launched on his computer.
>
> (C) User drags a file from his desktop into the file share.
> The file is copied to the venue and made available. The list
> of files in the venue updates with that new file.
>
> (D) User brings up file properties window for a file. It
> shows who created the file, when it was uploaded, and any
> access properties on it. User renames the file.
>
> (E) User wants to add a file or directory to his local
> exported filestore. He drags the file or directory into the
> transient files section in the client GUI.
>
> ----
>
> Extended discussion.
>
> (A) Discovery of files.
>
> The venue description returned from the Enter() operation
> includes a set of data item descriptions that describe the
> data objects present in the venue and in all of the clients'
> transient data stores (1).
>
> This description includes entries for both files and
> directories; that is, the data server supports arbitrary
> directory trees.
>
> Alternatives for descriptors:
>
>     o full (relative) pathname in each descriptor
>     o directories given distinct identities (inode); each file
>       has a reference to its directory.
>
> These two aren't actually that different; if you consider the
> pathname to the directory as the unique ID the two approaches
> are similar.
>
> Each file descriptor will look something like this:
>
>       name: name of this file
>       directory: full path to containing directory
>       owner: DN
>       size: size in bytes
>       upload_time: date/time
>       acl: acl for access to file
>       transfer_spec: information required to download this file
>
> The transfer spec contains the information required for a
> client to download the file. This will likely be (for
> GASS-based transfers) the URL from which the file can be
> obtained along with the DN of the identity of the server
> holding the file.
>
> Each directory descriptor will look like this:
>
>       pathname: full path to directory
>       owner: DN
>       acl: acl for access to directory
>       upload_transfer_spec: information required for upload of file to dir
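
A minimal Python sketch of the two descriptors listed above. The field names come from the lists; the field types, the list-based ACL, the dict-based transfer spec, and the helper full_path() are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileDescriptor:
    name: str              # name of this file
    directory: str         # full path to containing directory
    owner: str             # DN of the owner
    size: int              # size in bytes
    upload_time: datetime  # date/time of upload
    acl: list              # ACL representation is an assumption
    transfer_spec: dict    # download information; dict layout is an assumption


@dataclass
class DirectoryDescriptor:
    pathname: str               # full path to directory
    owner: str                  # DN of the owner
    acl: list                   # ACL representation is an assumption
    upload_transfer_spec: dict  # upload information; dict layout is an assumption


def full_path(fd: FileDescriptor) -> str:
    """Hypothetical helper: join the directory path and the file name."""
    return fd.directory.rstrip("/") + "/" + fd.name
```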
>
> The aggregate description as sent to the client is a
> depth-first traversal of the directory structure (so that the
> client always has knowledge of a directory before it receives
> the list of files for that directory).
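
One way to picture the ordering described above, as a rough sketch: a pre-order depth-first walk that emits each directory before its files and before anything beneath it. The nested-dict tree layout and the function name aggregate_description() are hypothetical.

```python
def aggregate_description(tree, path="/"):
    """Yield ("dir", path) and ("file", path) entries in depth-first order.

    `tree` is an assumed layout: {"files": [names], "dirs": {name: subtree}}.
    A directory entry is always yielded before the files it contains.
    """
    yield ("dir", path)
    prefix = path.rstrip("/")
    for name in sorted(tree.get("files", [])):
        yield ("file", prefix + "/" + name)
    for name in sorted(tree.get("dirs", {})):
        yield from aggregate_description(tree["dirs"][name], prefix + "/" + name)
```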
>
> Alternatively, the interface could be directory based. Given
> a directory name, the server returns the list of files and
> subdirectories in that directory.
>
> Data store operations to support this functionality:
>
> RetrieveDirectory(path) => ([files], [directories])
>
>      Retrieve the contents of the directory rooted at <path>.
>      Return a tuple containing the list of files in that directory
>      and a list of subdirectories of that directory.
>
>      Each entry in the file and directory lists is a descriptor as
>      described above.
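
A sketch of how RetrieveDirectory might look over an in-memory store. The class name DataStoreSketch, the helper methods add_directory() and add_file(), and the dict-based descriptors are all assumptions; only the (files, directories) return shape comes from the description above.

```python
class DataStoreSketch:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        # Maps a directory path to ([file descriptors], [subdirectory descriptors]).
        self._dirs = {"/": ([], [])}

    def add_directory(self, parent, name, owner):
        path = parent.rstrip("/") + "/" + name
        self._dirs[path] = ([], [])
        self._dirs[parent][1].append({"pathname": path, "owner": owner})
        return path

    def add_file(self, directory, name, owner, size):
        self._dirs[directory][0].append(
            {"name": name, "directory": directory, "owner": owner, "size": size}
        )

    def retrieve_directory(self, path):
        """Return (files, directories) for the directory at `path`."""
        if path not in self._dirs:
            raise KeyError("no such directory: " + path)
        files, dirs = self._dirs[path]
        # Return copies so callers cannot modify the store's own lists.
        return (list(files), list(dirs))
```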
>
> Additional notes:
>
> If files or directories are added, a notification can be sent
> asynchronously to clients who have registered for these
> notifications. The information in the notification can
> contain the descriptor for the file or directory; these
> descriptors contain all the information necessary for the
> client to make use of the information.
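
The notification idea above could be sketched as a simple callback registry. The class name Notifier and its methods are hypothetical, and this sketch delivers notifications synchronously; a real implementation would presumably send them asynchronously over the network.

```python
class Notifier:
    """Hypothetical registry of clients interested in new files or directories."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        """Register a callable that accepts a single descriptor argument."""
        self._callbacks.append(callback)

    def notify_added(self, descriptor):
        """Pass the descriptor of a newly added file or directory to every client."""
        for callback in list(self._callbacks):
            callback(descriptor)
```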
>
> Footnotes:
>
> (1) This is the clients-advertise-all-data model. The alternative is
>      to require clients to query all other clients for their transient
>      data. This has issues with latency and with the possibility that
>      inbound connections to clients may be forbidden by site firewalls.
>
> ---
>
>
> (B) File Transfer, Venue to User
>
> Given the transfer spec in the file description, this is
> straightforward. It's likely just an HTTP GET or an FTP operation.
>
> ---
>
> (C) Desktop upload.
>
> The user has specified (perhaps implicitly via the GUI) the
> directory into which the file should be uploaded. The file
> can be either pushed to the server from the client or pulled
> from the client to the server. The interaction with the
> server may be simpler with a server pull, but site-local
> firewall rules may forbid connections incoming to a client. A
> server pull also requires the client to act as a server.
>
> Hence, we choose to first define a client-push based
> mechanism. In the upload_transfer_spec for a directory the
> client will find the information required to effect a
> transfer. This may be a URL to which a GASS-based HTTP PUT
> operation can be performed, or perhaps an FTP URL to which a
> put is allowed.
>
> ---
>
> (D) File properties and directory operations.
>
> The client, upon receiving the file description, has the
> metainformation about the file available for display.
>
> Directory operations, such as renaming, moving, and deletion
> are provided as a family of operations on the datastore service:
>
> FileRename(oldname, newname) => descriptor for newly named file
>     Rename the given file.
>
> FileDelete(name) => success/failure
>     Delete the given file.
>
> DirCreate(full path) => descriptor for directory
>     Create a new directory.
>
> DirRename(oldpath, newpath) => descriptor for directory plus
> updated descriptors for all files that have new names. (2)
>     Rename a directory.
>
> DirDelete(path)
>     Delete a directory. This results in all files below that directory
>     also being deleted.
>
> (2) This argues actually for giving files unique IDs and having the
>      directory information just be advisory.
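
A rough sketch of why DirRename has to return updated file descriptors when paths act as identities: every file at or below the renamed directory gets a new directory field. The function name dir_rename() and the dict-based descriptors are assumptions, and renaming the root directory is not handled.

```python
def dir_rename(files, oldpath, newpath):
    """Rename a directory and return (directory descriptor, updated file descriptors).

    `files` is an assumed list of dicts with "name" and "directory" keys.
    Descriptors are updated in place; renaming "/" is not supported here.
    """
    old = oldpath.rstrip("/")
    new = newpath.rstrip("/")
    updated = []
    for f in files:
        d = f["directory"]
        # Match the directory itself or anything nested below it, but not
        # siblings that merely share a name prefix (e.g. /docsx vs /docs).
        if d == old or d.startswith(old + "/"):
            f["directory"] = new + d[len(old):]
            updated.append(f)
    return {"pathname": new}, updated
```

With unique file IDs, as footnote (2) suggests, only the directory record would need to change.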
>
> ---
>
> (E) Transient filestores.
>
> A transient filestore is one that is provided from a user's
> personal machine; that is, it is not a persistent Venue resource.
>
> A transient filestore uses the same core filestore engine
> that the venue filestore does; the primary difference lies in
> the mechanism by which the filestore and its contents are
> discovered. Whereas the location of the Venue's filestore is
> found in the description of the Venue itself, the location of
> a transient filestore is an attribute of the description of
> the user who is hosting that transient filestore.
>
> All the same operations apply - file discovery, transfer,
> renaming, etc. A transient filestore, however, is likely to
> have much different access control policies. For instance, it
> is likely that a user would not allow others en masse to have
> the ability to transfer files to his machine, or to delete or
> rename files resident there.
>
> ----------------
>
> Data store implementation notes.
>
> The hierarchical directory of files as presented by the
> datastore API may or may not be bound to an actual
> hierarchical directory of files. For a Venue datastore, it
> may be reasonable for that to be the case. For a transient
> user-based datastore, it may be reasonable to present that
> view, while the files that are being exported in this manner
> actually reside at varied places on the user's filesystem
> (having arrived in the datastore by being dragged as needed
> into the datastore's user interface).
>
> The Python implementation of the datastore is split into two main
> objects: a DataStore which provides the hierarchical file
> storage abstraction with the internal bookkeeping of
> matching virtual filename space to physical files, and a
> TransferEngine which provides the functionality required for
> the actual upload and download of files.
>
> The DataStore relies on the TransferEngine to provide it with
> the transfer_spec portion of the file description, for any given file.
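
A sketch of the split described above, not the actual implementation. The class names follow the text, but every method name, the constructor arguments, the transfer-spec keys, and the example URL and DN are hypothetical.

```python
class TransferEngine:
    """Produces transfer specs; a real engine would also move the bytes."""

    def __init__(self, base_url, server_dn):
        self.base_url = base_url.rstrip("/")
        self.server_dn = server_dn

    def transfer_spec(self, virtual_path):
        # Assumed layout: download URL plus the DN of the server holding the file.
        return {"url": self.base_url + virtual_path, "server_dn": self.server_dn}


class DataStore:
    """Maps the virtual filename space to physical files."""

    def __init__(self, engine):
        self._engine = engine
        self._physical = {}  # virtual path -> physical path on disk

    def export(self, virtual_path, physical_path):
        """Make a physical file visible under a virtual path."""
        self._physical[virtual_path] = physical_path

    def describe(self, virtual_path):
        """Build a partial file descriptor, asking the engine for the transfer spec."""
        if virtual_path not in self._physical:
            raise KeyError(virtual_path)
        directory, _, name = virtual_path.rpartition("/")
        return {
            "name": name,
            "directory": directory or "/",
            "transfer_spec": self._engine.transfer_spec(virtual_path),
        }
```

The physical path never appears in the descriptor, so exported files can live anywhere on the user's filesystem.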
>
> The DataStore API closely follows the API described in (A) -
> (D) above; however, this API is a Python object API rather
> than a web services API.
>
>