Venue-based Shared Data Store
Robert Olson
February 3, 2003
Introduction
In [1] we discuss a design for a flexible data store intended as the backing store for the Access Grid. There is, however, also a place for a simpler interface to a shared data store, one in which some of those requirements are relaxed. This document describes such a simplified data store.
Description
The AG 2.0 client user interface provides a simple list of data that is present in the venue.
This data has a simple structure:
Single-level list of files; no directory hierarchies
No duplicate names allowed
The user interface provides the following operations:
Double-clicking a filename in the Data section downloads the file to a local cache directory and opens it with the system's default handler for that file type.
Dragging a file from the UI to a local file browser causes the file to be downloaded and saved to that location.
Right-clicking on the file provides options for opening and downloading.
Data Model
The venue itself stores a set of data descriptors. These descriptors contain metainformation about each individual data object, as well as information on the mechanisms available to transfer the data from its current location to a client interested in actually obtaining the data.
The data itself can be stored either in the venue (that is, in a data storage engine collocated with the venue) or on a server external to the venue. Such a server might be a client-based data store, a standard web server, or a special-purpose data device such as a scientific instrument or a large-scale storage device.
File uploads to the data storage engine collocated with the venue use a Venue method call that sets up the upload, handles reception of the data, and adds a descriptor for the new data item to the venue.
Data Descriptors
A data descriptor contains metainformation about a data object. This information includes the following:
name. The name of the data object. This name is displayed in the venue's file listing and must be unique among the files in the venue.
status. The current status of the data object. Valid status values are:
reference. The data object is not resident in the venue but is instead a reference to a data object elsewhere. In this case, the Venue RemoveData() call removes only this reference, not the actual file.
present. Data file is present and ready for download.
pending. File name has been reserved and upload is pending.
uploading. File upload is in progress.
invalid. File is not (yet) valid.
size. The size of the file in bytes.
checksum. The MD5 checksum of the file.
owner. The DN of the owner of the file.
location. The location of the file. This is a list of tuples, each of which is a location descriptor (described below).
ACL. The Access Control List for the file (described below).
Data descriptors are passed over the wire as SOAP structs, using the field names above (name, status, size, checksum, owner, location, and ACL) as tag names.
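For illustration, the descriptor fields can be modelled as a Python dataclass. This is a minimal sketch only: the names `DataStatus`, `DataDescriptor`, and `to_struct` are hypothetical, the Python types chosen for each field are assumptions, and the plain dict returned by `to_struct` merely stands in for the SOAP struct (no particular SOAP toolkit is assumed).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Tuple


class DataStatus(Enum):
    """The status values listed for data descriptors."""
    REFERENCE = "reference"
    PRESENT = "present"
    PENDING = "pending"
    UPLOADING = "uploading"
    INVALID = "invalid"


@dataclass
class DataDescriptor:
    """Metainformation about one data object (hypothetical class)."""
    name: str                  # must be unique within the venue
    status: DataStatus
    size: int = 0              # size in bytes
    checksum: str = ""         # hex-encoded MD5 digest
    owner: str = ""            # DN of the owner
    location: List[Tuple[str, Any]] = field(default_factory=list)
    acl: List[Tuple[str, List[Tuple[str, str]]]] = field(default_factory=list)

    def to_struct(self) -> Dict[str, Any]:
        """Flatten to a dict keyed by the wire tag names (hypothetical helper)."""
        return {
            "name": self.name,
            "status": self.status.value,
            "size": self.size,
            "checksum": self.checksum,
            "owner": self.owner,
            "location": list(self.location),
            "ACL": list(self.acl),
        }
```

Keeping status as an enum means an unexpected value arriving from the wire fails loudly (`DataStatus("bogus")` raises `ValueError`) instead of passing through as a free-form string.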
Location Descriptor
A location descriptor defines the mechanisms by which a file or other data item can be obtained by a client. It is a tuple of the following form:
(download-mechanism, download-information)
The download-mechanism is a string defining a particular mechanism by which the data item can be downloaded. We currently define the following mechanisms:
HTTP-GET: the file can be downloaded using the HTTP protocol. In this case, download-information contains the URL to the file.
GASS: the file can be downloaded using the GASS protocol. In this case, download-information is the tuple (file-URL, server-identity), where file-URL is the GASS URL to the file, and server-identity is the DN of the identity the server is using (required for the client to authenticate the server for the transfer).
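A client holding several location descriptors has to pick one it can use and interpret its download-information according to the mechanism. The sketch below assumes the client supplies a preference-ordered list of mechanisms; the function names `choose_location` and `describe` are hypothetical, and no actual HTTP or GASS transfer is performed.

```python
from typing import Optional, Sequence, Tuple, Union

# download-information: a URL for HTTP-GET, or (file-URL, server-identity) for GASS.
DownloadInfo = Union[str, Tuple[str, str]]
LocationDescriptor = Tuple[str, DownloadInfo]


def choose_location(locations: Sequence[LocationDescriptor],
                    preferred: Sequence[str]) -> Optional[LocationDescriptor]:
    """Return a location for the client's most preferred mechanism, or None."""
    for mechanism in preferred:
        for location in locations:
            if location[0] == mechanism:
                return location
    return None


def describe(location: LocationDescriptor) -> str:
    """Explain how a client would use a location descriptor."""
    mechanism, info = location
    if mechanism == "HTTP-GET":
        return f"GET {info}"
    if mechanism == "GASS":
        file_url, server_dn = info
        # The client must authenticate the server as server_dn.
        return f"GASS {file_url} (server identity {server_dn})"
    raise ValueError(f"unknown download mechanism: {mechanism}")
```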
Access Control List
Each data item has an associated access control list.
An access control list is a list of access control entries.
Each access control entry is a tuple (permission, principal-list).
Permission is one of the following permissions:
read: Access allows the user to download the file.
write: Access allows the user to overwrite or delete the file.
Principal-list is a list of principals, which are each either (user, user-identifier) or (role, role-identifier).
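An access check over this structure might look like the following sketch. It assumes the caller already knows the user's DN and role names (how roles are resolved is not specified here), denies access unless some entry matches, and treats read and write independently; the function name `is_permitted` is hypothetical.

```python
from typing import Iterable, List, Tuple

Principal = Tuple[str, str]                       # ("user", id) or ("role", id)
AccessControlEntry = Tuple[str, List[Principal]]  # (permission, principal-list)


def is_permitted(acl: Iterable[AccessControlEntry], permission: str,
                 user_dn: str, user_roles: Iterable[str] = ()) -> bool:
    """True if some entry grants `permission` to the user or one of the user's roles."""
    roles = set(user_roles)
    for entry_permission, principals in acl:
        if entry_permission != permission:
            continue
        for kind, ident in principals:
            if kind == "user" and ident == user_dn:
                return True
            if kind == "role" and ident in roles:
                return True
    # No matching entry: deny by default.
    return False
```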
Venue API
A client receives an initial listing of the data present in the venue when it invokes the Enter method on the Venue service: the returned structure includes a list of data descriptors. The Venue also provides the following methods for manipulating data objects.
AddData(descriptor)
Add descriptor to the set of data in the venue. The name contained in the descriptor must be unique in the Venue, otherwise an exception is thrown.
RemoveData(name)
Remove the data corresponding to name from the venue. If this data was uploaded to the Venue and resides in the Venue data store, the data file will be deleted from the data store.
GetUploadDescriptor()
Return the descriptor a client uses to upload a new file to the Venue. This descriptor tells the client which upload mechanisms are available (GASS, HTTP PUT, etc.). If necessary, the call also prepares the Venue to receive the upload.
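The semantics of these three methods can be sketched with an in-memory class. This is not the Venue implementation: the class `VenueDataSketch`, the exception `DuplicateNameError`, the `"data"` key in the Enter result, the dict representation of descriptors, and the `delete_file` callback (standing in for the data store's file deletion) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


class DuplicateNameError(Exception):
    """Raised when AddData receives a name that already exists in the venue."""


class VenueDataSketch:
    """In-memory stand-in for the Venue's data methods (hypothetical)."""

    def __init__(self, upload_descriptor: Tuple[str, str],
                 delete_file: Optional[Callable[[str], None]] = None):
        self._descriptors: Dict[str, dict] = {}
        self._upload_descriptor = upload_descriptor
        self._delete_file = delete_file  # e.g. a data store's DeleteFile

    def Enter(self) -> Dict[str, List[dict]]:
        # The initial data listing comes back as part of Enter's result.
        return {"data": list(self._descriptors.values())}

    def AddData(self, descriptor: dict) -> None:
        name = descriptor["name"]
        if name in self._descriptors:
            raise DuplicateNameError(name)
        self._descriptors[name] = descriptor

    def RemoveData(self, name: str) -> None:
        descriptor = self._descriptors.pop(name)  # KeyError if absent
        # A reference only loses its descriptor; venue-resident data is
        # also deleted from the local data store.
        if descriptor.get("status") != "reference" and self._delete_file:
            self._delete_file(name)

    def GetUploadDescriptor(self) -> Tuple[str, str]:
        # Any per-upload preparation would happen here.
        return self._upload_descriptor
```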
File Uploads: HTTP
The format of an upload descriptor is a tuple (descriptor-type, descriptor-data).
When the file is actually uploaded, the server adds it to its file set and creates a descriptor for it that records the original owner of the file, its size, its checksum, and so on.
Each data upload actually involves at least two file transfers. The client first transfers a file of metadata, the manifest, describing the files to be uploaded. The manifest is written in Python ConfigParser format and contains a section named manifest with the following tags:
num_files: the number of files to be transferred
transfer_key: a unique string that links this manifest with the rest of the files in the transfer.
Following the manifest section are one or more sections, each named with the index of the file it describes, starting at 0. Each section describes one file that the server should expect to receive; its tags are named after the fields listed in the Data Descriptors section above. An example manifest follows:
[manifest]
num_files = 2
[0]
owner = me
checksum = c1fbb19a9849900c0f25b8eda5dac65a
name = tarfile-0.6.5.zip
size = 61247
[1]
owner = me
checksum = 9795b2df03d6873be815d76cc6fc69ca
name = boot.ini
size = 190
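On the client side, a manifest like this can be produced with the standard `configparser` and `hashlib` modules. The sketch assumes the file name sent is the local basename and that `transfer_key` is included only when the caller has one; the function names are hypothetical. Interpolation is disabled so that a `%` in a file name or DN is written literally instead of being rejected.

```python
import configparser
import hashlib
import io
import os
from typing import Optional, Sequence


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: Sequence[str], owner: str,
                   transfer_key: Optional[str] = None) -> str:
    """Return manifest text in ConfigParser format describing `paths`."""
    parser = configparser.ConfigParser(interpolation=None)
    header = {"num_files": str(len(paths))}
    if transfer_key is not None:
        header["transfer_key"] = transfer_key
    parser["manifest"] = header
    for index, path in enumerate(paths):
        # One numbered section per file, tagged with descriptor field names.
        parser[str(index)] = {
            "owner": owner,
            "checksum": md5_of_file(path),
            "name": os.path.basename(path),
            "size": str(os.path.getsize(path)),
        }
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```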
When the server receives the manifest, it attempts to perform an AddData() operation on the Venue associated with the data store. The descriptors added to the venue have status pending. If any files cannot be added because of a conflict with an existing filename or an authorization failure, the server returns an application error. Otherwise, it returns a transfer key for use in the actual file uploads.
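The server-side step can be sketched as follows. Here the venue is modelled as a plain dict from name to descriptor, `transfers` maps each transfer key to the names it covers, the key is a random `uuid4` hex string, and authorization checks are omitted; all of these are assumptions, and `accept_manifest` is a hypothetical name. As a design choice, every name is checked before any descriptor is added, so a rejected manifest leaves the venue unchanged.

```python
import configparser
import uuid
from typing import Dict, List, Tuple


def accept_manifest(manifest_text: str, venue: Dict[str, dict],
                    transfers: Dict[str, List[str]]) -> Tuple[int, str, str]:
    """Reserve the manifest's names in `venue` as pending descriptors.

    Returns (return_code, transfer_key, error_reason); 0 means success.
    """
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(manifest_text)
        num_files = parser.getint("manifest", "num_files")
        descriptors = []
        for i in range(num_files):
            section = parser[str(i)]  # KeyError if the section is missing
            descriptors.append({
                "name": section["name"],
                "size": int(section["size"]),
                "checksum": section.get("checksum", ""),
                "owner": section.get("owner", ""),
                "status": "pending",
            })
    except (configparser.Error, KeyError, ValueError) as exc:
        return 1, "", f"Malformed manifest: {exc}"

    # Check every name first (against the venue and within the manifest).
    seen = set()
    for d in descriptors:
        if d["name"] in venue or d["name"] in seen:
            return 1, "", f"A file named {d['name']!r} already exists."
        seen.add(d["name"])

    transfer_key = uuid.uuid4().hex
    for d in descriptors:
        venue[d["name"]] = d  # the AddData() step
    transfers[transfer_key] = [d["name"] for d in descriptors]
    return 0, transfer_key, ""
```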
The format of the return from the server is a simple list of keyword/value pairs, one per line:
return_code:
transfer_key:
error_reason:
The return code is 0 for success, nonzero for failure.
The transfer key is a string.
The error_reason is a human-readable string, intended for presentation to the user.
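The reply can be produced and read back with a few lines of string handling. The separator between keyword and value is assumed here to be `": "`, and values are assumed to fit on one line (so newlines in the reason are folded into spaces); the function names are hypothetical. Splitting on the first colon only keeps colons inside a human-readable reason intact.

```python
from typing import Dict


def format_response(return_code: int, transfer_key: str = "",
                    error_reason: str = "") -> str:
    """Render the manifest reply as one keyword: value pair per line."""
    # Values must stay on one line, so fold any newlines in the reason.
    reason = " ".join(error_reason.splitlines())
    return (f"return_code: {return_code}\n"
            f"transfer_key: {transfer_key}\n"
            f"error_reason: {reason}\n")


def parse_response(text: str) -> Dict[str, str]:
    """Parse keyword: value lines, splitting on the first colon only."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            result[key.strip()] = value.strip()
    return result
```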
If no error was returned, each data file is then transmitted in a single HTTP POST request. The URL used has a path that includes the transfer key and the file number:
http://://
Implementation Details
The data descriptors for the data in the venue are stored inline in the data structures in the Venue object (Venue.py), using the accessors provided there.
The Venue-local data storage is implemented by the Python object VenueDataStorage. This object has the following interface:
VenueDataStorage(pathname)
Constructor. Creates a new data storage object that stores its local files in the directory pathname. If pathname does not exist, an exception is thrown.
GetUploadDescriptor()
Return the upload descriptor for this venue data store.
GetDownloadDescriptor(filename)
Return the download descriptor for this file.
DeleteFile(filename)
Delete filename from the data store.
Corner Cases
In this section we describe corner cases that may not happen often but may complicate the implementation. These are not currently addressed in the implementation; completely eliminating them may require some realignment of the components of this architecture (in particular, the merging of data descriptors with the data storage manager, instead of storing the former in the venue itself).
Two clients uploading the same filename at the same time
Overwriting an existing file: status of that file during the overwrite
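One possible way to close the first race, sketched here under the assumption of a single server process, is to make the name check and the reservation a single atomic step. The class `NameReservations` is hypothetical and not part of the current implementation; a server spread across processes would need a shared lock or an atomic store operation instead of `threading.Lock`.

```python
import threading
from typing import Dict


class NameReservations:
    """Atomic check-and-reserve of file names (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owners: Dict[str, str] = {}  # name -> transfer key

    def reserve(self, name: str, transfer_key: str) -> bool:
        """Reserve `name` for one transfer; False if it is already taken."""
        with self._lock:
            if name in self._owners:
                return False
            self._owners[name] = transfer_key
            return True

    def release(self, name: str, transfer_key: str) -> None:
        """Drop a reservation, but only if this transfer holds it."""
        with self._lock:
            if self._owners.get(name) == transfer_key:
                del self._owners[name]
```

With this in place, two clients uploading the same filename at the same time cannot both succeed: exactly one `reserve` call returns True, and the other client receives an error.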
[1] Olson, R. Virtual Venues Shared Data Store: Architecture and Design Notes. 2003.