Document Service Introduction

Zend_Cloud_DocumentService abstracts the interfaces to all major document databases - both in the cloud and locally deployed - so developers can access their common functionality through one API. In other words, an application can make use of these databases and services with no concern over how the application will be deployed. The data source can be chosen through configuration changes alone at the time of deployment.

Document databases and services are increasingly common in application development. These data sources are somewhat different from traditional relational data sources, as they eschew complex relationships for performance, scalability, and flexibility. Examples of document-oriented services include Amazon SimpleDB and Azure Table Storage.

The Simple Cloud API offers some flexibility for vendor-specific features with an $options array in each method signature. Some adapters require certain options that also must be added to the $options array. It is a good practice to retrieve these options from a configuration file to maintain compatibility with all services and databases; unrecognized options will simply be discarded, making it possible to use different services based on environment.

If an application needs vendor-specific features beyond what the $options array allows, the developer should extend the corresponding Zend_Cloud_DocumentService adapter to add support for them. The application can then call these features explicitly through the extension methods defined in the subclass of the Simple Cloud adapter.

Zend_Cloud_DocumentService_Adapter Interface

The Zend_Cloud_DocumentService_Adapter interface defines methods that each concrete document service adapter implements. The following adapters are shipped with the Simple Cloud API:

  • Zend_Cloud_DocumentService_Adapter_SimpleDb

  • Zend_Cloud_DocumentService_Adapter_WindowsAzure

To instantiate a document service adapter, use the static method Zend_Cloud_DocumentService_Factory::getAdapter(), which accepts a configuration array or a Zend_Config object. The document_adapter key should specify the concrete adapter class by classname. Adapter-specific keys may also be passed in this configuration parameter.

Example #1 Using the SimpleDB adapter

$adapterClass = 'Zend_Cloud_DocumentService_Adapter_SimpleDb';
$documents = Zend_Cloud_DocumentService_Factory::getAdapter(array(
    Zend_Cloud_DocumentService_Factory::DOCUMENT_ADAPTER_KEY    => $adapterClass,
    Zend_Cloud_DocumentService_Adapter_SimpleDb::AWS_ACCESS_KEY => $amazonKey,
    Zend_Cloud_DocumentService_Adapter_SimpleDb::AWS_SECRET_KEY => $amazonSecret
));
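Because the adapter is selected by configuration, the choice of service can be deferred to deployment time. The following is a minimal sketch of that idea, not a prescribed layout: the environment names, the placeholder credentials, and the helpers documentServiceOptions() and createDocumentService() are all hypothetical. The option keys are the ones listed under "Supported Adapter Options". In a real application the settings would normally live in a configuration file rather than in code.

```php
<?php
// Hypothetical per-environment settings. Key names are taken from the
// option tables in this document; all values are placeholders.
$environments = array(
    'production' => array(
        'document_adapter' => 'Zend_Cloud_DocumentService_Adapter_SimpleDb',
        'aws_accesskey'    => 'YOUR-AWS-ACCESS-KEY',
        'aws_secretkey'    => 'YOUR-AWS-SECRET-KEY',
    ),
    'staging' => array(
        'document_adapter'    => 'Zend_Cloud_DocumentService_Adapter_WindowsAzure',
        'storage_accountname' => 'YOUR-AZURE-ACCOUNT-NAME',
        'storage_accountkey'  => 'YOUR-AZURE-ACCOUNT-KEY',
    ),
);

// Hypothetical helper: returns the factory options for one environment.
function documentServiceOptions(array $environments, $environment)
{
    if (!isset($environments[$environment])) {
        throw new InvalidArgumentException("Unknown environment: $environment");
    }
    return $environments[$environment];
}

// Hypothetical helper: builds the adapter. Calling it requires Zend Framework
// to be loaded; defining it does not.
function createDocumentService(array $environments, $environment)
{
    return Zend_Cloud_DocumentService_Factory::getAdapter(
        documentServiceOptions($environments, $environment)
    );
}
```

With this in place, application code would call createDocumentService($environments, 'production') once and use the returned adapter without referring to a specific vendor.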

Supported Adapter Options

Zend_Cloud_DocumentService_Adapter Common Options

| Option key | Description | Used in | Required | Default |
|---|---|---|---|---|
| document_class | Class to use to represent returned documents. The class provided must extend Zend_Cloud_DocumentService_Document to ensure compatibility with all document services. For all methods that return a document or collection of documents, this class will be used. | Constructor | No | Zend_Cloud_DocumentService_Document |
| documentset_class | Class to use to represent collections of documents, Zend_Cloud_DocumentService_DocumentSet by default. Typically, objects of this class will be returned by listDocuments() and query(). Any class provided for this configuration value must extend Zend_Cloud_DocumentService_DocumentSet. | Constructor | No | Zend_Cloud_DocumentService_DocumentSet |

Zend_Cloud_DocumentService_Adapter_SimpleDb Options

| Option key | Description | Used in | Required | Default |
|---|---|---|---|---|
| query_class | Class to use for creating and assembling queries for this document service; select() will create objects of this class name, as will listDocuments(). | Constructor | No | Zend_Cloud_DocumentService_Adapter_SimpleDb_Query |
| aws_accesskey | Your Amazon AWS access key | Constructor | Yes | None |
| aws_secretkey | Your Amazon AWS secret key | Constructor | Yes | None |
| http_adapter | HTTP adapter to use in all access operations | Constructor | No | Zend_Http_Client_Adapter_Socket |
| merge | If a boolean true, all attribute values are merged. You may also specify an array of key pairs, where the key is the attribute key to merge, and the value indicates whether or not to merge; a boolean true value will merge the given key. Any attributes not specified in this array will be replaced. | updateDocument() | No | True |
| return_documents | If a boolean true, query() returns a Zend_Cloud_DocumentService_DocumentSet object containing Zend_Cloud_DocumentService_Document objects (default case); otherwise, it returns an array of arrays. | query() | No | True |

Zend_Cloud_DocumentService_Adapter_WindowsAzure Options

| Option key | Description | Used in | Required | Default |
|---|---|---|---|---|
| query_class | Class to use for creating and assembling queries for this document service; select() will create objects of this class name, as will listDocuments(). | Constructor | No | Zend_Cloud_DocumentService_Adapter_WindowsAzure_Query |
| default_partition_key | The default partition key to use if none is specified in the document identifier. Windows Azure requires a two-fold document ID, consisting of a PartitionKey and a RowKey. The PartitionKey will typically be common across your database or a collection, while the RowKey will vary. As such, this setting allows you to specify the default PartitionKey to utilize for all documents. If not specified, the adapter will default to using the collection name as the PartitionKey. | Constructor, setDefaultPartitionKey() | No | Name of whatever collection the document belongs to |
| storage_accountname | Windows Azure account name | Constructor | Yes | None |
| storage_accountkey | Windows Azure account key | Constructor | Yes | None |
| storage_host | Windows Azure access host, default is table.core.windows.net | Constructor | No | table.core.windows.net |
| storage_proxy_host | Proxy hostname | Constructor | No | None |
| storage_proxy_port | Proxy port | Constructor | No | 8080 |
| storage_proxy_credentials | Proxy credentials | Constructor | No | None |
| HTTP Adapter | HTTP adapter to use in all access operations | Constructor | No | None |
| verify_etag | Verify ETag on the target document and perform the operation only if the ETag matches the expected value | updateDocument(), replaceDocument(), deleteDocument() | No | False |

Basic concepts

Each document-oriented service and database uses its own terminology and constructs in its API. The Simple Cloud API identifies and abstracts a number of common concepts and operations that are shared among providers.

Document storage consists of a number of collections, which are logical storage units analogous to database tables in the SQL world. Collections contain documents, which are essentially a set of key-value pairs, along with some metadata specific to the storage engine, and are identified by a unique document ID.

Each document has its own structure (set of fields) that does not necessarily have to match the structure of any other document, even in the same collection. In fact, you can change this structure after the document is created.

Documents can be retrieved by ID or by querying a collection.

Documents are represented by the class Zend_Cloud_DocumentService_Document. Note that the document class does not validate the supplied IDs and data, and does not enforce compatibility with each adapter's requirements.

The document fields can be accessed using keys as object properties and as array elements.

The basic interface of Zend_Cloud_DocumentService_Document is as follows:

/**
 * ArrayAccess allows accessing fields by array key:
 *    $doc['fieldname']
 *
 * IteratorAggregate allows iterating over all fields:
 *    foreach ($document as $field => $value) {
 *        echo "$field: $value\n";
 *    }
 *
 * Countable provides a count of all fields:
 *    count($document)
 */
class Zend_Cloud_DocumentService_Document
    implements ArrayAccess, IteratorAggregate, Countable
{
    const KEY_FIELD = '_id';

    /** 
     * $fields may be an array or an object implementing ArrayAccess. 
     * If no $id is provided, it will look for a field matching KEY_FIELD to 
     * use as the identifier.
     */
    public function __construct($fields, $id = null);

    public function setId($id);
    public function getId();
    public function getFields();
    public function getField($name);
    public function setField($name, $value);

    /**
     * These allow overloading, so you may access fields as if they were 
     * native properties of the document
     */
    public function __get($name);
    public function __set($name, $value);

    /**
     * Alternately, you can access fields as if via native getters and
     * setters:
     *     $document->setFoo($value);    // set "Foo" field to value
     *     $value = $document->getFoo(); // get "Foo" field value
     */
    public function __call($name, $args);
}

Note: Windows Azure Document Identifiers
Windows Azure technically requires a combination of two fields to uniquely identify documents: the PartitionKey and RowKey, and as such, keys are fully qualified by the structure array(PartitionKey, RowKey) -- which makes them non-portable. In most situations, the PartitionKey will not differ for documents in a single collection -- and potentially not even across your entire table instance. As such, the DocumentService provides several options for specifying keys:

  • Array keys will always work as expected.

  • If a string key is provided:

    • If the default_partition_key setting was provided to the constructor, or passed to the setDefaultPartitionKey() method, that value will be used for the PartitionKey.

    • Otherwise, the name of the collection on which you are operating will be used.

The takeaway is that you can utilize string keys if you wish to maximize portability of your application. Just be aware that your record will contain a few extra fields to denote the key (PartitionKey, RowKey, and the previously undiscussed Timestamp) should you ever migrate your data to another service.
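The key rules in this note can be sketched as a small function. This is only an illustration of the rules as described here, not the adapter's actual code; resolveAzureKey() and its parameter names are hypothetical.

```php
<?php
// Hypothetical helper, not part of Zend_Cloud: models how a document ID is
// turned into the array(PartitionKey, RowKey) pair described above.
function resolveAzureKey($id, $collectionName, $defaultPartitionKey = null)
{
    // Array keys are already fully qualified and are used as-is.
    if (is_array($id)) {
        return $id;
    }

    // String keys: use the default partition key when one was configured,
    // otherwise fall back to the name of the collection.
    $partitionKey = ($defaultPartitionKey !== null)
        ? $defaultPartitionKey
        : $collectionName;

    return array($partitionKey, $id);
}

// A string key with no default partition key falls back to the collection name.
var_export(resolveAzureKey('row1', 'mydata'));
```

Passing a third argument such as 'tenantA' (a made-up value) would yield array('tenantA', 'row1') instead, while an array ID is returned unchanged.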

Example #2 Creating a document

$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentId");
$document->otherkey = 'some more data';
echo "key 1: " . $document->key1   . "\n"; // object notation
echo "key 2: " . $document['key2'] . "\n"; // array notation

Example #3 Exploring the document data

$document = $documents->fetchDocument("mydata", $id);
echo "Document ID: " . $document->getId() . "\n";
foreach ($document->getFields() as $key => $value) {
    echo "Field $key is $value\n";
}

Exceptions

If an error occurs in the document service, a Zend_Cloud_DocumentService_Exception is thrown. If the exception was caused by the underlying service driver, you can use the getClientException() method to retrieve the original exception.

Since different cloud providers implement different sets of services, some drivers do not implement certain features. In this case, the Zend_Cloud_OperationNotAvailableException exception is thrown.
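A minimal sketch of handling both exception types follows. The helper name fetchOrNull() and the choice to log and return null are assumptions made for illustration; an application may well prefer to let the exceptions propagate. $documents is assumed to be an adapter obtained from the factory.

```php
<?php
// Hypothetical helper: fetches a document and returns null if the service
// reports an error or does not support the operation.
function fetchOrNull($documents, $collection, $id)
{
    try {
        return $documents->fetchDocument($collection, $id);
    } catch (Zend_Cloud_OperationNotAvailableException $e) {
        // The concrete adapter does not implement this feature.
        error_log('Operation not available: ' . $e->getMessage());
    } catch (Zend_Cloud_DocumentService_Exception $e) {
        // getClientException() exposes the driver-level cause, if there is one.
        $cause   = $e->getClientException();
        $message = 'Document service error: ' . $e->getMessage();
        if ($cause instanceof Exception) {
            $message .= ' (caused by ' . get_class($cause) . ')';
        }
        error_log($message);
    }

    return null;
}
```

Exceptions of any other type are not caught here and continue to propagate to the caller.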

Creating a collection

A new collection is created using createCollection().

Example #4 Creating a collection

$documents->createCollection("mydata");

If you call createCollection() with a collection name that already exists, the service will do nothing and leave the existing collection untouched.

Deleting a collection

A collection is deleted by calling deleteCollection().

Example #5 Deleting a collection

$documents->deleteCollection("mydata");

Deleting a collection automatically deletes all documents contained in that collection.

Note: Deleting a collection can take significant time for some services. You cannot re-create a collection with the same name until the collection and all its documents have been completely removed.

Deleting a non-existent collection will have no effect.

Listing available collections

A list of existing collections is returned by listCollections(). This method returns an array of all the names of collections belonging to the account you specified when you created the adapter.

Example #6 List collections

$list = $documents->listCollections();
foreach ($list as $collection) {
    echo "My collection: $collection\n";
}

Inserting a document

To insert a document, you need to provide a Zend_Cloud_DocumentService_Document object or associative array of data, as well as the collection in which you are inserting it.

Many providers require that you provide a document ID with your document. If using a Zend_Cloud_DocumentService_Document, you can specify this by passing the identifier to the constructor when you instantiate the object. If using an associative array, the ID must be supplied under adapter-specific key names: for example, on Azure, the ID is made up of the PartitionKey and RowKey; on Amazon SimpleDB, the ID is the ItemName. You may also specify the ID in the _id field to be more portable.

As such, the easiest and most compatible way to specify the key is to use a Document object.

Example #7 Inserting a document

// Instantiating and creating the document
$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentID");

// inserting into the "mydata" collection
$documents->insertDocument("mydata", $document);

Replacing a document

Replacing a document means removing all document data associated with a particular document key and substituting it with a new set of data. Unlike updating, this operation does not merge old and new data but replaces the document as a whole. The replace operation, like insertDocument(), accepts a Zend_Cloud_DocumentService_Document document or an array of key-value pairs that specify names and values of the new fields, and the collection in which the document exists.

Note: Document ID is required
To replace the document, the document ID is required. Just like inserting a document, if you use an associative array to describe the document, you will need to provide a provider-specific key indicating the document ID. As such, the most compatible way to replace a document across providers is to utilize a Document object, as shown in the examples.

Example #8 Replacing a document

$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentID");

// Replace the document as found in the "mydata" collection
$documents->replaceDocument("mydata", $document);

You may also use an existing Document object, re-assign the fields and/or assign new fields, and pass it to the replaceDocument() method:

$document->key4 = '4th value';

// Replace the document as found in the "mydata" collection
$documents->replaceDocument("mydata", $document);

Updating a document

Updating a document changes the key/value pairs in an existing document. This operation does not share the replace semantics; the values of the keys that are not specified in the data set will not be changed. You must provide both a document key and data, either via a Zend_Cloud_DocumentService_Document document or an array, to this method. If the key is null and a document object is provided, the document key is used.

Example #9 Updating a document

// update one field
$documents->updateDocument("mydata", "DocumentID", array("key2" => "new value"));

// or with document; this could be a document already retrieved from the service
$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentID");
$documents->updateDocument("mydata", null, $document);

// copy document to another ID
$documents->updateDocument("mydata", "AnotherDocumentID", $document);

Amazon SimpleDB supports multi-value fields, so data updates are merged with the existing values of a field instead of replacing them. To control this per field, set the merge option to an array of key/value pairs, where each key is a field name and each value is a boolean indicating whether to merge that field (true) or not (false). Any fields not listed in the merge option will be replaced instead of merged.

Example #10 Merging document fields

// key2 is overwritten, key3 is merged
$documents->updateDocument('mydata', 'DocumentID',
    array('key2' => 'new value', 'key3' => 'additional value'),
    array('merge' => array('key3' => true))
);
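To make the difference between merging and replacing concrete, the following is a small stand-alone model of the semantics described for the merge option. It is an illustration only and does not reproduce the SimpleDB adapter's code; the function applyUpdate() is hypothetical, and representing a multi-value field as a PHP array is an assumption.

```php
<?php
// Hypothetical model of the "merge" option: $merge is either a boolean that
// applies to every field, or an array of fieldName => bool pairs.
function applyUpdate(array $stored, array $changes, $merge = true)
{
    foreach ($changes as $field => $value) {
        // Fields missing from a merge array are replaced, not merged.
        $shouldMerge = is_array($merge) ? !empty($merge[$field]) : (bool) $merge;

        if ($shouldMerge && array_key_exists($field, $stored)) {
            // Merging keeps the old value(s) and appends the new one(s).
            $stored[$field] = array_merge((array) $stored[$field], (array) $value);
        } else {
            // Replacing discards whatever was stored for this field.
            $stored[$field] = $value;
        }
    }

    return $stored;
}

$before = array('key2' => 123, 'key3' => 'thirdvalue');
$after  = applyUpdate(
    $before,
    array('key2' => 'new value', 'key3' => 'additional value'),
    array('key3' => true)
);
var_export($after);
```

In this model key2 ends up holding only 'new value', while key3 holds both 'thirdvalue' and 'additional value'.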

Deleting a document

A document can be deleted by passing its key to deleteDocument(). Deleting a nonexistent document has no effect.

Example #11 Deleting a document

$documents->deleteDocument("collectionName", "DocumentID");

Fetching a document

You can fetch a specific document by specifying its key. fetchDocument() returns one instance of Zend_Cloud_DocumentService_Document.

Example #12 Fetching a document

$document = $documents->fetchDocument('collectionName', 'DocumentID');
echo "Document ID: " . var_export($document->getId(), true) . "\n";
foreach ($document->getFields() as $key => $value) {
    echo "Field $key is $value\n";
}

Querying a collection

To find documents in the collection that meet some criteria, use the query() method. This method accepts either a string, which is an adapter-dependent query passed as-is to the concrete adapter, or a structured query object that is an instance of Zend_Cloud_DocumentService_Query. It returns a Zend_Cloud_DocumentService_DocumentSet containing the instances of Zend_Cloud_DocumentService_Document that satisfy the query. The DocumentSet object is iterable and countable.

Example #13 Querying a collection using a string query

$docs = $documents->query(
    "collectionName",
    "RowKey eq 'rowkey1' or RowKey eq 'rowkey2'"
);

foreach ($docs as $doc) {
    $id = $doc->getId();
    echo "Found document with ID: "
        . var_export($id, true)
        . "\n";
}

If using a structured query object, typically, you will retrieve it using the select() method. This ensures that the query object is specific to your adapter, which will ensure that it is assembled into a syntax your adapter understands.

Example #14 Querying a collection with a structured query

$query = $documents->select();
$query->from('collectionName')
      ->where('year > ?', array(1945))
      ->limit(3);
$docs = $documents->query('collectionName', $query);

foreach ($docs as $doc) {
    $id = $doc->getId();
    echo "Found document with ID: "
        . var_export($id, true)
        . "\n";
}

Zend_Cloud_DocumentService_Query classes do not limit which query clauses can be used, but the clause must be supported by the underlying concrete adapter. Currently supported clauses include:

  • select() - defines which fields are returned in the result.

    Note: Windows Azure ignores this clause's argument and always returns the whole document.

  • from() - defines the collection name used in the query.

  • where() - defines the conditions of the query. It accepts three parameters: the condition, an array of arguments to replace "?" placeholders in the condition, and a conjunction argument, which should be "and" or "or" and is used to join this condition with previous conditions. Multiple where() clauses may be specified.

  • whereId() - defines the condition by document ID (key). The document matching must have the same key. The method accepts one argument - the required ID (key).

  • limit() - limits the returned data to specified number of documents.

  • order() - sorts the returned data by specified field. Accepts two arguments - first is the field name and second is 'asc' or 'desc' specifying the sort direction.

    Note: This clause is not currently supported by Windows Azure.
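The clauses listed above can be combined as in the following sketch. The helper functions and the field names year and flagged are hypothetical, and $documents is assumed to be an adapter obtained from the factory. Since order() is noted above as unsupported by Windows Azure, the second helper would only be expected to work with an adapter that supports that clause.

```php
<?php
// Hypothetical helper: looks up documents by ID using whereId().
function findById($documents, $collection, $id)
{
    $query = $documents->select()
                       ->from($collection)
                       ->whereId($id);

    return $documents->query($collection, $query);
}

// Hypothetical helper: joins two conditions with "or", sorts, and limits.
function findRecentOrFlagged($documents, $collection)
{
    $query = $documents->select()
                       ->from($collection)
                       ->where('year > ?', array(1945))
                       ->where('flagged = ?', array('yes'), 'or')
                       ->order('year', 'desc')
                       ->limit(10);

    return $documents->query($collection, $query);
}
```

Both helpers return whatever query() returns, which is a DocumentSet unless the adapter is configured otherwise.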

Creating a query

For the user's convenience, the select() method instantiates a query object specific to the adapter, and sets the SELECT clause for it.

Example #15 Creating a structured query

$query = $documents->select()
                   ->from('collectionName')
                   ->where('year > ?', array(1945))
                   ->limit(3);
$docs = $documents->query('collectionName', $query);
foreach ($docs as $doc) {
    $id = $doc->getId();
    echo "Found document with ID: "
        . var_export($id, true)
        . "\n";
}

Accessing concrete adapters

Sometimes it is necessary to retrieve the concrete adapter for the service that the Document API is working with. This can be achieved by using the getAdapter() method.

Note: Accessing the underlying adapter breaks portability among services, so it should be reserved for exceptional circumstances only.

Example #16 Using concrete adapters

// Since the Simple Cloud Document API doesn't support batch upload,
// use the concrete adapter
$amazonSdb = $documents->getAdapter();
$amazonSdb->batchPutAttributes($items, 'collectionName');