Added by Jeremy Giberson, last edited by Jeremy Giberson on Jun 19, 2009  (view change)

Labels

 

Zend Framework: Zend_Olap Component Proposal

Proposed Component Name Zend_Olap
Developer Notes http://framework.zend.com/wiki/display/ZFDEV/Zend_Olap
Proposers Jeremy Giberson
Zend Liaison TBD
Revision 1.0 - 16 June 2009: Initial Draft. (wiki revision: 21)

Table of Contents

1. Overview

This Zend_Olap proposal is a bit of a water tester. I'm hoping it may pique the curiosity of some one familiar with OLAP to either help this proposal as a Q&A resource or even take over. Frankly, I'm new to OLAP and have neither experience nor practical use with the technology. However, after having read about it and vaguely understanding what it can achieve, I became interested in providing similar functionality via Zend Framework. I'm building this component as I go along, backwards engineering it from random resources.

Zend_Olap is meant to provide a OLAP solution by leveraging Zends already existing Zend_Db components as a bridge layer to already existing RDBMS tables as OLAP fact tables. A schema file is used to map the RDMS tables to cubes, dimensions, and measures. Zend_Olap will act as an MDX processing engine working on your RDMBS data in memory generating multi-dimensional result sets to be used in reporting.

2. References

Zend_Olap is being reverse engineered from information I found on various websites. The primary source of influence has been from the Pentaho Mondrian project at mondrian.pentaho.org.
Specifically, the documentation pages for architecture and schema were used to develop the component classes and theory of operation.

Also, for testing my OLAP implementation, I've been using the sample database from Mondrian. Running queries against the Mondrian OLAP and against my OLAP implementation to insure the result sets are the same. The actual database is located in my MySQL database. I can provide the sql files I used to populate the database since the sample that comes with Mondrian lacks create statements and could never get the java creation tool to work properly.

Additionally, I've been using Microsoft's MSDN for mining information about MDX statements and language reference.

3. Component Requirements, Constraints, and Acceptance Criteria

Functioning as an OLAP engine, the Zend_Olap component is going to be very technical and have several complex parts. Generally speaking, Zend_Olap will be required to:

  • Poll existing RDBMS for the data that will be used as the factual data for cubes.
  • Map cubes, dimensions, measures and aggregates to the RDBMS layer via schema file.
  • Utilize Zend_Cache component for performance improvements.
  • Accept, parse and execute MDX expressions to generate multi-dimensional result sets.
  • Most importantly, the Zend_Olap component will be required to work. This is probably the hardest requirement

4. Dependencies on Other Framework Components

  • Zend_Acl
  • Zend_Acl
  • Zend_Cache
  • Zend_Db
  • Zend_Exception

5. Theory of Operation

As already mentioned, the Zend_Olap component will utilize your existing RDBMS for its factual data. Using a configuration file (xml format) you will describe the overall schema for the OLAP engine. Cubes, dimension, aggregates, measures and how they map to the underlying RDBMS tables are dictated in the schema file. Once the Zend_Olap engine has been instantiated and configured it will be ready to query via MDX Expressions and will return multidimensional result sets (php multidimensional arrays) that user can then use for reporting. Perhaps some one may be interested in generating a Zend_Olap_Report component in unison with this component to handle the process of displaying and navigating large result sets.

Where possible, the Zend_Olap component will cache operation results to speed up subsequent queries.

Sample usage for the component is displayed below. It should be pointed out that the hardest part of using Zend_Olap will be writing the schema file that describes your RDBMS in OLAP cubes. After that, using and querying the OLAP engine are (code wise) a piece of cake.

Usage Steps

  • Instantiate Zend_Olap
  • Load XML schema file
  • Supply db adapter
  • Query with MdxExpression
Zend_Olap : Olap Engine Core

Pattern Discovery
To create an effective OLAP implementation we first need to identify the processes in which OLAP data models can be used in relation to each other to generate meaningful multi-dimensional data result sets.

It is then necessary to translate those processes into a hybrid of relational database queries and PHP processing that replicates the above discovered processes.

The PHP/RDBMS implementation of a single distinguishable operation will be referred to as a pattern.

The following initial patterns have been discovered and translated during the basic prototyping phase.

  • Selecting <measure> on columns from <cube>
  • Selecting <dimension>.<level> on columns from <cube>
    Selecting <dimension>.<level> on columns, <measure> on rows from <cube>
  • Selecting <dimension>.<level> on columns, <dimension>.<level> on rows from <cube>

Worth mentioning is that the words columns and rows are labels for "AXIS" 0 and 1 respectively. MDX supports up to 128 axis being specified in an MDX query.

It is likely that during further development the initial patterns will be replaced with more granular components such as:

  • Selection Components
  • <measure> on <axis>
  • <dimension>[levels] on <axis>
  • From Components
  • <cube>
  • <cube expression>
  • <virtual cube>
  • Where Component
  • <property> <comparison> <value>
  • Query Pattern
  • <selection component> <from component> <where component>

The goal of the pattern development is to implement an engine that can utilize the native database to select and manipulate the data with as little PHP intervention as possible except where ever it is more optimal to do so.

The patterns will become more clear when we have decided what is and is not reasonable to request the OLAP engine to do.

Zend_Olap_Mdx | Zend_Olap_MdxExpression : MDX Query Execution Core

Query Parsing (Zend_Olap_MdxExpression)
The first phase of querying the Zend_Olap engine will be dissecting the MDX query language into a set of operations the engine can execute. IE, converting a query from a string to a collection of engine patterns (discussed above).

While simple MDX queries have a simple syntax that can be parsed and interpreted, MDX supports a rich feature set of functional operations that can be performed in MDX queries to simplify working with sets and calculated members and other complicated cell selections.

The MDX query parser should be created as a two part system. The first part, a tokenizer that can parse any valid MDX expression into token entities. And a second part--that takes processed token entities and bundles them into patterns that can be processed by the Zend_Olap engine.

In this manner, the tokenizer can be created initially and never have to be updated as the core feature set of Zend_Olap engine grows. Instead, the patternizer will only have to be updated. If an abstract method can be created such that the patternizer is feature independent that would further reduce the need for upkeep on the MDX parsing component as Zend_Olap engine becomes more robust.

Query Executing (Zend_Olap_Mdx)
The Zend_Olap_Mdx component will be responsible for retrieving patterns from the Zend_Olap_MdxExpression object and passing those patterns to the Zend_Olap_Engine for processing. The execution of the query method will result in a multi-dimensional data set that is returned to the user.

The Zend_Olap_Mdx is given responsibility of being the middle man between Zend_Olap engine and Zend_Olap_MdxExpression queries in the hopes that Zend_Olap_MdxExpression can be coded independent of Zend_Olap engine feature set.

Zend_Olap_Mdx will be responsible for checking that Zend_Olap_MdxExpression doesn't contain any abstract patterns that the Engine doesn't know how to process yet. As an example, as of the current state of implementation the Zend_Olap engine doesn't support where conditionals in the query. Zend_Olap_Mdx only asks Zend_Olap_MdxExpression for columns, rows and from component and then does processing--ignoring any extra information Zend_Olap_MdxExpression may contain.

Additional Note:
As it stands, Zend_Olap_Mdx does all of the execution logic to process an OLAP pattern. Ideally, processing patterns will be moved to a new class like Zend_Olap_Engine.

Then Zend_Olap_Mdx can be utilized to simply be responsible for passing patterns from Zend_Olap_MdxExpression to and from Zend_Olap_Engine as discussed above.

Zend_Olap_SchemaParser : Mapping RDBMS to OLAP Models & More

The schema file can be utilized for more than simply mapping RDBMS tables to OLAP models, but we can also use it for configuring Zend_Olap_Engine properties and security ACL implementations.

OLAP Models
The first priority of the schema file is of course establishing OLAP models like Cubes, Dimensions, Measures, Calculated members et al by mapping these constructs to the underlying RDBMS tables.

While a format has not officially been decided, at present I have been utilizing the schema structure of the Pentaho Mondrian project listed in the references section. As the Zend_Olap component is further developed it will become necessary to deviate from that projects format to fit this components implementation.

OLAP Security
The schema file can also be utilized to implement security access lists. By utilizing the already existing Zend_Acl component, we can use the schema file to define roles and resources with in the OLAP models. Then, either the Zend_Olap can be passed a user role to check against the ACL for all queries, or the MDX Expression object can be passed a role on a per query basis for query authorization.

It will be the responsibility of the engine to prior to executing a query--determine all OLAP models that will be utilized and check the associated role for the query has permission to query that resource. If any resource is not granted access to the associated query role the query should through a security exception.

Zend_Olap_Engine Configuration Options
The schema file can utilize either full XML tags to specify engine options, or engine options may be specified by the tag attributes.

For example, part of the cache configuration options will be specified via tag attribute for dimensions, measures, cubes and calculated members. The attribute will be "cacheTag" and the value could be a comma separated list of tags. (cacheTag will be explained in the cache section).

Zend_Cache : A role to play in optimization

While it is not clear yet the full extent that cache will be utilized to optimize the Zend_Olap component, there are some obvious possibilities that can be initially supported.

Since the Engine will be operating on patterns, we can save the calculation results of patterns that have been processed into a cache object identified by a unique hash identifier of the pattern.

We can also apply tags to the cache result. The cache tags will allow us to locate cache objects and expire them and force a recalculation.

Example
If we have a cube measure like store sales and the value is only updated once per day at night after store has closed, any query against that measure when repeated will always have the same value, but only for 24 hours, until the next days sales have been added to the table.

It makes sense to cache a query against that measure for 24 hours, or until the underlying table has been updated. To do this, lets add a tag to the measure in the schema file that will be applied to the cache object.

Later, we can delete all cache objects containing this tag nightly. This operation would probably be performed by a script that you have set to run nightly after your end of day reports have completed.

To add further to the example, lets also imagine we have a dimension in this cube for employees. This dimension for instance, when applied against the store sales measure, tells us how much of store sales each employee is responsible for.

Ignoring the measure component for a moment, a query against this dimension will only change when an employee is added. So, we could cache these queries for an indefinite amount of time until a new employee is added to the company/store.
To do that, lets add the tag 'newemployee'.

At a later date when a new employee has been added we can find all cache objects with the tag newemployee and expire it so it has to be recalculated.

The benefit here then, is that when the Zend_Olap engine runs a query against any component with a cache tag, the result is cached and tagged with all the tags found in each component.

In our example, the MDX Query "SELECT [Employees].members on columns, [Measures].[Store Sales] on rows from Sales" would be cached and tagged with nightly,newemployee. The result would be reused until the cache object expired.
The object would be expired by exceeding its expiration time naturally, or manually when we tell the cache to expire objects with either (or both) tags nightly/employee.

6. Milestones / Tasks

Presently the Zend_Olap component is in a state of minimal prototype functionality. Zend_Olap can handle loading the schema configuration and processing <b>simple</b> statements. As the component develops in to a more mature state the structure and existence of internal classes are likely to evolve.
Zend_Olap component comprises the following core goals:

  • Map RDBMS to OLAP Entities - XML Schema Parser
  • Execute MDX Queries - Mdx Parser <-> OLAP Engine
  • Optimize OLAP Operations & Results - Cache, Query Strategy

The following are natural milestones for this component:

  • Conceptualize OLAP implementation strategy - done
  • Prototype minimal implementation to test validity of strategy - done
  • Propose to ZF community and illicit feedback regarding direction and implementation of component
  • Concrete proposal based on user feedback and suggestions
  • Expand on prototype until component arrives as first stage of completeness to be accepted into the incubator
  • Unit Tests

7. Class Index

Bold classes are classes the programmer will typically utilize. The other classes are typically only utilized internally by the engine.

  • Zend_Olap
  • Zend_Olap_Cube
  • Zend_Olap_Dimension
  • Zend_Olap_Hierarchy
  • Zend_Olap_Level
  • Zend_Olap_Mdx
  • Zend_Olap_MdxExpression
  • Zend_Olap_Measure
  • Zend_Olap_Member
  • Zend_Olap_Property
  • Zend_Olap_Schema
  • Zend_Olap_SchemaParser
  • Zend_Olap_Set
  • Zend_Olap_Table
  • Zend_Olap_Tuple

8. Use Cases

Setting up the Zend_Olap component

Query Result

Result is a multi-dimensional array

9. Class Skeletons

Please see the attached prototype source files for class skeletons. And by skeletons I mean not skeletons, but a working prototype.
Prototype Source

I've also attached the foodmart data source in zipped sql statements.
Data Source

we'd love to have an OLAP option for the current project I'm working on, minimally it would be great to just be able to connect to through to an existing OLAP engine, like Modrian or even Analysis services.

A cool feature would be to have a basic scaffolding for simple reporting to pass a value from an individual dimension in as a route, for instance "123" would be a dimension member of the partners dimension:

/default/partners/display/123

then you would get back dimensions supplied say, months and country

/default/partners/display/123/month/country

I wonder about the performance though using a ROLAP implementation. Would there really be a performance gain? Would the goal just be to simplify querying for aggregations?

Bottom First:
Actually, the ROLAP design is not performance based--in fact ROLAP is generally considered to be the slower methodology over MOLAP. The reason why the ROLAP design is in place is to utilize the existing relational database tools in the Zend framework. To use a MOLAP design we would need:

  • A real olap engine w/ multidimensional database
  • A way to connect to the engine/database.

The purpose of this proposal is to actually provide OLAP engine functionality on top of your existing RDBMS.

Top:
That being said, it sounds like what you need is an OLAP adapter. Something that can connect to an OLAP engine service, send responses and return the result back to the caller. However, most of the OLAP engines I've researched don't offer any kind of standardized web service APIs. Most are simple engine implementations that offer an API to be used in a third party application to connect to the engine and query / parse results in a progmatic oriented way as opposed to a service oriented method (if that makes sense).

At best, if a certain OLAP engine or application offers a web service interface where you can send query descriptions and receive formatted result sets (say in xml or something) then the option would be to create a Zend_Service_OlapWebService component specifically for that application.

Middle:
Such a route driven process would certainly be possible with the zend_olap component but only at the application level. You're talking about combining zend components route/controller/olap to create application level logic--and really that falls outside the scope of this component proposal.
The scope of this component is simply creating the PHP/RDBMS powered OLAP engine capable of defining OLAP models from relational data and executing queries and returning results. Zend_Olap is only the same level of Zend_Db, and as such doesn't really make sense to create a scaffolding as an extension of this proposal.

However, it may be appropriate to propose in the future Zend_Scaffolding_Olap. Which is a component of the many scaffoldings that might have been created for zend components and which doesn't necessarily utilize Zend_Olap component, but rather Zend_Olap_Adapter which (going back to the top) interfaces with an existing OLAP engine service (including say a service powered by Zend's zend_olap component) to provide a zend powered reporting service application.