
<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

Zend Framework: Zend_Cache_Backend_Static Component Proposal

Proposed Component Name Zend_Cache_Backend_Static
Developer Notes http://framework.zend.com/wiki/display/ZFDEV/Zend_Cache_Backend_Static
Proposers Pádraic Brady
Revision 1.0 - 25 January 2009/28 July 2009 (wiki revision: 8)

Table of Contents

1. Overview

Zend_Cache_Backend_Static aims to provide a backend for caching full pages as static files within the public directory, or any public subdirectory. The advantage of this caching strategy is that the HTTP server can serve static HTML (and other) files directly without involving PHP, which allows for impressive increases in throughput and responsiveness compared to dynamic pages requiring PHP. The caching mechanism will fully support tagging, which simplifies expiry and invalidation management.

This proposal obviously has one specific deployment target, that of caching on a single server, VPS or shared hosting account where the additional performance is warranted and scaling to multiple servers is a non-issue. That said, there are use cases where static files may be cached to memcache for retrieval by a memcache aware web server (such as a typical frontend nginx setup using Apache as a backend server for PHP work).

To avoid any confusion over the operation of this static cache, it does repeat the functionality of the existing Zend_Cache_Frontend_Page to some extent. The difference is that while Zend_Cache_Frontend_Page requires the invocation of PHP to access the cache, this static backend does not. All static files are stored to either the filesystem or to memory for direct access by web servers without requiring a PHP invocation.

2. References

Source Code (In Development)

3. Component Requirements, Constraints, and Acceptance Criteria

  • MUST implement a fully functional and configurable static file cache
  • MUST provide sufficient features to be easily adapted at a high level by a Cache Manager (see related proposal)
  • SHOULD offer easy access to page level caching from Controllers via an Action Helper (see class skeletons)

The original third requirement has been deleted after discussion with the Zend_Cache lead developer, and taken forward as a potential improvement point for Zend Framework 2.0. It related to the focus on making Frontends responsible for the validation of IDs used by Backends. Since a Backend cannot define what comprises a valid ID, any Backend using a non-typical ID system must employ a set of workarounds such as hashing and base64 conversions if possible. Reversing this position is quite simple, but it risks a backwards compatibility break for any custom Backend classes written by users. Since any static cache will inevitably be tied to the URL it was generated for, such workarounds must be employed by this proposal.

4. Dependencies on Other Framework Components

Zend Framework Classes

  • Zend_Cache_Core
  • Zend_Exception
  • Zend_Cache_Backend_Interface

Additionally Proposed Classes

  • Zend_Cache_Frontend_Capture

5. Theory of Operation

The Zend Framework currently offers a frontend called Zend_Cache_Frontend_Page which caches the output from an entire document into memory or file based caches. This cache is then tested on the next request, and if a hit is detected, the page is loaded from the cache and served. However, while it is many times faster than dynamically generating pages (by a factor of 9-10 on my VPS), it suffers from one problem - it still needs Zend_Cache and PHP. This limits the speed of even the best cached page using Zend_Cache_Frontend_Page, since Apache and PHP remain one of the slowest ways of delivering responses.

The fastest way of delivering pages is to save them as ordinary HTML files any HTTP server can serve. The only limit then is your HTTP server's maximum throughput for any given user concurrency level. Apache generally has a consistent performance when optimised, but some popular lightweight HTTP servers like nginx or lighttpd can outpace Apache quite easily in many circumstances, making static files a valuable option when optimising applications on minimal and non-scaled hardware. Web servers such as nginx may also directly access memcached which means entire static files can be stored to memory using a URL based ID for retrieval.

A Static File Cache is generated at the exact same level as Zend_Cache_Frontend_Page. The difference is that it caches pages into files within a public directory of the application with a valid file extension (e.g. .html) and content. On any given request, the HTTP server can skip PHP, and serve this file directly.

This is a lot faster than the current style of Page caching. A simple benchmark on my VPS comparing a single PHP echo statement with a static HTML file showed throughput rising roughly sixfold (711.20 vs 4,208.52 requests/sec). It also eliminates, to an extent, the need for a caching proxy like Squid, which has traditionally performed a similar role for applications without static file caching implemented natively, and it enhances the benefit of using a low memory/CPU reverse proxy HTTP server (like nginx or lighttpd) to serve static content instead of Apache. This is particularly true on small single servers where Squid is either overkill or simply not available, and even on more powerful servers where the reverse proxy HTTP server can overtake Apache with ease.

In the proposed caching system, there will also be a tagging mechanism supported using a backend cache (to maintain the tag data - a cache within a cache) which would make invalidating such statically cached files a lot easier to perform than relying on manual expire instructions (which are near impossible to track in large applications). I would hope to forward a Zend_Cache_Backend_Db proposal for a database backed cache as a separate proposal since a tagging system is more efficiently stored in a more atomic form than a file cache.
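To make the "cache within a cache" idea concrete, here is a minimal sketch of a tag index held apart from the static files. It is not the proposed implementation: the class name StaticTagIndex and its methods are hypothetical, and the index lives in memory only, whereas a real one would be persisted by the secondary backend.

```php
<?php
// Hypothetical sketch: a tag index kept apart from the cached files themselves.
// Class and method names are illustrative and not part of Zend_Cache.
class StaticTagIndex
{
    /** @var array Map of tag => (cache ID => true) */
    private $tags = array();

    // Record that the page cached under $id carries each of $tags.
    public function tag($id, array $tags)
    {
        foreach ($tags as $tag) {
            $this->tags[$tag][$id] = true;
        }
    }

    // Return every cache ID carrying $tag.
    public function idsForTag($tag)
    {
        return isset($this->tags[$tag]) ? array_keys($this->tags[$tag]) : array();
    }

    // Forget a cache ID under every tag, e.g. after its static file is deleted.
    public function forget($id)
    {
        foreach ($this->tags as $tag => $ids) {
            unset($this->tags[$tag][$id]);
            if (empty($this->tags[$tag])) {
                unset($this->tags[$tag]);
            }
        }
    }
}

$index = new StaticTagIndex();
$index->tag('/news/tags/zend-framework/2', array('news', 'zend-framework'));
$index->tag('/news/latest', array('news'));

// Invalidate by tag: look up the IDs, delete their files (omitted here),
// then drop the index records.
foreach ($index->idsForTag('news') as $id) {
    $index->forget($id);
}
```

Keeping only IDs in the index, never page content, means invalidating by tag costs one lookup plus one file deletion per matching page.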

An example:

Assuming Static Caching is enabled for the URI:

/news/tags/zend-framework/2

This would be cached to file as:

/news/tags/zend-framework/2.html

To keep the public directory free of confusion, 2.html would actually be saved to /public/static/2.html (a default convention). Since this obviously does not match the preserved route, the incoming REQUEST_URI is rewritten onto this file location using the following Rewrite Rule additions:

RewriteEngine On

RewriteRule ^/(.*)/$ /$1 [QSA]
RewriteRule ^$ static/index.html [QSA]
RewriteRule ^([^.]+)/$ static/$1.html [QSA]
RewriteRule ^([^.]+)$ static/$1.html [QSA]

RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d 

RewriteRule ^.*$ - [NC,L]
RewriteRule ^.*$ /index.php [NC,L]

A few of these rules handle cases where the URI has a trailing forward-slash, or where the request is for the root URI (i.e. the index file should be served). The rest are the usual recommended rules for Zend Framework applications preceded by rules to rewrite requests initially to the static cache.
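The PHP side has to write each page to the location these rules rewrite onto. The following is a rough sketch of that mapping, assuming the "static" directory convention and the .html extension described above. The function name staticPathForUri and the /var/www/public path are hypothetical.

```php
<?php
// Hypothetical helper mirroring the rewrite rules above: maps a request URI
// to the file the cache would write under the public "static" directory.
function staticPathForUri($requestUri, $publicDir, $extension = '.html')
{
    // Drop any query string, then strip leading and trailing slashes.
    $path = trim((string) parse_url($requestUri, PHP_URL_PATH), '/');

    // The root URI is served from index.html.
    if ($path === '') {
        $path = 'index';
    }

    // A production version would also need to reject ".." segments.
    return rtrim($publicDir, '/') . '/static/' . $path . $extension;
}

echo staticPathForUri('/news/tags/zend-framework/2', '/var/www/public'), "\n";
echo staticPathForUri('/news/tags/zend-framework/2/', '/var/www/public'), "\n";
echo staticPathForUri('/', '/var/www/public'), "\n";
```

The URI with the trailing slash and the one without resolve to the same file, matching the trailing-slash handling in the rules.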

Unfortunately, offering this backend requires either changes to Zend_Cache (backend parameter validation is performed by private static methods in Zend_Cache_Core which defeat any attempt at overloading) or workarounds to bypass the validation altogether. This is necessary because the static file cache's ID is its path in the filesystem and not an alphanumeric key. Also the backend class itself could offer additional methods which are otherwise blocked. As noted earlier - these changes will be proposed for Zend Framework 2.0 in the future.
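One possible workaround, shown only as a sketch, is to hex-encode the request URI before handing it to the frontend as an ID and decode it again inside the backend. This assumes the validation accepts only letters, digits and underscores. The function names encodeCacheId and decodeCacheId are hypothetical.

```php
<?php
// Hypothetical workaround: hex-encode the URI so the ID contains only [0-9a-f].
function encodeCacheId($requestUri)
{
    return bin2hex($requestUri);
}

// Reverse the encoding to recover the URI (and hence the file path).
function decodeCacheId($id)
{
    return pack('H*', $id);
}

$id = encodeCacheId('/news/tags/zend-framework/2');

// The encoded ID passes an alphanumeric-only check...
var_dump(preg_match('/^[a-zA-Z0-9_]+$/', $id) === 1);
// ...and still round-trips to the original URI.
var_dump(decodeCacheId($id) === '/news/tags/zend-framework/2');
```

Unlike a one-way hash, a reversible encoding lets the backend recover the path from the ID alone, at the cost of IDs twice as long as the URI.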

The static cache may also require a new Frontend which can capture output without relying on a pre-set ID (see Zend_Cache_Frontend_Output), as the ID is assigned dynamically based on the value of $_SERVER['REQUEST_URI'], but this is a relatively minor change. For the purposes of this proposal I've elected to implement this as an entirely new class; however, there may yet be a means of merging it with the existing Frontend for capturing output. Both methods of capturing output are very similar, but they operate at slightly different levels, which may make merging difficult or messy.
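The capture step can be sketched with PHP's output buffering functions. This is an illustration under stated assumptions, not the proposed Zend_Cache_Frontend_Capture class: the function name captureForUri and both callbacks are hypothetical, and the "store" callback simply records the content in an array where a backend would write a file.

```php
<?php
// Hypothetical sketch of capturing a page without a pre-set cache ID.
function captureForUri($requestUri, $render, $store)
{
    ob_start();                                     // begin buffering all output
    call_user_func($render);                        // the application renders the page
    $content = ob_get_clean();                      // fetch the buffer and stop buffering
    call_user_func($store, $requestUri, $content);  // hand it over, keyed by URI
    return $content;
}

$saved = array();

// The ID comes from the current request rather than from the caller.
$uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '/';

echo captureForUri(
    $uri,
    function () { echo '<html><body>Hello</body></html>'; },
    function ($id, $content) use (&$saved) { $saved[$id] = $content; }
);
```

The captured page is still echoed to the client on this first request; only later requests can be answered by the web server from the stored copy.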

It is anticipated that static file caching will be of use in small to medium applications on limited hardware in a minimally scaled system. Obviously, the static cache method is not very scalable since it's normally bound to a filesystem. However using memcached is an alternative depending on whether web servers are memcache aware.

6. Milestones / Tasks

  • Milestone 1: [DONE] Complete prototype/minimal iteration version
  • Milestone 2: Complete final feature list incorporating feedback
  • Milestone 3: Verify operation using Unit Tests
  • Milestone 4: Documentation

7. Class Index

  • Zend_Cache_Frontend_Capture
  • Zend_Cache_Backend_Static

8. Use Cases

Note: All use cases have the same problem - the cache ID is actually the REQUEST URI of the current request and this can easily contain characters forbidden by Zend_Cache_Core's private static validation methods. I had omitted any workaround on the assumption this restriction can be lifted (which it won't - but a workaround is in development so it won't matter).

UC-01: Starting/Ending Cache
UC-02: Deleting Caches based on $_SERVER['REQUEST_URI']
UC-03: Assigning Tags
UC-04: Deleting static files by tag

9. Class Skeletons

Output capturing frontend:

With Zend_Cache_Manager, implementation of an Action Helper to integrate page level caching.

]]></ac:plain-text-body></ac:macro>

]]></ac:plain-text-body></ac:macro>

  1. Jun 03, 2009

    <p>I do not think this is a good idea. The reason that someone would cache pages as static files is that the content does not change very often at all. </p>

    <p>1. For starters it would require a second backend to support tags (such as sqlite). Dealing with sqlite and files together would be quite slow.</p>

    <p>2. If items are static, caching them makes more sense over a longer period, such as a day or a week. The problem is that a cache of static files can grow very large and take a very long time to delete. Even deleting by tag can be slowed down a lot. Many people who need this are on shared hosts with time and memory limits imposed on PHP scripts.</p>

    <p>3. Another problem is that it currently only supports HTML content. Quite a complex set of Rewrite Rules would be required to support more than HTML files. (This would not be needed if the files were there permanently.)</p>

    <p>Now I do think that this is emphasizing a point where ZF could be improved. Here is a better solution IMHO:</p>

    <p>Storing a page:<br />
    1. Generate a series of static html pages upon page request like this static cache would do.<br />
    2. Do not cache by time/expiry.<br />
    3. When page output is stored as a static html file, a second backend (sqlite?) should NOT store the content, BUT only store the tags, the url called to generate the cache and the location of the static html file that holds the content.</p>

    <p>Clearing the cache:<br />
    1. As this is static content, we should not need to clear the cache. Instead, when the cache clear method is called, any matching record should be marked as "requiring update". (Do this with a "updateRequested" column that holds a date so we can process in reverse date order)<br />
    2. A cron job could then run a process that internally calls (dispatches with ZF bootstrap?) each url requiring update. When updating, the file will only be removed if the url returns a non-200 status code. Else the file will just be updated.<br />
    3. If a cron job is not available, this could be done randomly at the end of other ZF page calls. </p>

    <p>This could then handle pages that take a very long time to generate.</p>

    1. Jul 28, 2009

      <p>Actually, your first sentence just proved why the component is needed <ac:emoticon ac:name="wink" />. It's intended to cache rarely changing HTML content, which is the whole idea of having a cache. It's true it needs work to include other static file types like XML or even JSON, but these are largely down to adding different file extensions (probably that oversimplifies it I know). I'll look into such functionality if the core proposal is acceptable.</p>

      <p>The use of a backend cache is absolutely necessary for a tag system. This will impose a performance penalty when caching, or deleting a cache, or worse searching by tag for what to delete but this should be a relatively rare event compared to the number of requests hitting the static files themselves. It's a tradeoff I can't impose through a component - it's up to users to define whether content can or even should be statically cached and for how long.</p>

      <p>The limits on shared hosting should be sufficient but obviously if resources are that scarce, then some additional management may be needed. For example, all static content will go into a "static" directory cleanly separated from all other files. How long does it take to delete an entire directory for a total clean? There are ways around these issues but they don't belong per se in the component. We can offer a supportive architecture, but best and suitable practice outside that architecture is the domain of users - doesn't mean we can't make it easier though! <ac:emoticon ac:name="smile" /></p>

      <p>Yep, caching for extended periods in larger applications (or rather applications with a lot of URI endpoints) will result in a large cache depending on the level of static caching employed. Unfortunately, there's not a lot we can do to prevent that - we don't get to choose what users will cache or not.</p>

      <p>Now to your suggestions (all quite good!).</p>

      <p>Storing a page:<br />
      1. No change from the proposal<br />
      2. I left this open ended. You can cache forever if wanted by dropping an expiry time.<br />
      3. I forget exactly what I record with tags (maybe not an exact location but the URL) but we don't also include the content since the duplication is not necessary.</p>

      <p>Clearing the cache:<br />
      1. There's a subtle difference using static cache and static content. Static content doesn't change and isn't sensitive very much to dynamic data. This IS where an updateNeeded or other flag would be useful - so I'll take it on board. In other cases, updates should happen on the fly. As an example, imagine a blog feed which is statically cached to XML. When a new entry is added from the admin module, we can at that time delete and replace that static cache. It wouldn't be useful just flagging it for an update because our feed needs to reflect change immediately.<br />
      2. No argument for a use case where an update flag is used.<br />
      3. I'd leave this to the user - not something the proposal can address.</p>

      <p>Overall, I like the update flag idea. I'll look into its implementation and add it as a feature.</p>

  2. Jul 27, 2009

    <p>I recently had to write a cache component like this one, and it's a shame that I didn't know about yours... I spent two days writing it.<br />
    I hope very much that Zend will include it in trunk. It's incredibly useful, especially as ZF is a heavy burden for any VPS, or even for an $80 server with 3-4k visitors.</p>

    1. Jul 28, 2009

      <p>That's precisely the use case it was designed for <ac:emoticon ac:name="wink" />. I've just moved this forward so Zend can review it, offer recommendations, and hopefully (fingers crossed) promote it to the Incubator for future inclusion. There's sufficient code in place that with a few final tweaks and a brushup on code quality it would be ready for ZF 1.10.</p>

  3. Jul 28, 2009

    <ac:macro ac:name="note"><ac:parameter ac:name="title">Zend Framework Acceptance</ac:parameter><ac:rich-text-body>
    <p>The Zend Framework Team is pleased to accept this proposal for immediate development in the Standard Incubator.</p>

    <p>We ask that you follow the requirements below when developing the component:</p>
    <ul>
    <li>Utilize the TwoLevels cache for tag implementation
    <ul>
    <li>This will provide a single solution for all backends that need to do tagging, but for which the given implementation has no such capabilities. Document then how this would be done (revise UC-03 in this case).</li>
    </ul>
    </li>
    <li>Is it possible to change the file_extension on the fly?
    <ul>
    <li>As noted in comments and by yourself, this would be useful for caching XML and JSON responses as well. Often, the response type is not known until the view renders, but can be guessed based on the Request envelope. Having the ability to change the file_extension as needed is crucial.</li>
    </ul>
    </li>
    </ul>
    </ac:rich-text-body></ac:macro>

  4. Feb 22, 2011

    <p>Zend_Cache_Backend_Static is released into the stable branch. Maybe we can archive this proposal <ac:emoticon ac:name="smile" /></p>