
<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

Zend Framework: Zend_Filter Design Component Proposal

Proposed Component Name: Zend_Filter Design
Developer Notes: http://framework.zend.com/wiki/display/ZFDEV/Zend_Filter Design
Proposers: Darby Felton, author & Zend liaison
Revision: 0.3.0 (10 Jan 2007): Resolved problems with dual-purpose component design and class naming schema (wiki revision: 22)


1. Overview

Introduction

What is a filter?

In the physical world, a filter typically removes unwanted portions of its input and lets the desired portion pass through as output (e.g., a coffee filter). In such scenarios, a filter is an operator that produces a subset of its input. This type of filtering is useful in web applications: removing illegal input, trimming unnecessary white space, and so on.

The idea of filtering may also be extended to performing generalized transformations upon input, rather than merely producing a selected subset of it. A common transformation in web applications is the escaping of HTML entities. For example, if a form field is to be populated with a dynamic value, that value should either be free of characters that have special meaning in HTML (such as < and &) or contain them only as escaped HTML entities, in order to prevent undesired behavior and security vulnerabilities. To meet this requirement we can either remove such characters or escape them, and other approaches may be more appropriate in other situations. A filter that removes them operates within the first definition of a filter: an operator that produces a subset of its input. A filter that escapes them, however, transforms its input. If we consider such use cases to be within the scope of filtering, then filtering must be redefined as an operator that performs transformations upon its input.
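To make the two definitions concrete, here is a minimal PHP sketch. It assumes a hypothetical interface, Sketch_Filter_Interface, with a single filter($value) method modeled on the calls used later in this discussion; the class names are illustrative only and are not the component's API. Sketch_Filter_StripTags produces a subset of its input using PHP's strip_tags(), while Sketch_Filter_HtmlEntities transforms its input using htmlentities().

```php
<?php
// Hypothetical filter interface, modeled on the filter($value) calls in this
// proposal's discussion; not the final Zend_Filter_Interface.
interface Sketch_Filter_Interface
{
    public function filter($value);
}

// Subset filter: removes HTML and PHP tags, so only the text content remains.
class Sketch_Filter_StripTags implements Sketch_Filter_Interface
{
    public function filter($value)
    {
        return strip_tags($value);
    }
}

// Transforming filter: converts special characters to HTML entities.
class Sketch_Filter_HtmlEntities implements Sketch_Filter_Interface
{
    private $quoteStyle;

    public function __construct($quoteStyle = ENT_QUOTES)
    {
        $this->quoteStyle = $quoteStyle;
    }

    public function filter($value)
    {
        return htmlentities($value, $this->quoteStyle, 'UTF-8');
    }
}

$input  = '<b>Tom & Jerry</b>';
$strip  = new Sketch_Filter_StripTags();
$escape = new Sketch_Filter_HtmlEntities(ENT_QUOTES);

echo $strip->filter($input), "\n";  // Tom & Jerry
echo $escape->filter($input), "\n"; // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
```

Because both classes implement the same interface, consuming code can apply either one without knowing whether it removes or transforms.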

What is a validator?

A validator examines its input with respect to some requirements and produces a boolean result - whether the input successfully validates against the requirements. If the input does not meet the requirements, a validator may additionally provide information about which requirement(s) the input does not meet.

For example, a web application might require that a username be between six and twelve characters in length and may only contain alphanumeric characters. A validator can be used for ensuring that usernames meet these requirements. If a chosen username does not meet one or both of the requirements, it would be useful to know which of the requirements the username fails to meet.
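The username rules above can be sketched as a validator that returns a boolean and records one message per failed requirement. This is a minimal sketch assuming a hypothetical isValid($value)/getMessages() interface, the pair of methods used elsewhere in this discussion; Sketch_Validate_Username and its message texts are illustrative only.

```php
<?php
// Hypothetical validator interface following the isValid()/getMessages() pair.
interface Sketch_Validate_Interface
{
    public function isValid($value);
    public function getMessages();
}

// Hypothetical validator for the username rules: 6 to 12 characters,
// alphanumeric only. Each failed requirement adds its own message.
class Sketch_Validate_Username implements Sketch_Validate_Interface
{
    private $messages = array();

    public function isValid($value)
    {
        $this->messages = array(); // discard messages from any previous call

        // strlen() counts bytes, which is adequate because only ASCII
        // letters and digits can pass the second check anyway.
        $length = strlen($value);
        if ($length < 6 || $length > 12) {
            $this->messages[] = "'$value' must be 6 to 12 characters long";
        }
        if (!preg_match('/^[A-Za-z0-9]+$/D', $value)) {
            $this->messages[] = "'$value' may contain only letters and digits";
        }
        return count($this->messages) === 0;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}

$validator = new Sketch_Validate_Username();
foreach (array('darby2007', 'a-b') as $name) {
    if ($validator->isValid($name)) {
        echo "$name: valid\n";
    } else {
        echo "$name: " . implode('; ', $validator->getMessages()) . "\n";
    }
}
```

The second username fails both requirements, so the caller can report each failure rather than a single "invalid" result.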

A validator differs from a filter primarily in that a validator produces a boolean result, based on whether the input validates against some requirements. A filter, on the other hand, produces output that is based on transformations upon the input.

Why consider filters and validators together?

Though filters and validators perform different operations upon their input, it is easy to see how filters and validators are often used together in web applications.

Consider the use case of a blog application that accepts comments upon its entries. Let us assume that the comment form contains two fields: an e-mail address field and a multi-line comment text area.

Let us further assume that a submitted comment should be rejected if its e-mail address does not conform to the e-mail address format (e.g., not@realaddress). A validator would be used to determine whether an e-mail address is acceptable.

Since an accepted comment would appear on the associated article's web page, comment authors must not be able to perform cross-site scripting (XSS) attacks. A filter would be used to remove XSS-capable content or to transform it so that XSS cannot be performed.

Because the e-mail validator and comment filter would be operating upon different parts of the same form input, they should be readily available to the same parts of the web application code. Thus, filters and validators are often used together in applications.

Filters and validators may also be related further than the proximity of their use in web application code. Consider a user-entered phone number as [part of] the application input. The phone number (555) 555-1234 may be written in several ways, such as 555.555.1234 and 555-555-1234, all of which should be considered valid. Though there are other ways to solve this problem, let us assume that the application will strip all punctuation and white space from the input, normalizing the phone number to contain only numbers. Once the normalized phone number is available from the filter, the application requires that the phone number be ten digits, and a simple validator can be used for meeting this requirement. Thus, for a single datum, the input phone number, the application must use a filter and then a validator. The filter and validator for a phone number should be readily and equally available to the application, perhaps even contained in the same class.
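The phone number case can be sketched as a single class that is both a filter and a validator, as the paragraph above suggests. This is a minimal sketch under assumptions: the interfaces and the Sketch_PhoneNumber class are hypothetical, the filter strips ASCII punctuation and white space with a POSIX character class, and the validator then requires exactly ten digits.

```php
<?php
// Hypothetical interfaces mirroring filter() and isValid()/getMessages().
interface Sketch_Filter_Interface
{
    public function filter($value);
}

interface Sketch_Validate_Interface
{
    public function isValid($value);
    public function getMessages();
}

// Hypothetical dual-purpose class: filter() normalizes, isValid() checks.
class Sketch_PhoneNumber implements Sketch_Filter_Interface, Sketch_Validate_Interface
{
    private $messages = array();

    // Strip all punctuation and white space, e.g. "(555) 555-1234" becomes "5555551234".
    public function filter($value)
    {
        return preg_replace('/[[:punct:][:space:]]+/', '', $value);
    }

    // Require exactly ten digits; letters left over by the filter fail here.
    public function isValid($value)
    {
        $this->messages = array();
        if (!preg_match('/^[0-9]{10}$/D', $value)) {
            $this->messages[] = "'$value' is not a ten-digit phone number";
        }
        return count($this->messages) === 0;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}

$phone = new Sketch_PhoneNumber();
foreach (array('(555) 555-1234', '555.555.1234', '555-1234') as $raw) {
    $normalized = $phone->filter($raw);   // filter first...
    $ok = $phone->isValid($normalized);   // ...then validate the normalized value
    echo $raw, ' => ', $normalized, $ok ? ' (valid)' : ' (invalid)', "\n";
}
```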

Zend_Filter already provides filters and validators. Why revisit these topics now?

The Zend_Filter component currently provides a library of static filter and validator methods via a monolithic Zend_Filter class.

A method of Zend_Filter is a filter or validator by naming convention only, and filter methods are further divided into whitelist and blacklist filters, again only by naming convention.

The Zend_Filter component includes another class, Zend_Filter_Input, that allows instantiation with an array of input data. The methods of Zend_Filter_Input generally act as proxies to the static methods of Zend_Filter, passing along a specific element of the input array to the proxied methods.

By using Zend_Filter, an application gains only the filters and validators the class provides, nothing more. The size of Zend_Filter grows in proportion to the number of filters and validators. A class with too many methods is confusing and difficult to use, and the current Zend_Filter class cannot mitigate this risk except through potentially complicated naming conventions, which still do not reduce the number of methods in the class.

The only way for users to add functionality to Zend_Filter is by extending the Zend_Filter and Zend_Filter_Input classes. By doing so, a user class inherits all the filters and validators of each inherited class, even if only one or two methods are needed. This approach also does not support clean integration of filters and validators from third-party providers.

Complex filters and validators that require supporting class constants, instance variables, and/or helper methods are all contained in a single class. As the number of filters and validators grows, the complexity of Zend_Filter and its naming conventions grows proportionally.

Zend_Filter supports ordering of filters and validators only by the order in which the methods are called from user code. Though this may be enough, it may also be worthwhile to consider providing additional support for managing filters and validators. For example, if an ordered set of filters and/or validators is to be applied to several input data, it may be useful to have a container that holds the filters and/or validators, operates on multiple input data, and provides aggregated results to the user. Zend_Filter_Input provides some degree of support for this idea, but its limitations may hinder users.

To address these limitations of the current Zend_Filter design, we revisit these topics to improve the component's design with respect to two key framework goals: extreme simplicity (ease of use) and a use-at-will architecture (flexibility and extensibility).

Summary

Filters and validators are related but different operators. A filter performs some transformations on its input, and a validator returns a boolean result - whether its input meets certain requirements. They are often used at the same time in application code, and it is important to have filters and validators organized in such a way as to be readily available to various parts of the application, such as processing form data and escaping data for various media (e.g., HTML, RDBMS).

Special Thanks

Thanks to everyone who has helped with Zend_Filter to date, and the following people are recognized for their extraordinarily valuable contributions along the way:

Forgot about you? So sorry; please remind me!

2. References

3. Component Requirements, Constraints, and Acceptance Criteria

  • The component must be simple to use.
  • The component must be extensible. Users must be able to easily write and use filters and validators for their own purposes and publish those filters and validators for others to plug into their framework-powered applications.
  • Filters and validators must be readily available throughout the lifetime of a request to the application. Input filtering and validation, for example, may be used early in the controller execution, whereas escaping output should be done as late as possible, closer to the view logic.
  • A filter or validator is generally considered for inclusion in the framework component when it helps the component satisfy the 80/20 rule for common use cases.
  • Filter and validator configuration and setup must be supported by an object-oriented syntax (e.g., using fluent interfaces).
  • As a general rule, filters and validators do not throw exceptions; they may throw exceptions only when filtering or validation is reasonably impossible. Validators return true or false, depending on whether the input meets the validation criteria, and, when the input fails validation, messages are accumulated and retained for programmatic access by the consuming application.
  • Ordered filter and validator chaining must be supported.
  • Validator chains must support validation rule dependencies, aborting execution when certain validators return false. Put another way, if there are two validators in a chain, and the second validator need not run if the input fails the first validation, then the chain must allow the developer to terminate the chain execution if the first validator returns false.
  • PHP editor code completion should be supported as much as possible under the proposed design.
  • Overloading with magic functions (e.g., __get(), __call()) should be avoided where possible, and all such usage must be documented well for ease of use.
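The chaining requirements can be illustrated with a minimal sketch of an ordered validator chain that supports aborting on failure. All names here are hypothetical: Sketch_Validate_Chain, the $breakChainOnFailure flag, and the callback-based Sketch_Validate_Callback used to keep the example short. addValidator() returns $this to allow a fluent interface, and because the chain implements the same validator interface, chains can be nested.

```php
<?php
interface Sketch_Validate_Interface
{
    public function isValid($value);
    public function getMessages();
}

// Hypothetical ordered validator chain. Each validator is stored with a flag
// saying whether its failure should stop the rest of the chain.
class Sketch_Validate_Chain implements Sketch_Validate_Interface
{
    private $validators = array();
    private $messages = array();

    public function addValidator(Sketch_Validate_Interface $validator, $breakChainOnFailure = false)
    {
        $this->validators[] = array($validator, $breakChainOnFailure);
        return $this; // fluent interface
    }

    public function isValid($value)
    {
        $this->messages = array();
        $result = true;
        foreach ($this->validators as $entry) {
            list($validator, $breakChainOnFailure) = $entry;
            if ($validator->isValid($value)) {
                continue;
            }
            $result = false;
            $this->messages = array_merge($this->messages, $validator->getMessages());
            if ($breakChainOnFailure) {
                break; // dependent validators later in the chain are skipped
            }
        }
        return $result;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}

// Hypothetical validator built from a callback and a fixed failure message.
class Sketch_Validate_Callback implements Sketch_Validate_Interface
{
    private $callback;
    private $message;
    private $messages = array();

    public function __construct($callback, $message)
    {
        $this->callback = $callback;
        $this->message = $message;
    }

    public function isValid($value)
    {
        $ok = (bool) call_user_func($this->callback, $value);
        $this->messages = $ok ? array() : array($this->message);
        return $ok;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}

$chain = new Sketch_Validate_Chain();
$chain->addValidator(new Sketch_Validate_Callback('is_string', 'not a string'), true)
      ->addValidator(new Sketch_Validate_Callback(function ($v) { return strlen($v) >= 6; }, 'too short'));

var_dump($chain->isValid(42));  // bool(false); the length check never runs
print_r($chain->getMessages()); // only "not a string"
```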

4. Dependencies on Other Framework Components

  • Zend_Exception

5. Theory of Operation

Organizing Filters and Validators into Classes

Monolithic Class

In this approach, all filters and validators are lumped into a single, monolithic class, whereby the class essentially acts as a library of filtering and validation functions. The current Zend_Filter component in the framework core realizes this approach.

Category Classes

This approach divides the class of the monolithic approach into classes that group filters and validators by purpose. Filters and validators may be categorized into multiple classes based on their general purpose. For example, we might have a String class that provides dozens of filtering and validation methods within its scope. A Number class could be used to provide methods for filtering and validating numeric input. As another example, we could have a Location class that is used to provide methods for dealing with address or location information (e.g., city, country, postal code, latitude and longitude).

Atomic Classes

This approach reduces the class size to contain one filter, one validator, or both. A class that needs to filter would implement a documented filter interface. If a class needs to validate, it would implement a documented validator interface. Because PHP allows for a class to implement multiple interfaces, a class could be both a filter and a validator. Some examples might include IntegerRange, RegExp, and CreditCard.

Choice Comparisons

Criterion | Monolithic Class | Category Classes | Atomic Classes
Filter implements a documented interface | resort to documented conventions | resort to documented conventions; possibly implement an interface | implement an interface
Validator implements a documented interface | resort to documented conventions | resort to documented conventions; possibly implement an interface | implement an interface
Extensibility | user classes inherit unnecessary complexity | user classes must conform to well-documented conventions, rather than conforming to simple, well-documented interfaces; higher complexity than Atomic approach | user classes implement simple, well-documented interfaces
Class Complexity | high | medium | low
Number of Classes | low | medium | high
Naming Conventions Complexity | | added complexity of having to categorize among various classes | organization of classes needed
Ease of Use | conventions add some complexity | | organization of classes needed
Use-at-will Architecture | unavailable | category classes can be loaded at will, though some may contain unnecessary weight | all filters and validators can be loaded at will
Chaining Filters & Validators | requires conventions | requires conventions; added complexity of categories | chaining is simple because objects implement known interfaces

Summary

As the above comparison table indicates, the monolithic class approach has the most trouble adequately meeting the component criteria.

The category class approach improves upon having a monolithic class, though the categorization of the filters and validators introduces its own complexity into the design, particularly because some categories are likely to overlap (i.e., not being mutually exclusive) with other categories. The names of categories are likely to be interpreted in different ways by the user community, perhaps causing additional confusion.

Of the three approaches, the atomic class approach seems to meet the criteria the best, with one caveat. Because the number of classes increases proportionally to the number of filters and validators, and because we need to maintain a reasonable number of classes in each directory, it may be necessary to group or categorize the classes using the normal PEAR conventions already in use throughout the framework. This implies that some additional complexity due to such grouping would be necessary, but this is not the same complexity that is introduced in the category class approach, wherein each class contains many filters and validators that correspond to a single category.
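The PEAR convention mentioned above maps each underscore in a class name to a directory separator, so grouping atomic classes only adds path segments. A minimal sketch follows; the function name is hypothetical, it uses '/' rather than DIRECTORY_SEPARATOR for predictable output, and Zend_Validate_Location_PostalCode is a hypothetical grouped class name borrowed from the Location category example.

```php
<?php
// Hypothetical helper: map a class name to its relative file path under the
// PEAR convention, where each underscore becomes a directory separator.
function sketch_class_to_file($className)
{
    return str_replace('_', '/', $className) . '.php';
}

$classes = array(
    'Zend_Filter',
    'Zend_Filter_StringTrim',
    'Zend_Validate_EmailAddress',
    'Zend_Validate_Location_PostalCode', // hypothetical grouped class
);
foreach ($classes as $class) {
    echo $class, ' => ', sketch_class_to_file($class), "\n";
}
```

Under this convention a grouped class is still one filter or validator per file; the group only becomes a subdirectory.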

6. Milestones / Tasks

  1. Publish design notes - DONE
  2. Publish proposal based on design notes - DONE
  3. Arrive on proposal approvable for incubator development
  4. Commit working prototype to incubator
  5. Commit passing unit tests
  6. Write initial documentation
  7. Revise code, tests, and docs based on feedback
  8. Merge changes with trunk for core release

7. Class Index

  • Zend_Filter
  • Zend_Filter_Exception
  • Zend_Filter_Interface
  • Zend_Filter_HtmlEntities
  • Zend_Filter_StringToLower
  • Zend_Filter_StringTrim
  • Zend_Filter_StripTags
  • Zend_Validate_Exception
  • Zend_Validate_Interface
  • Zend_Validate_EmailAddress
  • Zend_Validate_StringLength
  • (additional classes from porting existing Zend_Filter methods omitted for brevity)

8. Use Cases

9. Class Skeletons

]]></ac:plain-text-body></ac:macro>

]]></ac:plain-text-body></ac:macro>

  1. Dec 30, 2006

    <p>A formatting request: the width of the tabs in your deck of use cases causes the whole page to become wider than my browser window and I have to scroll horizontally while reading the rest of the text. May I ask that your deck of use cases be split into multiple decks, each limited to four horizontally?</p>

  2. Dec 30, 2006

    <p>I'm in favor of the Atomic class design. Creating a single class per filter or per validator gives us the most extensible OO design. This design allows users to easily add their own filters and validators. It allows an application to load only those filters and validators it needs, in a lazy-loading fashion, without needing to load extra code the app doesn't need. And making the filters and validators implement interfaces is a good way to provide structure for people to write their own filters and validators.</p>

    <p>But the design of the individual filters and validators is only half of the solution. We also need some kind of "runner" class that applies filtering and validation against a collection of inputs. </p>

    <p>Think of the relationship between the runner and the individual filters/validators as analogous to the relationship between a PHPUnit_Framework_TestSuite and PHPUnit_Framework_TestCase. The former is a collection of the latter; it can run them all in sequence and report the results.</p>

    <p>For example, my use case would be to apply Zend_Filter against the whole $_GET array, after specifying which filters and validators apply to each request parameter. Then after Zend_Filter has done its analysis, it would provide access to filtered data, it would report which parameters failed validation, etc. If all the validation passes, my script can use those inputs freely.</p>

    <p>The runner class can also do other tasks, like provide getter methods for the validated input values, after applying another filter to escape them for different contexts like HTML output or SQL interpolation.</p>

    <p>The runner class may also have an option to throw exceptions and return user-defined error messages when inputs cannot be validated.</p>
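    <p>A minimal sketch of the "runner" idea described in this comment follows. It assumes hypothetical Sketch_* interfaces and class names. For each configured field, it applies that field's filters in order, validates the filtered value, and then exposes the filtered values and the names of fields that failed validation. Missing fields are treated as empty strings. Escaping getters and the exception option are left out.</p>

```php
<?php
interface Sketch_Filter_Interface { public function filter($value); }
interface Sketch_Validate_Interface { public function isValid($value); public function getMessages(); }

// Hypothetical runner: per-field filter and validator lists applied to an input array.
class Sketch_InputRunner
{
    private $filters = array();    // field => list of filters
    private $validators = array(); // field => list of validators
    private $filtered = array();   // field => filtered value
    private $messages = array();   // field => list of failure messages

    public function addFilter($field, Sketch_Filter_Interface $filter)
    {
        $this->filters[$field][] = $filter;
        return $this;
    }

    public function addValidator($field, Sketch_Validate_Interface $validator)
    {
        $this->validators[$field][] = $validator;
        return $this;
    }

    // Filter each configured field in order, then validate the filtered value.
    public function process(array $input)
    {
        $this->filtered = array();
        $this->messages = array();
        $fields = array_unique(array_merge(array_keys($this->filters), array_keys($this->validators)));
        foreach ($fields as $field) {
            $value = isset($input[$field]) ? $input[$field] : ''; // missing field => ''
            $filters = isset($this->filters[$field]) ? $this->filters[$field] : array();
            foreach ($filters as $filter) {
                $value = $filter->filter($value);
            }
            $this->filtered[$field] = $value;
            $validators = isset($this->validators[$field]) ? $this->validators[$field] : array();
            foreach ($validators as $validator) {
                if (!$validator->isValid($value)) {
                    foreach ($validator->getMessages() as $message) {
                        $this->messages[$field][] = $message;
                    }
                }
            }
        }
        return count($this->messages) === 0;
    }

    public function getFiltered($field) { return $this->filtered[$field]; }
    public function getInvalidFields() { return array_keys($this->messages); }
    public function getMessages() { return $this->messages; }
}

class Sketch_Filter_StringTrim implements Sketch_Filter_Interface
{
    public function filter($value) { return trim($value); }
}

class Sketch_Validate_NotEmpty implements Sketch_Validate_Interface
{
    private $messages = array();

    public function isValid($value)
    {
        $this->messages = ($value === '') ? array('value is empty') : array();
        return $value !== '';
    }

    public function getMessages() { return $this->messages; }
}

$runner = new Sketch_InputRunner();
$runner->addFilter('name', new Sketch_Filter_StringTrim())
       ->addValidator('name', new Sketch_Validate_NotEmpty())
       ->addFilter('city', new Sketch_Filter_StringTrim())
       ->addValidator('city', new Sketch_Validate_NotEmpty());

$ok = $runner->process(array('name' => '  Jane ', 'city' => '   '));
```

    <p>In this sketch, $ok is false, getFiltered('name') returns the trimmed value, and getInvalidFields() reports only 'city'.</p>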

  3. Jan 09, 2007

    <p>Wouldn't Zend_Validator_EmailAddress be more appropriate than Zend_Filter_EmailAddress (unless you intend to add filtering capabilities to this class)?</p>

    1. Jan 09, 2007

      <p>Yes, I agree - I haven't quite worked out the class naming scheme. Since we've decided to continue having the Zend_Filter component contain both filter and validator functionality, however, such a name would result in the following file:</p>

      <p>Zend/Validator/EmailAddress.php, which would not be contained in the Zend_Filter component directory (i.e., Zend/Filter).</p>

      <p>How might we best deal with this issue?</p>

      1. Jan 09, 2007

        <p>Well, we could have Zend_Filter and Zend_Validator, but that would result in having two classes that do basically the same thing in very similar ways (i.e. run through a list of rules and apply them). Or we could change the component's name to Zend_FilterValidator.</p>

        1. Jan 09, 2007

          <p>Another idea: How about Zend_Filter_Rule_EmailAddress?</p>

      2. Jan 12, 2007

        <p>We have decided to use two different top-level components, at least for the time being, Zend_Filter and Zend_Validate, since they do different things and mixing the two seems to make little sense at best, though it is acknowledged that filtering and validating often take place in the same parts of an application.</p>

  4. Jan 09, 2007

    <p>A couple other questions:</p>
    <ol>
    <li>How are we going to add error messages to the validators? Through the constructor? Something like setMessages()?</li>
    <li>Have you given any thought to using <a href="http://www.php.net/filter">ext/filter</a> somehow?</li>
    </ol>

    1. Jan 12, 2007

      <p>You won't need to add messages to validators. The validators create any necessary messages upon failure of one or more validation requirements, and the user need only retrieve them with getMessages(). For example, an e-mail validator might invalidate an e-mail address for two reasons - the username does not follow RFC 2822, and the domain name has no MX record.</p>

      <p>Yes, we are considering ext/filter, but integration with it may be risky and complicated by the improvements expected in PHP 5.2.1, so I'm taking a "wait and see" approach at this time.</p>

      1. Jan 12, 2007

        <p>Of course, application developers can always wrap such validations with their own application-specific messages to generate for their users, but this is not specific to the current validator design.</p>

        1. Jan 12, 2007

          <p>Subclassing Zend_Valid_EmailAddress just because I don't like the way your error messages are worded seems a little excessive, doesn't it?</p>

      2. Jan 12, 2007

        <p>But is it possible to add messages to validators without subclassing the rules? I can think of at least two use cases when I would like to change any default message:</p>

        <p>1. I need a localized message<br />
        2. I don't like the default message</p>

        <p>Thanks for the clarification.</p>

        1. Jan 12, 2007

          <p>Subclassing any framework-provided validator is not necessary in order to achieve customized error messages for your users.</p>

          1. Jan 12, 2007

            <p>Could you give us an example of how we would do it, then? Because I'm a little confused. It seems that all error messages are generated by each validator internally with no way of injecting our own (e.g. via a setMessages() method). Are you suggesting that we take the output of Zend_Valid_Interface::getMessages() and customize that? If so, then I suggest introducing error message codes (much like exception codes) so that if, say, I didn't like the way the "username does not follow RFC 2822" error message in Zend_Valid_EmailAddress is worded, I wouldn't have to parse every one of the error messages to find it. Zend_Valid_Interface::getMessages() would then return an associative array with the keys being error codes and the values being the associated messages.</p>

            1. Jan 13, 2007

              <p>An example:</p>

              <ac:macro ac:name="code"><ac:plain-text-body><![CDATA[
              <?php
              require_once 'Zend/Validate/EmailAddress.php';
              $validator = new Zend_Validate_EmailAddress();
              if (!$validator->isValid($someEmail)) {
                  // code your own error message, ignoring getMessages() if desired
              }
              ]]></ac:plain-text-body></ac:macro>

              <p>Of course, the above example only cares whether the validator succeeded or not. It doesn't address how to override specific messages, but another solution, such as providing integer identifiers for each message, would support this. For example, each message could be located in the array like:</p>

              <ac:macro ac:name="code"><ac:plain-text-body><![CDATA[
              array(
              1 => 'the user portion contains disallowed characters',
              2 => 'the domain name does not exist',
              3 => 'the domain name has no MX record'
              );
              ]]></ac:plain-text-body></ac:macro>
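              <p>The integer-identifier idea can be sketched from the consuming application's side: the validator keys its messages by code, and the application substitutes its own wording (for example, a translation) for known codes without subclassing. This is a minimal sketch; Sketch_Validate_EmailAddress and the INVALID_USER constant are hypothetical, and only code 1 from the array is implemented (the DNS checks behind codes 2 and 3 are omitted).</p>

```php
<?php
// Hypothetical e-mail validator whose getMessages() result is keyed by integer
// codes. Only code 1 is implemented; codes 2 and 3 (the DNS checks) are omitted.
class Sketch_Validate_EmailAddress
{
    const INVALID_USER = 1;

    private $messages = array();

    public function isValid($value)
    {
        $this->messages = array();
        // Text before the first '@'; a missing '@' yields '' and is also
        // reported under code 1 in this simplified sketch.
        $user = (string) strstr($value, '@', true);
        if ($user === '' || preg_match('/[^A-Za-z0-9._%+-]/', $user)) {
            $this->messages[self::INVALID_USER] = 'the user portion contains disallowed characters';
        }
        return count($this->messages) === 0;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}

// Application-defined wording (e.g. a translation), keyed by the same codes.
$customMessages = array(
    Sketch_Validate_EmailAddress::INVALID_USER => 'Please check the part of your address before the @ sign.',
);

$validator = new Sketch_Validate_EmailAddress();
if (!$validator->isValid('jo hn@example.com')) {
    foreach ($validator->getMessages() as $code => $message) {
        // Prefer the application's message for a known code; fall back to the default.
        echo isset($customMessages[$code]) ? $customMessages[$code] : $message, "\n";
    }
}
```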

              1. Jan 13, 2007

                <p>Thank you for clearing this up for me.</p>

  5. Jan 10, 2007

    <p>A couple of thoughts. I am not sure how the Filters and Rules would work together in one chain. It probably would complicate it a little. Separate FilterChain and Validator classes is a better design in my opinion because they can be used together or separately. You could always composite both together in a class that did both, but you can't separate them once combined.</p>

    <p>I would also keep the Filters and Rules in their own directories. They are handy things in their own right and can be used stand-alone in many cases. They are also handy for dealing with other things than the Request and I believe that something similar to the design I proposed is being used elsewhere in the framework already. Many ZF subsystems could probably make use of a well defined interface for Rules and Filters. </p>

    <p>Finally it is important to be clear conceptually about the difference between Filters and Rules. Filters are designed to modify data passed through them – usually in some standard container. Rules on the other hand are designed to return some sort of error information – usually an error string. So their supporting chain/managers have slightly different duties. </p>

    1. Jan 12, 2007

      <p>Agreed; the code in the laboratory should reflect this now:</p>

      <p><a class="external-link" href="http://framework.zend.com/fisheye/browse/Zend_Framework_Laboratory/Zend_Filter/library/Zend">http://framework.zend.com/fisheye/browse/Zend_Framework_Laboratory/Zend_Filter/library/Zend</a></p>

  6. Jan 15, 2007

    <p>The new design feels much better, however I am having some agony over the number of lines of code required to perform a "simple" validation. It feels clunky to have to create a new instance of every individual validator - especially when you would often only use a given validator once per form item.</p>

    <p>Perhaps there should be some sort of static method in Zend_Valid/Zend_Filter that transparently instantiates the validator and returns the message iterator?</p>

    1. Jan 15, 2007

      <p>Yes, I've thought of including a factory method with each of <code>Zend_Filter</code> and <code>Zend_Validate</code>. That is, instead of:</p>

      <ac:macro ac:name="code"><ac:plain-text-body><![CDATA[
      require_once 'Zend/Filter/HtmlEntities.php';
      $filter = new Zend_Filter_HtmlEntities(ENT_QUOTES);
      echo $filter->filter($someValue);
      ]]></ac:plain-text-body></ac:macro>

      <p>you could write:</p>

      <ac:macro ac:name="code"><ac:plain-text-body><![CDATA[
      require_once 'Zend/Filter.php';
      echo Zend_Filter::createInstance('HtmlEntities', ENT_QUOTES)->filter($someValue);
      ]]></ac:plain-text-body></ac:macro>

      <p>The disadvantage of this approach is that it does not support automatic editor code completion of the filter name or of the filter options that the factory passes to the constructor. From this perspective, the factory seems to offer little value other than joining the line for instantiation and the line for using a class member.</p>

      <p>I could include a factory if it is convenient, though official policy may be that factories may not be used in the framework for convenience only. If we were to add a factory, I believe that this factory should support instantiating user-defined classes located in various directories (like View helper classes). The existing design already supports using user-defined classes that conform to the interfaces, so any factory that we include should do so, too.</p>
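      <p>A factory that also finds user-defined classes could resolve a short name against an ordered list of class-name prefixes, much as View helpers are located. The following is a minimal sketch under assumptions: Sketch_FilterFactory, addPrefix(), and the My_Filter_ prefix are hypothetical. File loading from prefix-specific directories is only noted in a comment, and extra arguments are forwarded to the constructor as in createInstance('HtmlEntities', ENT_QUOTES).</p>

```php
<?php
interface Sketch_Filter_Interface
{
    public function filter($value);
}

// Hypothetical factory that resolves a short filter name against a list of
// class-name prefixes, so user-defined filters can sit beside built-in ones.
class Sketch_FilterFactory
{
    private $prefixes = array('Sketch_Filter_');

    // User prefixes are searched before the built-in prefix.
    public function addPrefix($prefix)
    {
        array_unshift($this->prefixes, $prefix);
        return $this;
    }

    // Remaining arguments are passed to the filter's constructor.
    public function createInstance($name, ...$args)
    {
        foreach ($this->prefixes as $prefix) {
            $class = $prefix . $name;
            // A fuller version would first try to load the class file from a
            // directory registered for this prefix.
            if (class_exists($class) && is_subclass_of($class, 'Sketch_Filter_Interface')) {
                return new $class(...$args);
            }
        }
        throw new InvalidArgumentException("No filter class found for '$name'");
    }
}

// Built-in filter under the default prefix.
class Sketch_Filter_StringToLower implements Sketch_Filter_Interface
{
    public function filter($value)
    {
        return strtolower($value);
    }
}

// User-defined filter under the application's own prefix.
class My_Filter_Reverse implements Sketch_Filter_Interface
{
    public function filter($value)
    {
        return strrev($value);
    }
}

$factory = new Sketch_FilterFactory();
$factory->addPrefix('My_Filter_');
echo $factory->createInstance('StringToLower')->filter('ZEND'), "\n"; // zend
echo $factory->createInstance('Reverse')->filter('abc'), "\n";        // cba
```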

      <p>At this point, I tend to think that the cost of instantiating an object is worth the flexibility offered over a static implementation, especially considering the advantages offered to consuming code by having filter and validator objects that implement common interfaces.</p>

      <p>How else would you propose reducing the number of lines of code to perform a simple validation?</p>

      1. Jan 16, 2007

        <p>Another "code compression":</p>

        <ac:macro ac:name="code"><ac:plain-text-body><![CDATA[
        require_once 'Zend/Filter.php';
        echo Zend_Filter::filter($someValue, 'HtmlEntities', ENT_QUOTES);
        ]]></ac:plain-text-body></ac:macro>

        <p>This has the same disadvantages as my previous example, but is more compact.</p>

  7. Jan 16, 2007

    <ac:macro ac:name="unmigrated-wiki-markup"><ac:plain-text-body><![CDATA[To compress code, why not have something like this?

    <?php
    require_once 'Zend/Valid.php';
    $validator = new Zend_Valid();
    $validator->addValidator('StringLength', $my_string, array(3, 64))
              ->addValidator('EmailAddress', $email)
              ->addValidator(VALIDATION_TYPE, $my_var);
    if ($validator->isValid()) {
        echo 'valid';
    } else {
        foreach ($validator->getMessages() as $message) {
            echo "$message\n";
        }
    }

    Here, Zend_Valid would autoload the classes required to validate each variable.]]></ac:plain-text-body></ac:macro>

    1. Jan 16, 2007

      <p>Yes, this is the same concept as described above but applied to the Zend_Valid[ate] class, whereby an internal factory provides for code brevity. This offering has the same disadvantages with respect to supporting automatic editor code completion on [filter/]validator:</p>

      <ul>
      <li>Names or identifiers (e.g., 'HtmlEntities')</li>
      <li>Options (e.g., passing 3 and 64 to StringLength)</li>
      </ul>

      <p>If we add this type of functionality, without removing the ability to instantiate concrete objects when desirable, we gain some convenience without sacrificing code completion.</p>

      <p>The demand for this convenience feature seems to be quite high. It provides for shorter code that may be easier to read than the concrete instantiation alternative. The feature costs little but provides for ease of use.</p>

      <p>I think that providing this will help meet the requirement of being simple and convenient to declare chains of filters and validators. It doesn't seem to be disallowed by <a href="http://framework.zend.com/wiki/x/WRk">our documentation</a>, but maybe I've missed something.</p>

      <p>I support it, and I'll add it unless directed otherwise.</p>

      1. Jan 16, 2007

        <p>I should also note that such low-level factories may be made irrelevant and/or transparent to the end user due to future developments in the required "runner" solution for various use cases - such as filtering input (through the Request object?), escaping output (integration with Views?), and building form controllers (Zend_Form?) - where the developer needs to attach filters and validators to a data source.</p>

  8. Jan 16, 2007

    <ac:macro ac:name="note"><ac:parameter ac:name="title">Official Zend Comment</ac:parameter><ac:rich-text-body>
    <p>This proposal is approved to be committed to the incubator.</p>

    <p>The solution described in the proposal above is adequate to provide functions for filtering and validating, but the full solution must include some "runner" solution.</p>

    <p>The requirement is to make it simple and convenient to declare chains of filters and validators, and to declare which inputs to which these filter/validator chains apply.</p>

    <p>So this proposal is approved for the incubator, on the condition that this requirement will be met. The design of the eventual solution is still a subject for discussion. </p></ac:rich-text-body></ac:macro>

  9. Jan 19, 2007

    <p>Darby - could you please explain in a bit more detail how a "runner" solution may be implemented. I can't quite get my head around it!</p>