View Source

<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[{zone-template-instance:ZFDEV:Zend Proposal Zone Template}



{zone-data:component-name}
Zend_Filter Design
{zone-data}



{zone-data:proposer-list}
[Darby Felton|mailto:darby@zend.com], author & Zend liaison
{zone-data}



{zone-data:revision}
0.3.0 - 10 Jan 2007: Resolved problems with dual-purpose component design and class naming schema
{zone-data}



{zone-data:overview}
h2. Introduction


h3. What is a filter?

In the real world, a filter is typically used for removing unwanted portions of input, and the desired portion of the input passes through as filter output (e.g., coffee). In such scenarios, a filter is an operator that produces a subset of the input. This type of filtering is useful for web applications - removing illegal input, trimming unnecessary white space, etc.

The idea of filtering may also be extended to performing generalized transformations upon input, more than simply acting as an operator to produce a selected subset of the input. A common transformation applied in web applications is the escaping of HTML entities. For example, if a form field should have its value populated with a dynamic value, this value should either be free of HTML entities or contain only escaped HTML entities, in order to prevent undesired behavior and security vulnerabilities. To meet this requirement we can either remove HTML entities or escape them, and other approaches may more appropriate for different situations. A filter that removes the HTML entities operates within the scope of the first definition of filter - an operator that produces a subset of the input. A filter that escapes the HTML entities, however, transforms the input. If we consider such use cases within the scope of filtering, then filtering must be redefined as an operator that performs transformations upon input.


h3. What is a validator?

A validator examines its input with respect to some requirements and produces a boolean result - whether the input successfully validates against the requirements. If the input does not meet the requirements, a validator may additionally provide information about which requirement(s) the input does not meet.

For example, a web application might require that a username be between six and twelve characters in length and may only contain alphanumeric characters. A validator can be used for ensuring that usernames meet these requirements. If a chosen username does not meet one or both of the requirements, it would be useful to know which of the requirements the username fails to meet.

A validator differs from a filter primarily in that a validator produces a boolean result, based on whether the input validates against some requirements. A filter, on the other hand, produces output that is based on transformations upon the input.


h3. Why consider filters and validators together?

Though filters and validators perform different operations upon their input, it is easy to see how filters and validators are often used together in web applications.

Consider the use case of a blog application that accepts comments upon its entries. Let us assume that the comment form contains two fields - an e-mail address field and a multi-line comment text area.

Let us further assume that a submitted comment containing an e-mail address that does not conform to the e-mail address schema should be rejected (e.g., not@realaddress). A validator would be used to determine whether an e-mail address is acceptable.

Since an accepted comment would appear on the associated article's web page, comment authors should be unable to perform cross-site scripting (XSS). A filter would be used to remove XSS-capable content or transform it such that XSS cannot be performed.

Because the e-mail validator and comment filter would be operating upon different parts of the same form input, they should be readily available to the same parts of the web application code. Thus, filters and validators are often used together in applications.

Filters and validators may also be related further than the proximity of their use in web application code. Consider a user-entered phone number as \[part of\] the application input. The phone number (555) 555-1234 may be written in several ways, such as 555.555.1234 and 555-555-1234, all of which should be considered valid. Though there are other ways to solve this problem, let us assume that the application will strip all punctuation and white space from the input, normalizing the phone number to contain only numbers. Once the normalized phone number is available from the filter, the application requires that the phone number be ten digits, and a simple validator can be used for meeting this requirement. Thus, for a single datum, the input phone number, the application must use a filter and then a validator. The filter and validator for a phone number should be readily and equally available to the application, perhaps even contained in the same class.


h3. Zend_Filter already provides filters and validators. Why revisit these topics now?

The Zend_Filter component currently provides a library of static filter and validator methods via a monolithic {{Zend_Filter}} class.

A method of {{Zend_Filter}} is a filter or validator by naming convention only, and filter methods are further divided into whitelist and blacklist filters, again only by naming convention.

The Zend_Filter component includes another class, {{Zend_Filter_Input}}, that allows instantiation with an array of input data. The methods of {{Zend_Filter_Input}} generally act as proxies to the static methods of {{Zend_Filter}}, passing along a specific element of the input array to the proxied methods.

By using Zend_Filter an application gains only the filters and validators provided by the class, nothing more. The size of Zend_Filter increases proportional to the number of filters and validators. A class having too many methods is confusing and difficult to use, and the current {{Zend_Filter}} class cannot mitigate this risk except through potentially complicated naming conventions, which still does not reduce the number of methods in the class.

The only way for users to add functionality to Zend_Filter is by extending the {{Zend_Filter}} and {{Zend_Filter_Input}} classes. By doing so, a user class inherits all the filters and validators of each inherited class, even if only one or two methods are needed. This approach also does not support clean integration of filters and validators from third-party providers.

Complex filters and validators that require supporting class constants, instance variables, and/or helper methods are all contained in a single class. As the number of filters and validators grows, the complexity of Zend_Filter and its naming conventions grows proportionally.

Zend_Filter supports ordering of filters and validators only by the order in which the methods are called from user code. Though this may be enough, it may also be worthwhile to consider providing additional support for managing filters and validators. For example, if an ordered set of filters and/or validators are to be applied to several input data, it may be useful to have a container for the filters and/or validators that can operate on multiple input data and provide aggregated results to the user. {{Zend_Filter_Input}} provides some degree of support for this idea, but its limitations may hinder users.

In order to solve these limitations of the current Zend_Filter design, we revisit these topics in order to improve the component's design with respect to two key framework goals - extremely simple (to use) and use-at-will architecture (flexibility and extensibility).


h3. Summary

Filters and validators are related but different operators. A filter performs some transformations on its input, and a validator returns a boolean result - whether its input meets certain requirements. They are often used at the same time in application code, and it is important to have filters and validators organized in such a way as to be readily available to various parts of the application, such as processing form data and escaping data for various media (e.g., HTML, RDBMS).



h2. Special Thanks

Thanks to everyone who has helped with Zend_Filter to date, and the following people are recognized for their extraordinarily valuable contributions along the way:

* [~andi]
* [~chris]
* [~christopher]
* [~gaylord]
* [~kevin]
* Marco Tabini

_Forgot about you? So sorry; please [remind me|mailto:darby@zend.com]!_
{zone-data}



{zone-data:references}
* [Zend_FilterChain Zend_Validator Component Proposal|http://framework.zend.com/wiki/x/cwM]
* [Intercepting Filters Component Proposal|http://framework.zend.com/wiki/x/jwc]
* Symfony
** [Filters - symfony advent calendar day eighteen|http://www.symfony-project.com/askeet/18]
** [How to validate a form|http://www.symfony-project.com/book/trunk/validate_form]
* [PHP Filter Extension|http://www.php.net/manual/en/ref.filter.php]
{zone-data}



{zone-data:requirements}
* The component must be simple to use.
* The component must be extensible. Users must be able to easily write and use filters and validators for their own purposes and publish those filters and validators for others to plug into their framework-powered applications.
* Filters and validators must be readily available throughout the lifetime of a request to the application. Input filtering and validation, for example, may be used early in the controller execution, whereas escaping output should be done as late as possible, closer to the view logic.
* Filters and validators are generally considered for inclusion with the framework component where the filter or validator helps the component meet the 80/20 rule with respect to solving common use cases.
* Filter and validator configuration and setup must be supported by an object-oriented syntax (e.g., using fluent interfaces).
* Filters and validators as a general rule do not throw exceptions. Filters and validators may throw exceptions, however, only when filtering or validation is reasonably impossible. Validators return true or false, depending on whether the input meets the validation criteria, and, in the event that the input fails validation, messages are accumulated and retained for programmatic access by the consuming application.
* Ordered filter and validator chaining must be supported.
* Validator chains must support validation rule dependencies, aborting execution when certain validators return false. Put another way, if there are two validators in a chain, and the second validator need not run if the input fails the first validation, then the chain must allow the developer to terminate the chain execution if the first validator returns false.
* PHP editor code completion should be supported as much as possible under the proposed design.
* Overloading with magic functions (e.g., {{\_\_get()}}, {{\_\_call()}}) should be avoided where possible, and all such usage must be documented well for ease of use.
{zone-data}



{zone-data:dependencies}
* Zend_Exception
{zone-data}



{zone-data:operation}
h2. Organizing Filters and Validators into Classes


h3. Monolithic Class

In this approach, all filters and validators are lumped into a single, monolithic class, whereby the class essentially acts as a library of filtering and validation functions. The current Zend_Filter component in the framework core realizes this approach.


h3. Category Classes

This approach divides the class of the monolithic approach into classes that group filters and validators by purpose. Filters and validators may be categorized into multiple classes based on their general purpose. For example, we might have a String class that provides dozens of filtering and validation methods within its scope. A Number class could be used to provide methods for filtering and validating numeric input. As another example, we could have a Location class that is used to provide methods for dealing with address or location information (e.g., city, country, postal code, latitude and longitude).


h3. Atomic Classes

This approach reduces the class size to contain one filter, one validator, or both. A class that needs to filter would implement a documented filter interface. If a class needs to validate, it would implement a documented validator interface. Because PHP allows for a class to implement multiple interfaces, a class could be both a filter and a validator. Some examples might include IntegerRange, RegExp, and CreditCard.


h3. Choice Comparisons

||Criterion||Monolithic Class||Category Classes||Atomic Classes||
|Filter implements a documented interface|(x) resort to documented conventions|(!) resort to documented conventions; possibly implement an interface|(/) implement an interface|
|Validator implements a documented interface|(x) resort to documented conventions|(!) resort to documented conventions; possibly implement an interface|(/) implement an interface|
|Extensibility|(x) user classes inherit unnecessary complexity|(!) user classes must conform to well-documented conventions, rather than conforming to simple, well-documented interfaces; higher complexity than Atomic approach|(/) user classes implement simple, well-documented interfaces|
|Class Complexity|(x) high|(!) medium|(/) low|
|Number of Classes|(/) low|(!) medium|(x) high|
|Naming Conventions Complexity|(x)|(!) added complexity of having to categorize among various classes|(!) organization of classes needed|
|Ease of Use|(x)|(!) conventions add some complexity|(!) organization of classes needed|
|Use-at-will Architecture|(x) unavailable|(!) category classes can be loaded at will, though some may contain unnecessary weight|(/) all filters and validators can be loaded at will|
|Chaining Filters & Validators|(!) requires conventions|(!) requires conventions; added complexity of categories|(/) chaining is simple because objects implement known interfaces|


h3. Summary

As the above comparison table indicates, the monolithic class approach has the most trouble adequately meeting the component criteria.

The category class approach improves upon having a monolithic class, though the categorization of the filters and validators introduces its own complexity into the design, particularly because some categories are likely to overlap (i.e., not being mutually exclusive) with other categories. The names of categories are likely to be interpreted in different ways by the user community, perhaps causing additional confusion.

Of the three approaches, the atomic class approach seems to meet the criteria the best, with one caveat. Because the number of classes increases proportionally to the number of filters and validators, and because we need to maintain a reasonable number of classes in each directory, it may be necessary to group or categorize the classes using the normal PEAR conventions already in use throughout the framework. This implies that some additional complexity due to such grouping would be necessary, but this is not the same complexity that is introduced in the category class approach, wherein each class contains many filters and validators that correspond to a single category.
{zone-data}



{zone-data:milestones}
# Publish design notes - *DONE*
# Publish proposal based on design notes - *DONE*
# Arrive on proposal approvable for incubator development
# Commit working prototype to incubator
# Commit passing unit tests
# Write initial documentation
# Revise code, tests, and docs based on feedback
# Merge changes with trunk for core release
{zone-data}



{zone-data:class-list}
* Zend_Filter
* Zend_Filter_Exception
* Zend_Filter_Interface
* Zend_Filter_HtmlEntities
* Zend_Filter_StringToLower
* Zend_Filter_StringTrim
* Zend_Filter_StripTags
* Zend_Validate_Exception
* Zend_Validate_Interface
* Zend_Validate_EmailAddress
* Zend_Validate_StringLength
* (additional classes from porting existing Zend_Filter methods omitted for brevity)
{zone-data}



{zone-data:use-cases}


{composition-setup}


{deck:id=use-cases1}


{card:label=UC 1 : Validate an Email Address}

h2. UC 1 : Validate an Email Address

This example prints "valid" if the e-mail address conforms to the allowed e-mail address schema. If the e-mail address fails validation, the reasons why the validation failed are printed instead:

{code}
<?php
require_once 'Zend/Validate/EmailAddress.php';
$validEmailAddress = new Zend_Validate_EmailAddress();
if ($validEmailAddress->isValid($someEmail)) {
echo "valid\n";
} else {
foreach ($validEmailAddress->getMessages() as $message) {
echo "$message\n";
}
}
{code}

{card}


{card:label=UC 2 : Strip HTML Tags}

h2. UC 2 : Strip HTML Tags

In this example, a developer needs to protect her blog application from cross-site scripting (XSS) attacks, while allowing blog comments to contain presentational markup. Some XSS protection is facilitated by allowing only certain HTML tags and certain attributes of the allowed tags to appear in submitted comments. Assume that the {{<b>}}, {{<i>}}, and {{<a>}} HTML tags will be allowed in comments. Limiting comments to having only these tags, however, does not protect against the attributes that may be used in these tags. For example, the {{style}} attribute and JavaScript event attributes such as {{onMouseOver}} can be used for unintended purposes. For the {{<a>}} tag, let us allow only the {{href}} attribute:

{code}
<?php
require_once 'Zend/Filter/StripTags.php';
$tagsAllowed = array(
'b',
'i',
'a' => 'href'
);
$filterStripTags = new Zend_Filter_StripTags($tagsAllowed);
$commentFiltered = $filterStripTags->filter($comment);
{code}

{card}


{card:label=UC 3 : Escape HTML Entities}

h2. UC 3 : Escape HTML Entities

In this example, the developer needs to print some HTML code on a web page. In order to prevent the browser from attempting to render the HTML code, all of the HTML entities must be escaped for output:

{code}
<?php
require_once 'Zend/Filter/HtmlEntities.php';
$filterHtmlEntities = new Zend_Filter_HtmlEntities(ENT_QUOTES);
$htmlFiltered = $filterHtmlEntities->filter($html);
echo $htmlFiltered;
{code}

{card}


{card:label=UC 4 : Filter Chains}

h2. UC 4 : Filter Chains

In this example we are normalizing a username in the following way:

# Trim the username of any leading and trailing whitespace
# Ensure that the username contains only lowercase characters

{code}
<?php
require_once 'Zend/Filter.php';
require_once 'Zend/Filter/StringTrim.php';
require_once 'Zend/Filter/StringToLower.php';
$filter = new Zend_Filter();
$filter->addFilter(new Zend_Filter_StringTrim())
->addFilter(new Zend_Filter_StringToLower());
$usernameFiltered = $filter->filter($username);
{code}

{card}


{deck}


{deck:id=use-cases2}


{card:label=UC 5 : Validator Chains}

h2. UC 5 : Validator Chains

In this example, the application developer needs to validate an e-mail address. The e-mail address is submitted from a simple form that subscribes the e-mail address to the company newsletter. Though valid e-mail addresses may be of arbitrary length, the database administrator has informed the application developer that the database schema is set up to allow e-mail addresses no longer than 64 characters. Thus, a submitted e-mail address is only to be considered valid if it is 64 characters in length or less and validates against the allowed schema for e-mail addresses.

{code}
<?php
require_once 'Zend/Validate.php';
require_once 'Zend/Validate/StringLength.php';
require_once 'Zend/Validate/EmailAddress.php';
$validator = new Zend_Validate();
$validator->addValidator(new Zend_Validate_StringLength(3, 64))
->addValidator(new Zend_Validate_EmailAddress());
if ($validator->isValid($email)) {
echo 'valid';
} else {
foreach ($validator->getMessages() as $message) {
echo "$message\n";
}
}
{code}

{card}


{card:label=UC 6 : Arbitrary Validation Rules}

h2. UC 6 : Arbitrary Validation Rules

In this example, a user is registering an identity with the web application, and the application requires that the user choose a password. For verification purposes, the user is required to enter the chosen password twice, into two separate form fields. The password is case-sensitive, and the values of the two password fields must be equivalent in order for successful registration to occur. The same approach is often undertaken for validating an e-mail address.

Since Zend_Validate_Interface defines {{isValid()}} to accept a single parameter, it is a simple matter to create a validator that ensures that the passwords are equivalent:

{code}
require_once 'Zend/Validate/Interface.php';
class StringsEqual implements Zend_Validate_Interface
{
protected $_expectedValue;
protected $_messages = array();

public function __construct($expectedValue = '')
{
$this->_expectedValue = $expectedValue;
}

public function isValid($value)
{
if ($value === $this->_expectedValue) {
$this->_messages = array();
return true;
} else {
$this->_messages[] = "'$value' does not match expected value '$this->_expectedValue'";
return false;
}
}

public function getMessages()
{
return $this->_messages;
}
}
{code}

Now that the validator has been written, it is time to use it:

{code}
// Assume that the incoming password fields are received in the request as:
// $_POST['password1']
// $_POST['password2']
$passwordValidator = new StringsEqual($_POST['password1']);
if ($passwordValidator->isValid($_POST['password2'])) {
// success; passwords match
} else {
// failure; passwords do not match
}
{code}

{card}


{deck}


{zone-data}



{zone-data:skeletons}


{composition-setup}


{deck:id=skeletons1}


{card:label=Zend_Filter}

The Zend_Filter class may be used to create filter chains, in which each filter added to the chain is run in the order in which it was added to the chain. Thus, the first filter added to the chain is the first filter run.

{code}
class Zend_Filter
{
public function addFilter(Zend_Filter_Filter_Interface $filter)
{}

public function filter($value)
{}
}
{code}
{card}


{card:label=Zend_Filter_Exception}
{code}
class Zend_Filter_Exception extends Zend_Exception
{}
{code}
{card}


{card:label=Zend_Filter_Interface}
{code}
interface Zend_Filter_Interface
{
/**
* Returns the result of filtering $value
*
* @param mixed $value
* @throws Zend_Filter_Exception If filtering $value is impossible
* @return mixed
*/
public function filter($value);
}
{code}
{card}


{card:label=Zend_Validate}

The Zend_Validate class may be used to create validation chains, in which each validator added to the chain is run in the order in which it was added to the chain. Thus, the first validator added to the chain is the first one run. If {{$breakChainOnFailure}} is set to {{true}} for a particular validator when it is added, then if the validator fails, the validator chain execution is stopped. The default behavior is that all validators are run, regardless of whether or not the input fails any particular validator in the chain.

Zend_Validate aggregates all failure messages from validators into a single array of strings, again in the order in which they are received. It is expected that it is not necessary to maintain for users the association between failure messages and the particular validator that produced them.

{code}
class Zend_Validate
{
public function addValidator(Zend_Validate_Interface $validator, $breakChainOnFailure = false)
{}

public function isValid($value)
{}

public function getMessages()
{}
}
{code}
{card}


{deck}


{deck:id=skeletons2}

Zend_Validate_Interface defines a basic set of operations for common validation use cases. The {{getMessages()}} method is used, in the event that {{isValid()}} returns false, to find out why the input failed validation. In some cases, the input will have failed validation for a number of reasons, and {{getMessages()}} faciliates this by returning an array of string messages that describe why the input failed certain validation constraints.

{card:label=Zend_Validate_Interface}
{code}
interface Zend_Validate_Interface
{
/**
* Returns true if and only if $value meets the validation requirements
*
* If $value fails validation, then this method returns false, and $messages will contain an array of
* messages that explain why the validation failed.
*
* @param mixed $value
* @throws Zend_Validate_Exception If validation of $value is impossible
* @return boolean
*/
public function isValid($value);

/**
* Returns an array of messages that explain why a previous isValid() call returned false
*
* If isValid() was never called or if the most recent isValid() call returned true, then
* this method returns an empty array.
*
* @return array
*/
public function getMessages();
}
{code}
{card}


{card:label=Zend_Validate_Exception}
{code}
class Zend_Validate_Exception extends Zend_Exception
{}
{code}
{card}


{card:label=Zend_Validate_EmailAddress}
{code}
class Zend_Validate_EmailAddress implements Zend_Validate_Interface
{
public function __construct($domainLocalAllowed = false)
{}

public function getDomainLocalAllowed()
{}

public function setDomainLocalAllowed($domainLocalAllowed)
{}

public function isValid($value)
{}

public function getMessages()
{}
}
{code}
{card}


{card:label=Zend_Filter_StripTags}
{code}
class Zend_Filter_StripTags implements Zend_Filter_Interface
{
public function __construct($tagsAllowed = null, $attributesAllowed = null)
{}

public function getTagsAllowed()
{}

public function setTagsAllowed($tagsAllowed)
{}

public function getAttributesAllowed()
{}

public function setAttributesAllowed($attributesAllowed)
{}

public function filter($value)
{}
}
{code}
{card}


{deck}


{deck:id=skeletons3}


{card:label=Zend_Filter_HtmlEntities}
{code}
class Zend_Filter_HtmlEntities implements Zend_Filter_Interface
{
public function __construct($quoteStyle = ENT_COMPAT, $charset = 'ISO-8859-1')
{}

public function getQuoteStyle()
{}

public function setQuoteStyle($quoteStyle)
{}

public function getCharset()
{}

public function setCharset($charset)
{}

public function filter($value)
{}
}
{code}
{card}


{card:label=Zend_Validate_StringLength}
{code}
class Zend_Validate_StringLength implements Zend_Validate_Interface
{
public function __construct($min = 0, $max = null)
{}

public function getMin()
{}

public function setMin($min)
{}

public function getMax()
{}

public function setMax($max)
{}

public function isValid($value)
{}

public function getMessages()
{}
}
{code}
{card}


{card:label=Zend_Filter_StringTrim}
{code}
class Zend_Filter_StringTrim implements Zend_Filter_Interface
{
public function __construct($charList = self::CHAR_LIST_DEFAULT)
{}

public function getCharList()
{}

public function setCharList($charList)
{}

public function filter($value)
{}
}
{code}
{card}


{card:label=Zend_Filter_StringToLower}
{code}
class Zend_Filter_StringToLower implements Zend_Filter_Interface
{
public function filter($value)
{}
}
{code}
{card}


{deck}


{zone-data}



{zone-template-instance}]]></ac:plain-text-body></ac:macro>