Added by Bill Karwin, last edited by Karol Babioch on Mar 04, 2008

Zend Framework: Zend_Filter_Input Component Proposal

Proposed Component Name: Zend_Filter_Input
Developer Notes: http://framework.zend.com/wiki/display/ZFDEV/Zend_Filter_Input
Proposers: Bill Karwin, Darby Felton
Revision: 1.1 - 10 April 2007: initial writeup (wiki revision: 17)

1. Overview

This is a proposed solution to apply multiple Zend_Filter and Zend_Validate actions to multiple inputs, e.g. $_GET or $_POST.

2. References

3. Component Requirements, Constraints, and Acceptance Criteria

  • This component will create an object interface to filtering and validating.
  • This component will accept an array of inputs in the form of an associative array, e.g. $_GET or $_POST superglobals.
  • This component will allow a developer to specify a single filter or validator as a scalar value.
  • This component will allow a developer to specify multiple filters and multiple validators to apply to a given input field.
  • This component will load, instantiate, and invoke the Filter or Validate objects.
  • This component will allow a developer to declare additional constraints, such as required fields.
  • This component will return filtered and validated field values in escaped format using accessors.
  • This component will have a magic __get() accessor that returns the value in an escaped format, after it has been filtered and validated.
  • This component will support user-defined Filter and Validate classes in namespaces other than Zend_Filter and Zend_Validate.

4. Dependencies on Other Framework Components

  • Zend_Filter_Interface
  • Zend_Filter_Exception
  • Zend_Filter (filter chain class)
  • Zend_Validate_Interface
  • Zend_Validate_Exception
  • Zend_Validate (validator chain class)
  • other concrete classes that implement Zend_Filter_Interface and Zend_Validate_Interface

5. Theory of Operation

Creating an instance of Zend_Filter_Input involves declaring an array of filters and validators to apply to data fields by name. This associative array maps from a field name to a filter (or validator), or a chain of filters (or validators).

The key of the array is the name of the field to which the filters apply. The value can be a scalar if one filter is desired, or an array if a chain of multiple filters is desired. Each value can be a string, which is mapped to a class name, or else an instance of an object that implements Zend_Filter_Interface. For example, a declaration can specify that the field 'month' is filtered by Zend_Filter_Digits and then by Zend_Filter_StringTrim.
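To make the declaration format concrete, here is a minimal, self-contained PHP sketch of how such a rule array could be applied. It does not use Zend_Filter_Input itself: the helper `applyFilterRules()` and the closures standing in for Zend_Filter_Digits and Zend_Filter_StringTrim are hypothetical, so the example runs without the framework.

```php
<?php
// Hypothetical stand-ins for Zend_Filter_Digits and Zend_Filter_StringTrim,
// registered under the short names used in the rule declaration.
$filterRegistry = [
    'Digits'     => function ($v) { return preg_replace('/[^0-9]/', '', (string) $v); },
    'StringTrim' => function ($v) { return trim((string) $v); },
];

// Rule declaration: a scalar for a single filter, an array for a chain.
$filters = [
    'month'   => ['Digits', 'StringTrim'],
    'account' => 'StringTrim',
];

// Hypothetical helper: run each declared field through its filter chain, in order.
function applyFilterRules(array $rules, array $data, array $registry): array
{
    foreach ($rules as $field => $chain) {
        if (!array_key_exists($field, $data)) {
            continue; // the field is not in the input, so there is nothing to filter
        }
        foreach ((is_array($chain) ? $chain : [$chain]) as $filter) {
            // A string is looked up by name; anything else must already be callable.
            $callable = is_string($filter) ? $registry[$filter] : $filter;
            $data[$field] = $callable($data[$field]);
        }
    }
    return $data;
}

$result = applyFilterRules($filters, ['month' => ' 1a2 ', 'account' => '  bob '], $filterRegistry);
// $result is ['month' => '12', 'account' => 'bob']
```

Wrapping a scalar rule in a one-element array lets the same loop handle single filters and chains, which mirrors the "scalar or array" rule described above.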

Integer-indexed elements of the value array correspond to filters (or, in the $validators array, to validators). String-indexed elements of the value array specify metacommands. For instance, if the key of the $filters array is not the same as the name of the field, you can specify the field name with the 'field' metacommand.
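The split between integer-indexed chain elements and string-indexed metacommands can be sketched in plain PHP as follows. The helper `splitRule()` is hypothetical; only the 'field' metacommand name comes from this proposal.

```php
<?php
// Hypothetical helper: split one rule into its chain (integer keys)
// and its metacommands (string keys such as 'field').
function splitRule($rule): array
{
    $chain = [];
    $meta = [];
    foreach ((is_array($rule) ? $rule : [$rule]) as $key => $value) {
        if (is_int($key)) {
            $chain[] = $value;      // a filter (or validator) in the chain
        } else {
            $meta[$key] = $value;   // a metacommand
        }
    }
    return [$chain, $meta];
}

// The rule key 'mo' differs from the input field name 'month',
// so the 'field' metacommand names the input field explicitly.
$filters = [
    'mo' => ['Digits', 'StringTrim', 'field' => 'month'],
];

foreach ($filters as $ruleName => $rule) {
    [$chain, $meta] = splitRule($rule);
    // Fall back to the rule key when no 'field' metacommand is given.
    $fieldName = $meta['field'] ?? $ruleName;
}
// $fieldName is 'month'; $chain is ['Digits', 'StringTrim']
```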

The $validators array is similar to the $filters array. Validator rules support an additional metacommand called 'presence'. If its value is 'required' and the field is not present in the input, the field is reported as missing. Fields that appear in the input but are not declared in the $validators array at all are reported as unknown. Fields that fail their validation are reported as invalid.
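The three reports can be illustrated with a small, self-contained sketch. This is not Zend_Filter_Input's code: `classifyFields()` is hypothetical, and validators are modeled as plain callables that return true or false.

```php
<?php
// Hypothetical sketch: classify input fields as missing, unknown, invalid, or valid.
function classifyFields(array $validators, array $data): array
{
    $report = ['missing' => [], 'unknown' => [], 'invalid' => [], 'valid' => []];

    foreach ($validators as $field => $rule) {
        $required = ($rule['presence'] ?? null) === 'required';
        if (!array_key_exists($field, $data)) {
            if ($required) {
                $report['missing'][] = $field;
            }
            continue; // absent optional fields are simply not reported
        }
        foreach ($rule as $key => $check) {
            // Integer keys are validators; string keys are metacommands.
            if (is_int($key) && !$check($data[$field])) {
                $report['invalid'][] = $field;
                continue 2; // one failure makes the field invalid
            }
        }
        $report['valid'][] = $field;
    }

    // Input fields with no validator rule at all are unknown.
    foreach (array_keys($data) as $field) {
        if (!array_key_exists($field, $validators)) {
            $report['unknown'][] = $field;
        }
    }
    return $report;
}

$isDigits = function ($v) { return preg_match('/^[0-9]+$/', (string) $v) === 1; };
$isMonth  = function ($v) { return $v >= 1 && $v <= 12; };

$validators = [
    'month' => [$isDigits, $isMonth, 'presence' => 'required'],
    'year'  => [$isDigits, 'presence' => 'required'],
    'note'  => [$isDigits],
];

$report = classifyFields($validators, ['month' => '13', 'color' => 'red']);
// missing: ['year'], unknown: ['color'], invalid: ['month'], valid: []
```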

To create an object of Zend_Filter_Input, pass array arguments for the $filters and $validators declarations.

You can add data either as the third argument to the constructor, or with the setData() method.

If you have user-defined filter or validator classes that don't exist in the Zend_Filter or Zend_Validate namespace, you can add more namespaces in the constructor options or with the addNamespace() method. You cannot remove Zend_Filter and Zend_Validate as namespaces; you can only add namespaces. User-defined namespaces are searched first, and the Zend namespaces are searched last.
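The search order can be sketched as a small class-name resolver. The class `FilterClassResolver` is hypothetical; it borrows the addNamespace() method name from this proposal but is not Zend_Filter_Input's implementation, and the three classes at the bottom are empty stubs that exist only so the lookup has something to find.

```php
<?php
// Hypothetical resolver: user-added prefixes are searched first, in the order
// added; the built-in prefix is always searched last and cannot be removed.
class FilterClassResolver
{
    private $userPrefixes = [];
    private $builtinPrefix;

    public function __construct(string $builtinPrefix)
    {
        $this->builtinPrefix = $builtinPrefix;
    }

    public function addNamespace(string $prefix): void
    {
        $this->userPrefixes[] = $prefix;
    }

    public function resolve(string $shortName): string
    {
        foreach (array_merge($this->userPrefixes, [$this->builtinPrefix]) as $prefix) {
            $class = $prefix . '_' . $shortName;
            if (class_exists($class)) {
                return $class;
            }
        }
        throw new RuntimeException("No class found for '$shortName'");
    }
}

// Empty stubs standing in for a framework filter and a user-defined filter.
class Zend_Filter_StringTrim {}
class Zend_Filter_Digits {}
class My_Filter_StringTrim {}

$resolver = new FilterClassResolver('Zend_Filter');
$resolver->addNamespace('My_Filter');
echo $resolver->resolve('StringTrim'), "\n"; // My_Filter_StringTrim (user prefix wins)
echo $resolver->resolve('Digits'), "\n";     // Zend_Filter_Digits (falls back to built-in)
```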

Filters are applied before validators. Don't declare filters intended for escaping output in the $filters array, because the validators would then receive escaped values (for example, '&amp;' instead of '&'), which makes their job awkward. Instead, a separate filter for escaping output runs after the validators.

After the filters and validators have run, you can get reports of missing, unknown, and invalid fields.

You can get field values in escaped format using the magic accessor. There are non-magic accessor methods for getting the field values in escaped or unescaped format.

The default filter used to escape output is Zend_Filter_HtmlEntities. You can specify a different filter for escaping output. You can use the options array in the constructor, or else setDefaultEscapeFilter(). You can specify this filter as a string or as an object in either case.
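The accessors described above can be sketched as follows. The class `EscapedValues` is hypothetical; it reuses the method names getEscaped(), getUnescaped(), and setDefaultEscapeFilter() from this proposal, uses PHP's htmlentities() in the role of Zend_Filter_HtmlEntities, and (unlike the proposal) accepts only a PHP callable as the escape filter.

```php
<?php
// Hypothetical container for values that have already been filtered and
// validated. Values are escaped only on the way out.
class EscapedValues
{
    private $values;
    private $escapeFilter;

    public function __construct(array $validValues)
    {
        $this->values = $validValues;
        // Default escape filter: HTML entities, quotes included.
        $this->escapeFilter = function ($v) {
            return htmlentities((string) $v, ENT_QUOTES, 'UTF-8');
        };
    }

    public function setDefaultEscapeFilter(callable $filter): void
    {
        $this->escapeFilter = $filter;
    }

    public function getEscaped(string $field)
    {
        if (!array_key_exists($field, $this->values)) {
            return null;
        }
        return ($this->escapeFilter)($this->values[$field]);
    }

    public function getUnescaped(string $field)
    {
        return $this->values[$field] ?? null;
    }

    // Magic accessor: $input->field returns the escaped value.
    public function __get(string $field)
    {
        return $this->getEscaped($field);
    }
}

$input = new EscapedValues(['name' => 'Tom & "Jerry"']);
echo $input->name, "\n";                 // Tom &amp; &quot;Jerry&quot;
echo $input->getUnescaped('name'), "\n"; // Tom & "Jerry"
```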

If you want more than one escaping filter available simultaneously in a single instance of Zend_Filter_Input, you should subclass Zend_Filter_Input and implement a new method to get values in a different escaped format.
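A minimal sketch of that subclassing approach, assuming a simplified hypothetical base class in place of Zend_Filter_Input: the subclass keeps the HTML-escaping accessor and adds a second, hypothetical getUrlEscaped() method built on rawurlencode().

```php
<?php
// Minimal hypothetical base class: holds validated values and escapes
// them for HTML by default.
class ValidatedInput
{
    protected $values;

    public function __construct(array $values)
    {
        $this->values = $values;
    }

    public function getEscaped(string $field): string
    {
        return htmlspecialchars((string) $this->values[$field], ENT_QUOTES, 'UTF-8');
    }
}

// Subclass adding a second escaped format alongside the HTML one.
class UrlAwareInput extends ValidatedInput
{
    public function getUrlEscaped(string $field): string
    {
        return rawurlencode((string) $this->values[$field]);
    }
}

$input = new UrlAwareInput(['q' => 'a&b c']);
echo $input->getEscaped('q'), "\n";    // a&amp;b c
echo $input->getUrlEscaped('q'), "\n"; // a%26b%20c
```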

6. Milestones / Tasks

  • Milestone 1: Post prototype design, gather community feedback
  • Milestone 2: Working prototype checked into the incubator supporting use cases
  • Milestone 3: Unit tests exist, work, and are checked into SVN.
  • Milestone 4: Write documentation.

7. Class Index

  • Zend_Filter_Input

8. Use Cases

UC-A

General usage.

UC-B

Specify default escape filter.

UC-C

Specify custom namespace to find user-defined filter and validator classes.

UC-01

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-01 given in that proposal.

Basic example:

UC-02

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-02 given in that proposal.

Globbing:

Grouping: Zend_Filter_Input doesn't do grouping as Zend_Validate_Builder does, so this is how one would achieve the same result:

UC-03

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-03 given in that proposal.

Password confirmation:

Captcha validation:

UC-04

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-04 given in that proposal.

UC-05

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-05 given in that proposal.

UC-06

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-06 given in that proposal.

UC-07

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-07 given in that proposal.

UC-08

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-08 given in that proposal.

UC-09

For comparison to the Zend_Validate_Builder proposal, below is example code using Zend_Filter_Input that implements the solution described in UC-09 given in that proposal.

9. Class Skeletons

Zend_Filter_Input

Very nice proposal, and exactly what is needed in my opinion. I was working on something similar for my own use, though much simpler, intended to suit my needs until an official component was released. I have already implemented your solution in my current application to see how it works.

It works great, and I really hope this proposal will be included in the version 1 release.

This is really what Zend Framework is missing at the moment.

In some cases you will have a field that is not required, but if it is filled in you want to validate it against a set of rules. If the field is empty, isValid() should still return true.

Any plans to implement this?

Do you mean cases where the field does not appear in the $_REQUEST array, or where it does appear in the $_REQUEST array but has empty string as a value?

If the former, it's easy, just declare the validator but don't declare 'presence'=>'required' for that field.

If the latter, it's a bit harder. Validator chaining supports the equivalent of "AND" of all the validators in the chain, but not "OR". That is, all the validators in the chain must return true for the chain to return true. If you need "OR" combinations, you should write your own custom validator class, which accepts an empty string or some other validation condition.
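Such a custom "OR" validator could look like the following sketch. The class `AnyOfValidator` is hypothetical; it mirrors the isValid() method name of Zend_Validate_Interface but does not implement that interface, and the e-mail check uses PHP's filter_var() rather than a Zend validator.

```php
<?php
// Hypothetical "OR" validator: accepts a value if ANY wrapped check accepts it.
// Combining "is an empty string" with a real check gives
// "optional, but valid if filled in".
class AnyOfValidator
{
    private $validators;

    public function __construct(callable ...$validators)
    {
        $this->validators = $validators;
    }

    public function isValid($value): bool
    {
        foreach ($this->validators as $validator) {
            if ($validator($value)) {
                return true; // the first success short-circuits the rest
            }
        }
        return false;
    }
}

$isEmptyString = function ($v) { return $v === ''; };
$isEmail = function ($v) { return filter_var($v, FILTER_VALIDATE_EMAIL) !== false; };

$emailOrBlank = new AnyOfValidator($isEmptyString, $isEmail);
var_dump($emailOrBlank->isValid(''));              // bool(true)
var_dump($emailOrBlank->isValid('a@example.com')); // bool(true)
var_dump($emailOrBlank->isValid('not-an-email'));  // bool(false)
```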

Complex cases are better handled in your app, after the declarative validation. That is, you could use Zend_Filter_Input to process all the simpler cases of input, and not express the complex conditions declaratively at all. Instead, get the field value after the simple validation of other fields is done, and check your custom conditions in your own code.

There's no way that a class like this can handle every use case under the sun. Its purpose is to make it very easy to do the 80% common cases.

I do mean the latter.

I was thinking of an option like 'required', but for optional fields: if the field is optional and empty, just skip creating the chain and validating against it.

I do think it would fit in your module very well, and I believe it would serve a lot of other people too.

I think allowEmpty is orthogonal to whether the field is required or not.

That is, a field can be optional or required, and independently of that, validation of that field can either allow or reject empty strings. So I think it needs a separate metacommand, such as 'allowEmpty'.
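A small sketch of why the two settings are independent. The function `checkField()` is hypothetical; it takes the 'presence' and 'allowEmpty' metacommand names from this discussion and returns one of 'missing', 'absent', 'valid', or 'invalid' for a single field.

```php
<?php
// Hypothetical check treating 'presence' and 'allowEmpty' as independent settings.
function checkField(array $data, string $field, array $rule, callable $validator): string
{
    $required   = ($rule['presence'] ?? null) === 'required';
    $allowEmpty = $rule['allowEmpty'] ?? false;

    if (!array_key_exists($field, $data)) {
        // Presence only decides what happens when the field is absent.
        return $required ? 'missing' : 'absent';
    }
    if ($data[$field] === '' && $allowEmpty) {
        return 'valid'; // the empty string is accepted without running the chain
    }
    return $validator($data[$field]) ? 'valid' : 'invalid';
}

$isDigits = function ($v) { return preg_match('/^[0-9]+$/', (string) $v) === 1; };

echo checkField(['zip' => ''], 'zip', ['allowEmpty' => true], $isDigits), "\n";                         // valid
echo checkField(['zip' => ''], 'zip', ['presence' => 'required', 'allowEmpty' => true], $isDigits), "\n"; // valid
echo checkField(['zip' => ''], 'zip', [], $isDigits), "\n";                                              // invalid
echo checkField([], 'zip', ['presence' => 'required', 'allowEmpty' => true], $isDigits), "\n";          // missing
```

A required field can still allow empty strings, and an optional field can still reject them, which is the orthogonality argued above.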

But I think it's a fine idea to allow empty strings to be valid.

Zend Comment

After discussion within the Zend Framework team, this component is approved for development in the incubator.

Approval is contingent on developing solutions for the following use cases:

  • Optionally allow fields to be considered valid if they are zero-length strings, if the metacommand 'allowEmpty'=>true is set in a rule.
  • Change the 'field' metacommand to 'fields' because the chief use case of this metacommand is when you want to send multiple fields as an array to one validator.
  • Offer a public method process() that throws an exception if the object has any invalid or missing required fields.
  • Add a metacommand 'escapeFilter' at the rule level, which specifies a filter to use per field, to override the default escape filter.
  • Make sure the escape filter, whether default or field-specific, can be a filter chain as well as a single filter.
  • Make sure multi-value fields (e.g. checkboxes) are filtered or validated iteratively against the respective filter or validator chains.
  • Make sure multi-value fields are escaped iteratively by the getEscaped() method.
  • Add a special rule key '*' which applies to all fields in either the filters or validators arrays. Apply wildcard filters before field-specific filters. Apply wildcard validators before field-specific validators.
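Three of the items above (the '*' wildcard rule, iterative filtering of multi-value fields, and an exception-throwing process() step) can be sketched together in plain PHP. Everything here is hypothetical and simplified: only filters are shown (wildcard validators would follow the same ordering), PHP's trim and strtoupper stand in for filter classes, and process() takes the missing and invalid field lists directly.

```php
<?php
// Apply a chain to one value; multi-value fields (e.g. checkboxes)
// are filtered element by element.
function applyChain(array $chain, $value)
{
    if (is_array($value)) {
        return array_map(function ($v) use ($chain) { return applyChain($chain, $v); }, $value);
    }
    foreach ($chain as $filter) {
        $value = $filter($value);
    }
    return $value;
}

// Wildcard ('*') filters run first, then any field-specific filters.
function filterAll(array $rules, array $data): array
{
    foreach ($data as $field => $value) {
        $chain = array_merge($rules['*'] ?? [], $rules[$field] ?? []);
        $data[$field] = applyChain($chain, $value);
    }
    return $data;
}

// Throw if anything is missing or invalid; return quietly otherwise.
function process(array $missing, array $invalid): void
{
    if ($missing || $invalid) {
        throw new UnexpectedValueException(
            'Missing: ' . implode(',', $missing) . '; invalid: ' . implode(',', $invalid)
        );
    }
}

$rules = [
    '*'     => ['trim'],
    'color' => ['strtoupper'],
];
$filtered = filterAll($rules, ['name' => ' ann ', 'color' => [' red', 'blue ']]);
// $filtered is ['name' => 'ann', 'color' => ['RED', 'BLUE']]
```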

Glad to see this component moving to the incubator, and I'm happy to hear about the metacommand allowEmpty.

The metacommand 'escapeFilter' is also a very welcome addition. Great work so far.

Something that is missing is the ability to determine which validator failed in the case of multiple validation rules. The programmer could then use that information to send back to the user a more informative error message.

For example, take the case where "month" has the validation rules "digits" and "between 1-12". If the data passed is "a", I'd like to know which validator failed ('digits' or 'between').

There seem to be two solutions that stand out. One was in the Zend_Form proposal, where each validator would push a keyword related to the validator onto an error array; that word was then used as a key into an error-messages array. Sticking with the same example, the validator would push 'digits' or 'between' onto the error array. More details are on the "error handling" tab here:
http://framework.zend.com/wiki/pages/viewpage.action?pageId=3596

The other solution is to be able to set the error message when defining the validator array for the Zend_Filter_Input.

There are probably other solutions but something is needed.

Another area that hasn't been discussed is validation of multiple fields. For example, validating that a 'password' field and 'password_confirm' field are equivalent. Another common case would be the case of an optional address. In this case either all the fields are supplied (address, city, state, zip) or none.

The getInvalid() method returns an array keyed by the names of fields that failed validation. Each value is in turn an array of one or more error messages generated by the validator chain, so you do have access to multiple validation error messages for a single field.
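A minimal sketch of that return shape, with each field's messages additionally keyed by validator name, which is the variation requested in this thread. The function `collectInvalid()` and the message texts are hypothetical illustrations, not Zend_Validate's actual messages.

```php
<?php
// Hypothetical chain runner that collects every failure, keyed by field
// name and then by validator name.
function collectInvalid(array $rules, array $data): array
{
    $invalid = [];
    foreach ($rules as $field => $validators) {
        if (!array_key_exists($field, $data)) {
            continue;
        }
        foreach ($validators as $name => [$check, $message]) {
            if (!$check($data[$field])) {
                $invalid[$field][$name] = sprintf($message, $data[$field]);
            }
        }
    }
    return $invalid;
}

$rules = [
    'month' => [
        'digits'  => [function ($v) { return preg_match('/^[0-9]+$/', (string) $v) === 1; },
                      "'%s' contains characters other than digits"],
        'between' => [function ($v) { return is_numeric($v) && $v >= 1 && $v <= 12; },
                      "'%s' is not between 1 and 12"],
    ],
];

$invalid = collectInvalid($rules, ['month' => 'a']);
// Both validators fail for 'a', so $invalid['month'] has 'digits' and 'between' entries.
print_r($invalid);
```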

I agree that it would be more useful to get the names of the validators that failed, not the error strings. I'll see what I can do.

For validating multiple fields, one could implement a custom validator that takes an array argument and returns true if all the values in the array are equal. That would enable the password/password_confirm use case.
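A sketch of such a custom validator, assuming the two field values are passed to it together as an array. The class `AllEqualValidator` is hypothetical, and treating fewer than two values as invalid is a design choice of this sketch.

```php
<?php
// Hypothetical custom validator: valid only if every value in the array
// equals the first one (strict comparison).
class AllEqualValidator
{
    public function isValid(array $values): bool
    {
        if (count($values) < 2) {
            return false; // nothing to compare against
        }
        $first = reset($values);
        foreach ($values as $value) {
            if ($value !== $first) {
                return false;
            }
        }
        return true;
    }
}

$data = ['password' => 's3cret', 'password_confirm' => 's3cret'];
$validator = new AllEqualValidator();
var_dump($validator->isValid([$data['password'], $data['password_confirm']])); // bool(true)
var_dump($validator->isValid(['s3cret', 's3cre7']));                           // bool(false)
```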

If you have multiple fields that are required, declare each of them with 'presence'=>'required'.

Another solution would be a way to override the default error message of the Zend_Validate_XXX classes through an optional constructor parameter. This way the developer can specify the error message when declaring the validator array. Personally I like this solution better than obtaining the validator's name, because I don't need to create an error-message array and perform a lookup; I can use the invalid array returned by Zend_Filter_Input directly.
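The optional-constructor-parameter idea could look like this sketch. The class `BetweenValidator`, its default message, and the %value% placeholder handling are hypothetical; only the isValid()/getMessages() method names follow the usual Zend_Validate style.

```php
<?php
// Hypothetical validator whose default error message can be replaced through
// an optional constructor parameter; %value% is replaced by the input value.
class BetweenValidator
{
    private $min;
    private $max;
    private $message;
    private $messages = [];

    public function __construct(int $min, int $max, ?string $message = null)
    {
        $this->min = $min;
        $this->max = $max;
        $this->message = $message ?? "'%value%' is not between $min and $max";
    }

    public function isValid($value): bool
    {
        $this->messages = [];
        if (is_numeric($value) && $value >= $this->min && $value <= $this->max) {
            return true;
        }
        $this->messages[] = str_replace('%value%', (string) $value, $this->message);
        return false;
    }

    public function getMessages(): array
    {
        return $this->messages;
    }
}

$default = new BetweenValidator(1, 12);
$custom  = new BetweenValidator(1, 12, 'Please choose a month from 1 to 12 (you entered %value%).');
$default->isValid('13');
$custom->isValid('13');
// The default validator reports its built-in text; the custom one reports the supplied text.
print_r($custom->getMessages());
```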

I was curious:

"This component will return filtered and validated field values in escaped format using accessors."

Is it a good idea to use an escape function by default? I can understand setting a few default filters like trim(), but having a html-specific escaper as a default doesn't make much sense unless it's a conscious user choice.

We already have Zend_View knocking around with the exact same functionality in the output layer, and I don't see a need for it to appear again in the least expected place: the input layer. Honestly, how many people are going to need escaped input in a Controller? Until the data becomes output, escaping it early only invites confusion.

No, actually, escaping output by default is one of the primary requirements for Zend_Filter_Input! The goal is to make it very easy to avoid output of unsafe values.

The old implementation of Zend_Filter_Input actually overwrote values in $_GET and $_POST, so that you couldn't get them in non-escaped format at all. I thought that overwriting superglobal data was highly inappropriate, and it also broke Zend_Controller's usage of superglobals.

So the proposed redesign does not alter values in superglobals. Instead, it provides getEscaped() and getUnescaped() methods to make it easy for users to get the format that is most appropriate.

Because Zend Framework is primarily designed for developing web applications, it's reasonable that the default escape filter is for HTML output. You can set the escaping filter to something different (like StringTrim) very easily, or else you can get unescaped values with the getUnescaped() method.

I still don't get it to be honest. It just seems wrong to be escaping input by default and forcing everyone to jump through hoops to get hold of the validated input in a format suitable for writing to a database, file, log, external web service or any other of a hundred different reasons for it being in a Controller - not a View.

I'm all for flexibility and adding safety nets, but this one is running far outside its range into View territory, cluttering the simple fluent interface with cumbersome getters, and requiring more decisions about when and where specific inputs were (or were not) escaped by my mad horde of web designers with minimal PHP skills. The poor souls...

"The old implementation of Zend_Filter_Input actually overwrote values in $_GET and $_POST, so that you couldn't get them in non-escaped format at all. I thought that overwriting superglobal data was highly inappropriate, and it also broke Zend_Controller's usage of superglobals."

The old class didn't do any escaping. I'm not here to argue the superglobal issue, since it (unfortunately) plays havoc across the board, even in encapsulated applications.

"Because Zend Framework is primarily designed for developing web applications, it's reasonable that the default escape filter is for HTML output."

...once it's identified as output, which is questionable until it has passed through the Controller and been assigned to a View (or been output explicitly instead of being assigned to a View). Before that determination is made, I have to fill my code with the horror of getUnescaped() references, or override the behaviour when constructing the class. I just think it's a shame one has to make all those little upfront tweaks before the class can perform its sole responsibility: filtering and validating to produce clean input that awaits a context for further manipulation.

I would think that it is a good feature that an unescaped value is retrieved with a "cumbersome" function call like getUnescaped() – so anyone reading your app code wakes up and says, "wow it's not escaped! I need to be careful with that value."

In other words, it is by design that it's less convenient to get the unescaped value. But only slightly less convenient – it's still just a single function call. The difference we're discussing is between $input->month and $input->getUnescaped('month').

Anyway, I think I have a solution. I have modified the Zend_Filter_Input code and I will commit it in the incubator shortly. Now a protected array $_defaults stores the name of the default escape filter class.

You can extend Zend_Filter_Input and override the default escape filter class. Then all instances of your subclass will use StringTrim (or whatever you define) as the default escape filter.

You can do this either by redefining the $_defaults array in your subclass, or by setting an individual value in your subclass's constructor and then calling parent::__construct().
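Both options can be sketched against a simplified stand-in class. `InputBase`, its 'escapeFilter' key, and the two subclasses are hypothetical; only the idea of a protected $_defaults array comes from the comment above, and PHP's htmlentities and trim stand in for filter classes.

```php
<?php
// Hypothetical base class: a protected $_defaults array names the default escape filter.
class InputBase
{
    protected $_defaults = ['escapeFilter' => 'htmlentities'];
    protected $_data;

    public function __construct(array $data)
    {
        $this->_data = $data;
    }

    public function getEscaped(string $field): string
    {
        $filter = $this->_defaults['escapeFilter'];
        return $filter((string) $this->_data[$field]);
    }
}

// Option 1: redefine the whole $_defaults array in the subclass.
class TrimmingInput extends InputBase
{
    protected $_defaults = ['escapeFilter' => 'trim'];
}

// Option 2: set one value in the constructor, then call the parent constructor.
class TrimmingInput2 extends InputBase
{
    public function __construct(array $data)
    {
        $this->_defaults['escapeFilter'] = 'trim';
        parent::__construct($data);
    }
}

$data = ['note' => ' <b>hi</b> '];
echo (new InputBase($data))->getEscaped('note'), "\n";      // HTML-escaped, surrounding spaces kept
echo (new TrimmingInput($data))->getEscaped('note'), "\n";  // <b>hi</b>
echo (new TrimmingInput2($data))->getEscaped('note'), "\n"; // <b>hi</b>
```

Option 1 replaces every default at once, while option 2 changes a single entry and leaves any other defaults inherited from the parent.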

I have committed a significant update to the Zend_Filter_Input class in the incubator, and also unit tests and documentation.

Please update to revision 4776 and take a look!

One could also override the __get() method to point to the getUnescaped() function. That's really the only behaviour I don't agree with.

I'll probably spend the rest of my days bemoaning escaping being present in an input validate/filter chain, but I can live with extending the class. I still think it's bizarrely inconsistent when there are two parts of the application with overlapping responsibilities.

"I would think that it is a good feature that an unescaped value is retrieved with a "cumbersome" function call like getUnescaped() - so anyone reading your app code wakes up and says, "wow it's not escaped! I need to be careful with that value.""

I'd prefer a cumbersome getRaw()/getUnvalidated() to a cumbersome getUnescaped(). Any plans to add one, or is that being left entirely to the superglobals? I don't really object too much: I can always check the source code with a regex before battering the developers over the head for using superglobals the wrong way (or subclass something in), just as easily as I can check for any View references to $this/$view not followed somewhere in the fluent interface by "->escape(".

Now I just need to jam a translation object in somewhere and I'm in business...

Appreciate the work!!!

I appreciate your feedback, Pádraic.

I see your point that it's a bit weird to have two classes do similar things, but Andi always said that there should be very loose coupling between Zend Framework classes, and we should architect things with little assumption that all the components are being used together. E.g. someone might use Zend_Filter_Input but not use Zend_View. And vice versa.

On the other hand, Andi also discourages solutions that provide multiple ways of doing the same thing.

I'm not sure how to reconcile those two priorities. I guess in this case, since it's related to application security, I'd rather make sure that there's some way of getting escaped values conveniently, in cases where a developer is using one component but not the other.

Regarding getRaw(), I considered this, but I realized that there's no need, since this new Zend_Filter_Input class doesn't change data in the superglobals. If one wants to get the raw input data, it's still available. There would be little value provided by a getRaw() method, and its presence would probably cause confusion. That is, how could we make it clear how to decide between using $input->getRaw() versus accessing $_POST directly?

I have read through the documentation and I like what is being done. I think this is a valuable contribution to the framework. These fundamental components not only help to simplify and structure my code, but also further my understanding of what should be included in a properly developed PHP application. Without Zend_Filter_Input, Zend_Filter and Zend_Validate made sense, but left me wondering how to group or structure them in my code. Zend_Filter_Input seems to bring them together nicely while providing a simpler and more complete solution for reporting invalid data. I shall see if that is true as I try to implement it.

I found the following typo in the documentation.

"1.1.3.2. Getting Valid Fields

All fields that are neither invalid, missing, nor unknown are considered valid. You can get values for valid fields a magic accessor."

Maybe you are missing 'using' or 'utilizing' before 'a magic accessor.' I'll leave that up to you. Otherwise it seemed good to me.

This class is never going to get off the ground until the developer is able to override the Zend_Validate_? default error messages. I believe this is a fundamental flaw in the Zend_Validate_? classes and should be resolved. The easiest and cleanest change would be adding an optional parameter to the constructor.

It's also imperative that the developer be able to specify the "required" error message generated by Zend_Filter_Input.

Then there's the problem of validating multiple fields (password = password_confirm). What if we had a meta command 'multi' that would cause the Zend_Filter_Input to send the Zend_Validate_xxx the entire array of data to validate?
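The proposed 'multi' metacommand could behave like this sketch. The function `runRule()` is hypothetical, and 'multi' is only this comment's suggestion, not an existing Zend_Filter_Input option: when a rule sets 'multi' => true, its validator receives the entire data array instead of one field's value.

```php
<?php
// Hypothetical rule runner for the suggested 'multi' metacommand.
function runRule(array $rule, string $field, array $data): bool
{
    $validator = $rule[0];
    // With 'multi', pass the whole input; otherwise pass just this field's value.
    $argument = !empty($rule['multi']) ? $data : ($data[$field] ?? null);
    return $validator($argument);
}

$passwordsMatch = function (array $all) {
    return isset($all['password'], $all['password_confirm'])
        && $all['password'] === $all['password_confirm'];
};

$rule = [$passwordsMatch, 'multi' => true];
var_dump(runRule($rule, 'password', ['password' => 'abc', 'password_confirm' => 'abc'])); // bool(true)
var_dump(runRule($rule, 'password', ['password' => 'abc', 'password_confirm' => 'abd'])); // bool(false)
```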

I'm throwing out ideas to try to get some discussion on this topic. I think it's pretty important to the framework.

Thoughts?

I don't think the current getMessage() errors could ever be used. The simple reason is that people want to control what their pages present to the end user, and that rarely allows for a framework making such decisions. It has to be a string parameter when setting up a validation chain, mainly so that folks can pass a translated string as a parameter.

I'm actually not sure messages should be handled by the Rule classes, it seems like a ta