

<p>Since Zend Framework 1, we've operated to a simple standard. Zend_View offers a basic escape() method using htmlspecialchars() which must be explicitly called by users, with any additional escaping, filtering or validation of inputs entering an output context managed separately by the user. If they get it wrong, it is not the framework's problem.</p>

<p>This approach is arguably poor practice since it assumes that users will know about all the other filtering, escaping and sanitisation that will be required to complement basic HTML escaping. However, it is consistent with prevailing PHP practice which places a heavy emphasis on HTML escaping, usually at the expense of informing programmers about all the other steps needed for a comprehensive anti-XSS strategy.</p>

<p>With the current generation of frameworks, there's been a slight shift towards supporting other means of escaping. Symfony 2, via Twig, offers Javascript escaping, and the Nette framework supports auto-escaping across multiple contexts. However, each varies in its approach and has its own issues. As those differences highlight, departing from simple HTML escaping can lead to a lot of odd expectations and edge cases that users need to learn anyway. Not to mention that there is more than HTML and Javascript to escape...</p>

<p>The purpose of this RFC is to propose a new Escaper object (distinct from but usable by Zend\View) which offers a more consistent and secure approach. Rather than reinventing the wheel, these escaping methods would comply with current OWASP recommendations, with specific documented exceptions to offset the performance cost of going the full stretch. The rules skipped by those exceptions can be reactivated at the cost of some performance and a dependency on mbstring.</p>

<p>Essentially, the goal of this Escaper is very simple: to implement contextual escaping based on peer-reviewed rules that carry considerable weight as a recommended practice.</p>

<h2>Moving From htmlspecialchars() to Contextual Escaping</h2>

<p>To understand why multiple standardised escaping methods are needed, here are a few quick points (by no means a complete set!):</p>

<h3>HTML escaping of unquoted HTML attribute values still allows XSS</h3>

<p>This is probably the best-known way to defeat htmlspecialchars() when used on attribute values, since any space (or character interpreted as a space; there are a lot) lets you inject new attributes whose content HTML escaping can't neutralise. The solution, where possible, is additional escaping as defined by the OWASP ESAPI codecs. The point can be extended further: escaping only works if the programmer or designer knows what they're doing. In many contexts there are additional practices and gotchas that need careful monitoring, since escaping sometimes needs a little extra help to protect against XSS, even if that means ensuring all attribute values are properly double quoted although valid HTML does not require it.</p>
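<p>A minimal illustration in plain PHP (the payload is an arbitrary example): it contains none of the characters htmlspecialchars() converts, so it passes through unchanged. In an unquoted attribute the space ends the value and starts an onmouseover attribute; in a double quoted attribute it stays inside the value.</p>

```php
<?php
// The payload contains no &, <, >, " or ' characters, so HTML escaping
// leaves it exactly as it is.
$payload = 'x onmouseover=alert(1)';
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

// Unquoted attribute: the space terminates the value and a new
// onmouseover attribute begins.
$unquoted = '<input value=' . $escaped . '>';

// Double quoted attribute: the whole payload remains a single value.
$quoted = '<input value="' . $escaped . '">';

echo $unquoted, "\n"; // <input value=x onmouseover=alert(1)>
echo $quoted, "\n";   // <input value="x onmouseover=alert(1)">
```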

<h3>HTML escaping of CSS, Javascript or URIs is often reversed when passed to non-HTML interpreters by the browser.</h3>

<p>HTML escaping is just that: it's designed to escape a string for HTML (i.e. prevent tag or attribute insertion) without altering the underlying meaning of the content, whether that is text, Javascript, CSS or URIs. As a result, a fully HTML escaped string placed in any other context may still be decoded back to its unescaped form by the browser before it's interpreted or executed. For this reason we need separate escapers for Javascript, CSS and URIs, and those writing templates MUST know which escaper to apply to which context. Of course, this means you need to be able to identify the correct context before selecting the right escaper!</p>
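<p>The sketch below approximates the browser's behaviour with html_entity_decode(); the doSomething() handler and the payload are hypothetical. The single quote survives HTML escaping as an entity, but that entity is decoded before the Javascript engine sees the attribute value, so the string is closed early and alert(1) runs.</p>

```php
<?php
// Untrusted input meant to become a Javascript string argument.
$input = "'); alert(1); //";

// HTML escaping converts the single quote into the entity &#039; ...
$htmlEscaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// ... which is then placed inside an event handler attribute.
$handler = "doSomething('" . $htmlEscaped . "')";
$markup  = '<a href="#" onclick="' . $handler . '">Go</a>';

// Browsers decode entities in attribute values before the Javascript
// engine runs them; html_entity_decode() stands in for that step here.
$executed = html_entity_decode($handler, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $markup, "\n";
echo $executed, "\n"; // doSomething(''); alert(1); //')
```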

<h3>DOM based XSS requires a defence using at least two levels of different escaping in many cases.</h3>

<p>DOM based XSS has become increasingly common as Javascript has taken off in popularity for large scale client side coding. A simple example is Javascript defined in a template which inserts a new piece of HTML text into the DOM. If the string is only HTML escaped, it may still contain Javascript that will execute in that context. If the string is only Javascript escaped, it may contain HTML markup (new tags and attributes) which will be injected into the DOM and parsed once the inserting Javascript executes. Damned either way? The solution is to escape twice - first escape the string for HTML (make it safe for DOM insertion), and then for Javascript (make it safe for the current Javascript context). Nested contexts are a common means of bypassing naïve escaping habits (e.g. you can inject Javascript into a CSS expression within a HTML Attribute).</p>
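<p>A minimal sketch of the two-step escaping described above, assuming ASCII-only input for brevity. escapeJsAscii() is a hypothetical helper loosely modelled on the OWASP Javascript rule (every character except letters, digits, ',', '.' and '_' becomes \xHH); it is not the proposed escapeJs(). HTML escaping runs first because it protects the inner (DOM) context, and Javascript escaping runs second because it protects the outer context the string is written into.</p>

```php
<?php
// Hypothetical ASCII-only Javascript string escaper: every character except
// ASCII letters, digits, ',', '.' and '_' becomes a \xHH escape.
function escapeJsAscii(string $value): string
{
    if (preg_match('/[^\x00-\x7F]/', $value)) {
        throw new InvalidArgumentException('This sketch only handles ASCII input');
    }
    return preg_replace_callback('/[^a-zA-Z0-9,._]/', function (array $m): string {
        return sprintf('\\x%02X', ord($m[0]));
    }, $value);
}

$untrusted = '<img src=x onerror=alert(1)>';

// Inner context first: make the string safe for insertion into the DOM...
$forDom = htmlspecialchars($untrusted, ENT_QUOTES, 'UTF-8');
// ...then the outer context: make it safe inside a Javascript string literal.
$forJs = escapeJsAscii($forDom);

// When the script runs, innerHTML receives &lt;img ...&gt; and renders text.
echo "<script>el.innerHTML = '" . $forJs . "';</script>\n";
```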

<h3>PHP has no known anti-XSS escape functions (only those kidnapped from their original purposes).</h3>

<p>A simple example, widely used, is when you see json_encode() used to escape Javascript, or worse, some kind of mutant addslashes() implementation. These were never designed to eliminate XSS yet PHP programmers use them as such. For example, json_encode() does not escape the ampersand or semi-colon characters by default. That means you can easily inject HTML entities which could then be decoded before the Javascript is evaluated in a HTML document. This lets you break out of strings, add new JS statements, close tags, etc. In other words, using json_encode() is insufficient and naïve. The same, arguably, could be said for htmlspecialchars() which has its own well known limitations that make a singular reliance on it a questionable practice.</p>
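<p>To make the json_encode() point concrete, the sketch below uses html_entity_decode() to stand in for the browser decoding a single-quoted event handler attribute; the payload is an arbitrary example. The JSON_HEX_* flags close this particular gap, but they are opt-in and, as argued above, json_encode() was never designed as an XSS escaper.</p>

```php
<?php
// Input containing HTML entities rather than raw quote characters.
$input = '&quot;;alert(1);&quot;';

// By default json_encode() leaves '&' and ';' untouched.
$json = json_encode($input);

// Placed in a single-quoted handler such as
//   <button onclick='var msg = [json here];'>
// the browser decodes &quot; to a double quote before running the script,
// closing the JSON string early.
$executed = html_entity_decode('var msg = ' . $json . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $executed, "\n"; // var msg = "";alert(1);"";

// The JSON_HEX_* flags emit \u00XX escapes for these characters instead.
$hardened = json_encode($input, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
echo $hardened, "\n"; // "\u0026quot;;alert(1);\u0026quot;"
```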

<h3>Of those sources checked, there is little agreement over escaping strategies beyond HTML.</h3>

<p>Disagreement over escaping is rife in PHP, with varying methods and approaches in use. The lack of consistency creates a lot of confusion among users, a situation not helped by the absence of good educational material in the mainstream sources programmers turn to. This confusion plays out on sites like StackOverflow, where some outlandish escaping strategies are often suggested. Other contributing factors include PHP's over-emphasis on HTML escaping and our failure to openly note the many problems in current escaping practice when discussing XSS.</p>

<p>While these five points are hardly comprehensive given the broad scope of the XSS topic, they do offer some insight into the problems inherent in a framework where the only escaping method offered is htmlspecialchars().</p>

<h2>Zend\View\Escaper: Implementing Contextual Escapers</h2>

<p>This RFC proposes a new class called Zend\View\Escaper whose purpose is to offer a collection of escaper methods which comply with a character encoding set via the constructor. For the purposes of this RFC and pending discussion, the new methods are escapeHtml(), escapeHtmlAttr(), escapeJs(), escapeCss() and escapeUri(), with the signatures outlined below.</p>


<p>The first parameter of every method is required and represents the string to be escaped. Many methods have a second parameter which activates an alternative escaping strategy. These alternative strategies will follow the strict rules offered by the OWASP ESAPI library and, solely in the case of CSS, activate a sanitisation filter which I'll port from the Wibble HTML Sanitiser prototype. Because the alternative strategies are opt-in, this should answer the performance concerns that will inevitably be raised: by default, each escaper follows a minimal set of rules that is still robust enough for careful users.</p>

<p>For discussion, a shorthand API could be proposed making use of a single escape method. However, this approach would be very confusing, since each escaper and its options vary a great deal. It may still be worth exploring as a shorter API for some circumstances, e.g. where DOM based XSS requires nested escaper calls to offer full protection.</p>

<h2>Documentation Improvements: The Scare Factor!</h2>

<p>The greatest vulnerability to XSS stares at you each time you look in a mirror. Without sufficient knowledge of XSS, programmers will continue to make the same mistakes, complaints and wrongful assumptions. If this RFC is accepted, I recommend that the Escaper's documentation link prominently to the Security section of the manual and include a minimal set of inline examples showing what makes each escaper method effective in its given context. The Security section itself should contain a more detailed introduction to XSS to educate users and show off the effectiveness of the Escaper in various scenarios.</p>

<h2>Optional Performance Costs: Using the Alternative Strict Escaping Strategies</h2>

<p>As noted above, each escaper has an alternative mode of operation which, at a performance cost, is even more secure than the default strategy. Nearly all the alternative strategies will require mbstring; otherwise we would need to implement a string parser and encoding conversion library, which would hurt performance even more. Here's an outline of these alternative strategies:</p>

<li>escapeHtml(string $string [, $strict = false])</li>

<p>For the standard HTML escaper, setting $strict to true will additionally escape the forward slash character in line with the OWASP recommendations.</p>
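<p>A minimal sketch of how the constructor-supplied encoding and the $strict switch might fit together for escapeHtml(). The class is a hypothetical illustration (namespace omitted, PHP 7.4+ for the typed property); the encoding whitelist and the use of ENT_SUBSTITUTE are assumptions, not part of the RFC.</p>

```php
<?php
// Hypothetical skeleton of the proposed Escaper; bodies are illustrative.
class Escaper
{
    private string $encoding;

    public function __construct(string $encoding = 'UTF-8')
    {
        // Assumption: a small whitelist; htmlspecialchars() supports more.
        if (!in_array(strtoupper($encoding), ['UTF-8', 'ISO-8859-1'], true)) {
            throw new InvalidArgumentException("Unsupported encoding: $encoding");
        }
        $this->encoding = $encoding;
    }

    // Default strategy: htmlspecialchars() with both quote types escaped.
    // Strict strategy: additionally escape the forward slash.
    public function escapeHtml(string $string, bool $strict = false): string
    {
        // ENT_SUBSTITUTE swaps invalid sequences for U+FFFD rather than
        // returning an empty string.
        $escaped = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE, $this->encoding);
        return $strict ? str_replace('/', '&#x2F;', $escaped) : $escaped;
    }
}

$escaper = new Escaper('UTF-8');
echo $escaper->escapeHtml('</script>'), "\n";       // &lt;/script&gt;
echo $escaper->escapeHtml('</script>', true), "\n"; // &lt;&#x2F;script&gt;
```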

<li>escapeHtmlAttr(string $string [, $strict = false])</li>

<p>For the HTML Attribute escaper, setting $strict to true will escape every non-alphanumeric character with a character code below 256 as a hexadecimal entity, and replace all non-printable characters except \n and \r with the Unicode replacement character (U+FFFD). Again, this is in line with the OWASP recommendations.</p>
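<p>A hypothetical sketch of the strict attribute strategy, assuming UTF-8 input. Which characters count as non-printable is an assumption here (C0 controls, DEL and C1 controls), and characters at U+0100 and above are passed through unchanged since the rule above only covers codes below 256.</p>

```php
<?php
// Hypothetical sketch of escapeHtmlAttr() with $strict = true.
function escapeHtmlAttrStrict(string $value): string
{
    if (!preg_match('//u', $value)) {
        throw new InvalidArgumentException('Input must be valid UTF-8');
    }
    // Match every character below U+0100 that is not an ASCII letter or digit.
    return preg_replace_callback('/[^a-zA-Z0-9\x{100}-\x{10FFFF}]/u', function (array $m): string {
        $char = $m[0];
        // Below U+0100 a character occupies one or two bytes in UTF-8.
        $cp = strlen($char) === 1
            ? ord($char)
            : (((ord($char[0]) & 0x1F) << 6) | (ord($char[1]) & 0x3F));
        // Assumption: C0 controls, DEL and C1 controls are non-printable.
        $nonPrintable = $cp < 0x20 || ($cp >= 0x7F && $cp <= 0x9F);
        if ($nonPrintable && $char !== "\n" && $char !== "\r") {
            return '&#xFFFD;'; // Unicode replacement character
        }
        return sprintf('&#x%02X;', $cp);
    }, $value);
}

echo escapeHtmlAttrStrict('x onmouseover=alert(1)'), "\n";
// x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;
```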

<li>escapeJs(string $string [, $prefilter = null [, $prefilterOpt = false]])</li>

<p>For the Javascript escaper, setting $prefilter to either "html" or "uri" will first escape the data using the relevant escaper before applying Javascript escaping. For security reasons, this escaper has no performance saving built in and applies strict OWASP recommended escaping by default. It does not rely on json_encode() because that function automatically adds double quotes and has gaps in its escaping coverage that make it weak at XSS prevention. The third parameter, if set to true, activates the alternative strategy of the prefilter escaper used. For reasons explained later, this prefiltering is purely a convenience for a common use case.</p>
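<p>A standalone sketch of the escapeJs() signature above, assuming UTF-8 input. The function bodies are assumptions: the "html" prefilter uses htmlspecialchars() (plus '/' escaping when $prefilterOpt is true), the "uri" prefilter uses rawurlencode(), and the Javascript step escapes everything except alphanumerics (also leaving ',', '.' and '_' alone) as \xHH, or \uHHHH above U+00FF. A small hand-written UTF-8 decoder avoids the mbstring dependency.</p>

```php
<?php
// Decode one UTF-8 encoded character (1 to 4 bytes) to its code point.
function utf8ToCodePoint(string $char): int
{
    $b = array_map('ord', str_split($char));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical sketch; names follow the RFC, bodies are assumptions.
function escapeJs(string $string, ?string $prefilter = null, bool $prefilterOpt = false): string
{
    if (!preg_match('//u', $string)) {
        throw new InvalidArgumentException('Input must be valid UTF-8');
    }
    // Optional inner context, escaped first (e.g. HTML destined for the DOM).
    if ($prefilter === 'html') {
        $string = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
        if ($prefilterOpt) {
            $string = str_replace('/', '&#x2F;', $string); // strict HTML strategy
        }
    } elseif ($prefilter === 'uri') {
        $string = rawurlencode($string); // no strict variant modelled here
    } elseif ($prefilter !== null) {
        throw new InvalidArgumentException("Unknown prefilter: $prefilter");
    }
    // Outer context: escape everything outside [a-zA-Z0-9,._].
    return preg_replace_callback('/[^a-zA-Z0-9,._]/u', function (array $m): string {
        $cp = utf8ToCodePoint($m[0]);
        if ($cp < 0x100) {
            return sprintf('\\x%02X', $cp);
        }
        if ($cp < 0x10000) {
            return sprintf('\\u%04X', $cp);
        }
        // Characters beyond the BMP are written as a UTF-16 surrogate pair.
        $cp -= 0x10000;
        return sprintf('\\u%04X\\u%04X', 0xD800 | ($cp >> 10), 0xDC00 | ($cp & 0x3FF));
    }, $string);
}

// Example: a value that a script will later assign to innerHTML.
echo "var label = '" . escapeJs('<b>Hi</b>', 'html') . "';\n";
```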

<li>escapeCss(string $string [, $sanitise = false])</li>

<p>For the CSS escaper, setting $sanitise to true makes the escaper attempt to sanitise the CSS string before escaping it. As with Javascript escaping, the escaping routine itself is based on the strict OWASP recommendations and offers no performance savings, since PHP has no relevant native function to utilise. Note that sanitisation is a tricky affair, so relying on it carries a continual risk of an exploitable flaw, and given ZF's popularity chances are this will happen more than once. Fair warning given!</p>
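<p>A hypothetical sketch of the default CSS escaping step only; the $sanitise option is not modelled. It assumes UTF-8 input and applies the OWASP rule of escaping every non-alphanumeric character below 256 as \HH, adding a trailing space so that a following hex digit is not read as part of the escape. Characters at U+0100 and above are passed through, which is an assumption of this sketch.</p>

```php
<?php
// Hypothetical sketch of escapeCss() without sanitisation.
function escapeCss(string $string): string
{
    if (!preg_match('//u', $string)) {
        throw new InvalidArgumentException('Input must be valid UTF-8');
    }
    // Match every character below U+0100 that is not an ASCII letter or digit.
    return preg_replace_callback('/[^a-zA-Z0-9\x{100}-\x{10FFFF}]/u', function (array $m): string {
        $char = $m[0];
        // Below U+0100 a character occupies one or two bytes in UTF-8.
        $cp = strlen($char) === 1
            ? ord($char)
            : (((ord($char[0]) & 0x1F) << 6) | (ord($char[1]) & 0x3F));
        // The trailing space terminates the CSS escape sequence.
        return sprintf('\\%X ', $cp);
    }, $string);
}

echo escapeCss('red;background:url(x)'), "\n";
```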

<li>escapeUri(string $string [, array $insertions = null [, $sanitise = false]])</li>

<p>The method signature above differs from the rest because URL escaping should only ever be applied to data being inserted into a URL, i.e. you never, ever URL escape an entire URL! Passing only the required first parameter escapes that string on the assumption that you will insert it into a hardcoded URL. The second parameter accepts an array of URL parts which will be escaped and inserted into $string using a call to sprintf(). The third parameter, if set to true, attempts to sanitise completed URLs only (e.g. validate the final result to ensure it's not a data: or javascript: URI, among other things).</p>
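<p>A hypothetical sketch of the escapeUri() behaviour described above, not a final API. The assumptions are: rawurlencode() escapes each part, vsprintf() (the array form of sprintf()) performs the insertion, and sanitisation is reduced to a couple of checks on the completed URL (a scheme whitelist and rejection of leading whitespace or control characters). A real sanitiser would need more than this.</p>

```php
<?php
// Hypothetical sketch of escapeUri($string, $insertions, $sanitise).
function escapeUri(string $string, ?array $insertions = null, bool $sanitise = false): string
{
    if ($insertions === null) {
        // A single value destined for a hardcoded URL: escape it alone.
        return rawurlencode($string);
    }
    // $string is a trusted template; only the inserted parts are escaped.
    // Literal percent signs in the template must be written as %%.
    $url = vsprintf($string, array_map('rawurlencode', $insertions));

    if ($sanitise) {
        // Browsers skip leading whitespace/control characters before a
        // scheme, so refuse them rather than let them disguise one.
        $hasLeadingJunk = $url !== ltrim($url, "\x00..\x20");
        $scheme = parse_url($url, PHP_URL_SCHEME);
        $badScheme = $scheme === false
            || ($scheme !== null && !in_array(strtolower($scheme), ['http', 'https'], true));
        if ($hasLeadingJunk || $badScheme) {
            throw new InvalidArgumentException('Refusing unsafe or malformed URL');
        }
    }
    return $url;
}

$href = escapeUri('https://example.com/search?q=%s&page=%s', ['x&y=z', '2'], true);
// The completed URL still needs HTML escaping before it enters an attribute.
echo '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">Search</a>', "\n";
```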

<p>This is still imperfect: completed URLs should then still be HTML escaped, so you could have $this->escapeHtmlAttr($this->escapeUri(...)). This brings us to the upsetting topic of nested contexts. escapeJs() allows for one level of nesting as a convenience, but it doesn't end there... You could have, for example, CSS embedded in a HTML attribute which contains a Javascript expression that meddles with the DOM (i.e. HTML). That's a heck of a lot of nesting, and the escape calls for something like that would NOT be pretty! It's advisable to avoid over-nesting unless it's absolutely necessary.</p>

<h2>Other Security and Performance Concerns</h2>

<p>It's a well-known aspect of human nature that if a security measure performs too poorly, or the inconvenience of a failure is too great, programmers will find a way not to implement it. I've lost count, for example, of the number of libraries which deliberately disable SSL peer verification. This RFC therefore stays on the side of making performance-harming features optional, even at the cost of a little less security. In many respects, PHP programmers have been spoiled by years of misinterpreting basic PHP functions (which are fast) as good XSS defences because the alternatives are often slower. I don't expect this to change overnight, but we should state it up front anyway so programmers understand that escaping isn't invulnerable in all circumstances.</p>

<p>Outside of htmlspecialchars(), character encoding handling in the PHP functions useful for escaping is horrendous and, in most respects, limited to ASCII (e.g. json_encode()). Thus the default implementations for Javascript, CSS and URI escaping are necessarily limited to ASCII-compatible character encodings such as UTF-8. Including iconv or mbstring as a dependency may allow this support to be expanded, but it might not be the default behaviour. Similarly, without proper character encoding support, the default escapers would be susceptible to variable-width encoding attacks and similar vectors such as character swallowing.</p>
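<p>The sketch below shows why encoding validity matters before any character-level escaping: htmlspecialchars() refuses an invalid UTF-8 sequence (returning an empty string unless ENT_SUBSTITUTE or ENT_IGNORE is set), and a hand-written escaper can perform a similar check with PCRE's /u modifier, which needs neither iconv nor mbstring. isValidUtf8() is a hypothetical helper name.</p>

```php
<?php
// "\xC3" starts a two-byte UTF-8 sequence that is never completed.
$malformed = "caf\xC3";

// Without ENT_SUBSTITUTE or ENT_IGNORE, invalid input yields an empty string.
$plain = htmlspecialchars($malformed, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE replaces the invalid sequence with U+FFFD instead.
$substituted = htmlspecialchars($malformed, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// PCRE-based validity check a hand-written escaper could run first, so a
// dangling lead byte can never swallow the character that follows it.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump($plain, $substituted, isValidUtf8($malformed));
```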

<p>Dependency limitations aside, all escapers, except the HTML escapers in their default strategies, require character-by-character parsing of strings, which will be slower than a native PHP function. This cannot be avoided, nor can it be mitigated in all cases, since the native functions are usually inappropriate for XSS-oriented escaping despite the wishful thinking of many programmers. That said, suggestions are welcome once the prototype is available!</p>

<p>Due to a dependency on iconv or mbstring for certain strategies, you can guess what happens if either of these extensions has unknown bugs which could be taken advantage of in an XSS attack. The main reason for using these extensions is to enable character encoding support on par with htmlspecialchars() and to support faster string replaces in multibyte encodings like UTF-8 which all helps to boost performance.</p>

<p>Finally, it's worth noting that any validation or sanitisation utilised by the Escaper class is a line of last defence against failures in the upstream filtering/validation. We should be clear that these features are NOT a replacement for input processing but exist only to offer a reasonable fallback in the event of programmer error or carelessness. If there are any inevitable complaints about double filtering, they can go in a complaints box which I'll never open.</p>

<h2>Explicit Escaping vs Auto-Escaping vs Contextual Auto-Escaping</h2>

<p>After reviewing the auto-escaping and contextual auto-escaping implementations I could locate, I've arrived at the opinion that these approaches still need a lot of work and tend to play to a programmer's assumptions of what makes an application safe from XSS, while giving them an excuse to remain ignorant and lazy. I don't think PHP is better off for their existing in their current state, and I remain suspicious that they encourage nothing more than a false sense of security.</p>

<p>While automatic contextual escaping would be amazing to have, building such a solution while maintaining reasonable performance is a non-trivial task beyond our current knowledge and tools. This may say more about the unreasonable performance we expect than about the performance such a system could ever achieve.</p>

<p>As the above two points indicate, there is simply no replacement for programmers educating themselves about XSS.</p>



  1. Mar 01, 2012

    <p>I think I'd prefer Zend\Escaper with view helpers in Zend\View\Helper</p>

    1. Mar 12, 2012

      <p>@Rob: +1 to that. I'm not too fond of the name ("Escaper" – what are you escaping from? ;)) but I think it would make re-using this component a lot easier. E.g. I could see that people with zf1 applications could possibly use it as well.</p>


      <p>I like your proposal and the way you outlined the goal and responsibility of this component.</p>

      <li>Can you invite the nette people to discuss this proposal? I'd be interested in their opinion.</li>
      <li>Would there be any form of auto-escaping in zf2, or would this shift and rely on the developer completely? (No opinion, just asking for clarification.)</li>