ZF-8715: Inconsistent encoding across several view helpers

Description

Zend_Dojo's _renderLayers() method includes a call to htmlentities(). Unfortunately, it does not pull the encoding from the view object, which means it will only work on ASCII characters, which can potentially open multibyte XSS vectors.

Zend_View_Helper_Placeholder_Container, line 29, hardcodes the encoding (instead of using the view object's), and should likely use htmlspecialchars() instead.

Zend_Form_Decorator_HtmlTag, Zend_Service_Twitter, Zend_Feed_Element, and Zend_View_Helper_Navigation_Sitemap hardcode htmlspecialchars() calls to use UTF-8.

Zend_Log_Formatter_Xml, Zend_Tag_Cloud_Decorator_HtmlTag, Zend_Tag_Cloud_Decorator_HtmlCloud, and Zend_View_Helper_HeadStyle do not pass encoding information at all.

Zend_Filter_HtmlEntities defaults to ISO-8859-1, but should default to UTF-8 (same applies to Zend_View). Additionally, for consistency, it should implement a setEncoding() method that proxies to setCharset() (or vice-versa).

Basically, all instances of htmlentities() and/or htmlspecialchars() should use the encoding argument (3rd parameter), defaulting to UTF-8 if no encoding is known.

Comments

Updated subject and description to be comprehensive of all reported components.

Actually, Zend_View_Helper_Placeholder_Container_Standalone does use the view by default; only in the absence of a view object does it call htmlentities with a hard-coded encoding. I think in this case, it fits fine.

Zend_Service_Twitter hardcodes the encoding due to expectations of the Twitter API, and, as such, is correct in the current implementation.

All code updated in both trunk and 1.9 release branch.

Added Zend_Form to list of components affected

Hint: r20104 could be seen as BC break for Zend_Filter_HtmlEntities.

Problematic changes: 1.) Encoding defaults changed 2.) Protected members changed

This could lead to problems with existing classes extending this filter. Therefor a migration note should be added to the 1.10 migration chapter.

As the change has also be backported to 1.9 series the migration note should be added to the 1.9 migration chapter.

The encoding issues are security related, so as such, necessary. The changes from CharSet to Encoding are more consistency, however -- but it made sense to deal with those at the same time.

I'll update the migration chapter in the morning prior to packaging the release.

As for the protected member changes, the change should be transparent due to how the setters were written and the use of the new accessors for retrieving the values (instead of using the protected members directly); it's unlikely developers are overriding the setters/accessors in most cases anyways. I'll still make a note, however.