Issues

ZF-3013: add Filter HtmlEntityDecode & HtmlEntityEncode (rename HtmlEntities -> HtmlEntityEncode)

Issue Type: New Feature Created: 2008-04-01T13:22:17.000+0000 Last Updated: 2010-12-03T15:21:30.000+0000 Status: Closed Fix version(s): Reporter: Marc Bennewitz (GIATA mbH) (mben) Assignee: Marc Bennewitz (private) (mabe) Tags: - Zend_Filter

Related issues: Attachments:

Description

  • add the filter Zend_Filter_HtmlEntityDecode to decode html entities to plain text
  • rename the filter Zend_Filter_HtmlEntities to Zend_Filter_HtmlEntityEncode for named consistently

add features not supported by php's functions: - decode and encode hex entities - decode and encode decimal entities - selectable if named entities will be used (on encode) - selectable if hex or decimal entities will be used (on encode) - selectable to throw an exception if an entity can't convert to the given encoding or ignore it - use mbstring (if installed) to encode/decode encodings not supported by htmlentities and friends

Comments

Posted by Wil Sinclair (wil) on 2008-04-18T17:11:46.000+0000

Please evaluate and categorize/assign as necessary.

Posted by Wil Sinclair (wil) on 2008-12-30T07:22:02.000+0000

It seems like the proposed change would have to wait for 2.0 for BC reasons. Matthew, please evaluate whether it is something we should do and, if so, when we should do it.

Posted by Thomas Weidner (thomas) on 2009-07-08T08:51:20.000+0000

Waiting for reply from dev-team (ralph) since 26.06.09

Posted by Marc Bennewitz (GIATA mbH) (mben) on 2009-08-20T04:34:58.000+0000

This is a little script to convert a html text to UTF-8: (It is not optimal because it doesn't handle script tags and some other specials)

<pre class="highlight">
// $text = html text
// $enc = encoding of html text
$text = mb_convert_encoding($text, 'UTF-8', $enc);
// convert numeric entities
$pattern = '/\d{2,5};/u';
$text = preg_replace_callback($pattern, '_decEntitiesToUtf8', $text);
// convert hex entities
$pattern = '/([a-f0-9]{2,8});/ui'; // erst in XML muss das "x" klein geschrieben werden
$text = preg_replace_callback($pattern, '_hexEntitiesToUtf8', $text);
// convert named entities (but not ",',<,>,&)
$htmlTransTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
$xmlTransTable  = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
$myTransTable   = array();
foreach (array_diff_assoc($htmlTransTable, $xmlTransTable) as $char => $ent) {
    $myTransTable[$ent] = mb_convert_encoding($char,'UTF-8','ISO-8859-1');
}
$text = strtr($text, $myTransTable);

function _decEntitiesToUtf8($matches) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    $transTable = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);
    $ret = mb_decode_numericentity($matches[0], $convmap, 'UTF-8');
    if (isset($transTable[$ret])) {
        return $transTable[$ret];
    } else {
        return $ret;
    }
}

function _hexEntitiesToUtf8($matches) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    $transTable = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);
    $ret = mb_decode_numericentity(hexdec($matches[1]), $convmap, 'UTF-8');
    if (isset($transTable[$ret])) {
        return $transTable[$ret];
    } else {
        return $ret;
    }
}

Posted by Thomas Weidner (thomas) on 2009-08-23T14:31:19.000+0000

I got reply some days ago from the devteam...

No for the renaming from HtmlEntities to HtmlEntityEncode Yes for the new HtmlEntityDecode. No for mbstring because iconv is installed per default but mbstring not.

Posted by Marc Bennewitz (private) (mabe) on 2009-10-28T15:01:47.000+0000

I created a proposal for this: http://framework.zend.com/wiki/pages/…

-> the functionality is different to htmlentities, htmlspecialchars and friends and should not replace these filter(s).

Posted by Raphael Dehousse (thymus) on 2010-02-16T03:22:48.000+0000

A basic Zend_Filter_HtmlEntityDecode:

<pre class="highlight">
<?php

class Zend_Filter_HtmlEntityDecode implements Zend_Filter_Interface
{

    /**
     * Corresponds to the second htmlentities() argument
     *
     * @var integer
     */
    protected $_quoteStyle;

    /**
     * Corresponds to the third htmlentities() argument
     *
     * @var string
     */
    protected $_encoding;

    /**
     * Sets filter options
     *
     * @param  integer|array $quoteStyle
     * @param  string  $charSet
     * @return void
     */
    public function __construct($options = array())
    {
        if (!is_array($options)) {
            $options = func_get_args();
            $temp['quotestyle'] = array_shift($options);
            if (!empty($options)) {
                $temp['charset'] = array_shift($options);
            }

            $options = $temp;
        }

        if (!isset($options['quotestyle'])) {
            $options['quotestyle'] = ENT_COMPAT;
        }

        if (!isset($options['encoding'])) {
            $options['encoding'] = 'UTF-8';
        }
        if (isset($options['charset'])) {
            $options['encoding'] = $options['charset'];
        }

        $this->setQuoteStyle($options['quotestyle']);
        $this->setEncoding($options['encoding']);
    }

    /**
     * Returns the quoteStyle option
     *
     * @return integer
     */
    public function getQuoteStyle()
    {
        return $this->_quoteStyle;
    }

    /**
     * Sets the quoteStyle option
     *
     * @param  integer $quoteStyle
     * @return Zend_Filter_HtmlEntities Provides a fluent interface
     */
    public function setQuoteStyle($quoteStyle)
    {
        $this->_quoteStyle = $quoteStyle;
        return $this;
    }

    /**
     * Get encoding
     *
     * @return string
     */
    public function getEncoding()
    {
         return $this->_encoding;
    }

    /**
     * Set encoding
     *
     * @param  string $value
     * @return Zend_Filter_HtmlEntities
     */
    public function setEncoding($value)
    {
        $this->_encoding = (string) $value;
        return $this;
    }

    /**
     * Returns the charSet option
     *
     * Proxies to {@link getEncoding()}
     *
     * @return string
     */
    public function getCharSet()
    {
        return $this->getEncoding();
    }

    /**
     * Sets the charSet option
     *
     * Proxies to {@link setEncoding()}
     *
     * @param  string $charSet
     * @return Zend_Filter_HtmlEntities Provides a fluent interface
     */
    public function setCharSet($charSet)
    {
        return $this->setEncoding($charSet);
    }

    /**
     * Defined by Zend_Filter_Interface
     *
     * Returns the string $value, converting HTML entity to their corresponding characters
     * equivalents where they exist
     *
     * @param  string $value
     * @return string
     */
    public function filter($value)
    {
        return html_entity_decode((string) $value, $this->getQuoteStyle(), $this->getEncoding());
    }

}

Posted by Marc Bennewitz (private) (mabe) on 2010-12-03T15:21:29.000+0000

not accepted by CR-Team

Have you found an issue?

See the Overview section for more details.

Copyright

© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.

Contacts