Issues

ZF-9566: Zend_Feed_Writer_Renderer_Entry_Atom: a (utf-8) non breaking space makes atom feed creation fail

Description

Warning: DOMDocument::loadXML() [domdocument.loadxml]: Entity 'nbsp' not defined in Entity, line: 1 in /home/commander/web/libraries/ZendFramework-1.10.2/library/Zend/Feed/Writer/Renderer/Entry/Atom.php on line 354

That's because Zend_Feed_Writer_Renderer_Entry_Atom::_loadXhtml() attempts to create XML using tidy. Tidy itself replaces UTF-8 non-breaking spaces with &nbsp; entities, which in turn are not defined in XML (and not in Atom, IMHO).
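A minimal sketch of the underlying behaviour, independent of tidy and of Zend_Feed_Writer: it checks how DOMDocument::loadXML() treats the named entity, the numeric character reference, and the raw UTF-8 character. The helper name parsesAsXml() is hypothetical, and the code assumes PHP 7 or later with the DOM extension loaded.

```php
<?php
// Hypothetical helper (not part of Zend Framework): returns true when
// $xml parses as well-formed XML, false otherwise.
function parsesAsXml(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument('1.0', 'UTF-8');
    $ok  = $dom->loadXML($xml);

    // Restore the previous libxml error handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$nbsp = "\xC2\xA0"; // U+00A0 (non-breaking space) encoded as UTF-8

var_dump(parsesAsXml('<div>Test&nbsp;Content</div>'));        // named HTML entity
var_dump(parsesAsXml('<div>Test&#160;Content</div>'));        // numeric character reference
var_dump(parsesAsXml('<div>Test' . $nbsp . 'Content</div>')); // raw UTF-8 character
```

Only the named entity should be rejected, since XML predefines just amp, lt, gt, quot and apos unless a DTD declares more.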

To reproduce:


<?php
$this->layout()->disableLayout();
$feed = new Zend_Feed_Writer_Feed();
$feed->setTitle('test');
$feed->setLink('http://www.testtest.org');
$feed->setFeedLink('http://www.testtest.org/feed', 'atom');
$feed->setDescription('test articles');
$feed->setId('http://www.testtest.org/feed/atom/');
$feed->setEncoding('UTF-8');
$feed->addAuthor(array(
    'name'  => 'test',
    'email' => 'test@testtest.org',
    'uri'   => 'http://www.testtest.org',
));
$feed->setDateModified(new Zend_Date());
$entry = $feed->createEntry();
$entry->setTitle('test article');
$entry->setId('http://www.testtest.org/feed/testarticle1id');
$entry->setDateCreated(new Zend_Date());
$entry->setDateModified(new Zend_Date());

// using html_entity_decode() to get a UTF-8 non-breaking space
$htmlSnippet = '

Test' . html_entity_decode('&nbsp;', ENT_COMPAT, 'UTF-8') . 'Content

';
$entry->setContent($htmlSnippet);
$entry->setDescription('test Description');
$feed->addEntry($entry);
echo $feed->export('atom');

Adding 'quote-nbsp' => false to the tidy configuration resolved my problem:



    /**
     * Load a HTML string and attempt to normalise to XML
     */
    protected function _loadXhtml($content)
    {
        $xhtml = '';
        if (class_exists('tidy', false)) {
            $tidy = new tidy;
            $config = array(
                'output-xhtml' => true,
                'show-body-only' => true,
                'quote-nbsp' => false
            );
...

But a more robust implementation of _loadXhtml() would probably be better. The concept of "attempting to normalise to XML" does not seem to fit here: the example above doesn't fail when tidy isn't installed (or used) at all. Let the user take care of passing something valid and throw an Exception on failure or turn the DOM warning into an Exception and try to handle that (possibly with tidy or a callback), instead of using tidy regardless of what the input is.

Regards, Lasse
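
The suggestion to turn the DOM warning into an Exception could look roughly like the sketch below. This is not the Zend Framework implementation; loadXmlOrFail() is a hypothetical name, and the code assumes PHP 7 or later with the DOM extension.

```php
<?php
// Hypothetical helper (not part of Zend Framework): parse $xml strictly
// and throw an exception instead of emitting a PHP warning.
function loadXmlOrFail(string $xml): DOMDocument
{
    // Route libxml errors into an internal buffer rather than PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom    = new DOMDocument('1.0', 'UTF-8');
    $ok     = $dom->loadXML($xml);
    $errors = libxml_get_errors();

    // Restore the previous libxml error handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        $message = $errors ? trim($errors[0]->message) : 'unknown XML error';
        throw new InvalidArgumentException('Content is not well-formed XML: ' . $message);
    }

    return $dom;
}

try {
    loadXmlOrFail('<div>Test&nbsp;Content</div>');
} catch (InvalidArgumentException $e) {
    // A caller could fall back to tidy or a user-supplied callback here.
    echo $e->getMessage(), "\n";
}
```

With this shape, tidy (or any other cleanup step) would only run as a fallback when the input is actually rejected, rather than on every input.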

Comments

removed a typo

I got exactly the same problem.

My dev machine has no tidy and I created a valid XML string manually, while our production machine has tidy and the attempt to clean the string breaks it.

I found a function for HTML to XML entity replacement here: http://inanimatt.com/php-convert-entities.php

"Let the user take care of passing something valid and throw an Exception on failure or turn the DOM warning into an Exception and try to handle that (possibly with tidy or a callback), instead of using tidy regardless of what the input is." I second that.
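
For illustration, an HTML to XML entity replacement of the kind mentioned above might be sketched as follows. This is not the function from the linked page; htmlEntitiesToNumeric() and utf8CodePoint() are hypothetical names, and the code assumes PHP 7 or later (ENT_HTML5 itself needs PHP 5.4+).

```php
<?php
// Hypothetical helper: code point of a single UTF-8 encoded character.
function utf8CodePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $n     = count($bytes);
    if ($n === 1) {
        return $bytes[0]; // plain ASCII
    }
    // Strip the length marker bits from the lead byte, then append
    // six payload bits from each continuation byte.
    $cp = $bytes[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical helper: rewrite named HTML entities as numeric character
// references, leaving XML's five predefined entities untouched.
function htmlEntitiesToNumeric(string $html): string
{
    $xmlPredefined = array('amp', 'lt', 'gt', 'quot', 'apos');

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined) {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0];
            }
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0]; // unknown entity name: leave it for the caller
            }
            // Emit one numeric reference per decoded code point.
            $out = '';
            foreach (preg_split('//u', $decoded, -1, PREG_SPLIT_NO_EMPTY) as $char) {
                $out .= '&#' . utf8CodePoint($char) . ';';
            }
            return $out;
        },
        $html
    );
}

echo htmlEntitiesToNumeric('Test&nbsp;Content &amp; caf&eacute;'), "\n";
```

Numeric references need no DTD, so the result can be handed to DOMDocument::loadXML() without the "Entity 'nbsp' not defined" warning.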

Resolved in r22054.

I'll investigate a more robust solution in time for ZF 2.0 and experiment with a better cleanup approach. I'll also build in options to disable the Tidy stage (or remove it completely unless warranted/can report on a fail with a consistently helpful message).