Issues

ZF-405: Empty items array when parsing rss1.0/RDF feed

Description

When trying to parse an RSS 1.0 / RDF feed (rdf namespace), the items array is empty.

Example : http://www.php.net/news.rss (-;

This produces: [title] => PHP: Hypertext Preprocessor [link] => http://www.php.net/ [description] => The PHP scripting language web site [items] => Array ( )

The XML dump from Zend_Feed is:

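<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF … xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" …>
<channel rdf:about="http://www.php.net/">
<title>PHP: Hypertext Preprocessor</title>
<link>http://www.php.net/</link>
<description>The PHP scripting language web site</description>
<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.zendcon.com"/>
        .../...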

        <rdf:li rdf:resource="http://www.php.net/archive/index.php"/>
    </rdf:Seq>
</items>

I did a quick review of Zend_Feed and found a namespace registration that seems to be at fault, though I am not sure. The fix probably requires either switching the item tag in the entryRSS class or adding a new entryRDF class.

It's not a big deal, but since this is the www.php.net feed, it's rather funny (-;

Thanks for all your work.

Thierry

Comments

Extracting this into your zend directory will get RDF feeds to work.

The attached zip doesn't work. It causes a fatal error (blank page). I will investigate the error.

It's because of: $success = @$doc->loadXML(Zend_Feed::utf8ToUnicodeEntities($string)); in Feed.php

The utf8ToUnicodeEntities function doesn't exist (wrong code version?)

In the previous version it was: $success = @$doc->loadXML($string); That works!

Changing fix version to 0.6.0.

I have the exact same issue. Any idea when this will be resolved? I've used PEAR's RSS class, no good. I've used Magpie/simplepie, no good. This one was able to parse all of the new feeds but cannot parse the 1.0 rdf feeds. So it's the best so far!

This bug depends on ZF-26. RSS 1.0 lists items outside the channel node, and Zend_Feed currently can't handle this situation.

Rather than fixing the behavior, I would suggest adding a new RDF class, as proposed in the description of this issue. RSS 1.0 is a completely separate branch from RSS 2.0.

The main difference between the RSS 0.91 branch (created by Dave Winer) and the RSS 1.0 branch (managed by the RSS-DEV Working Group) is that the latter is RDF-based, while the RDF architecture was completely removed from RSS 0.91, RSS 0.92, and RSS 2.0.

Additionally, I would suggest adding a new class property that returns the feed type/version. The following seems to be the list of formats currently supported by Zend_Feed:

* Atom 0.3
* Atom 0.5
* Atom 1.0
* RSS 0.91
* RSS 0.92
* RSS 2.0

The following format should be supported but currently is not:

* RSS 1.0

Perhaps a new ticket would be a better place for a new proposal than a comment.

I forgot to say that my previous comment was inspired by http://nabble.com/zend-feed-issue--tf4928553s16154…

The only difference between RSS 1.0 and other versions that is related to this issue is that item elements are not contained within the channel element. The attached file patch.diff modifies Zend_Feed_Rss to check for this and also patches the appropriate test in the test suite so that, without the patch to Zend_Feed_Rss, RSS 1.0 feed tests will fail.
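For readers who want to see the structural difference described above, here is a minimal sketch using plain DOMDocument rather than Zend_Feed. The feed content and the example.org URLs are made up for illustration; only the element layout follows RSS 1.0, where item elements are siblings of channel rather than its children.

```php
<?php
// Hypothetical RSS 1.0 document: <item> sits next to <channel>, not inside it.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example</title>
    <link>http://example.org/</link>
    <description>Example feed</description>
  </channel>
  <item rdf:about="http://example.org/1">
    <title>First</title>
    <link>http://example.org/1</link>
  </item>
</rdf:RDF>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$channel = $doc->getElementsByTagName('channel')->item(0);

// Looking only below <channel> finds no items in an RSS 1.0 feed.
$itemsInChannel = $channel->getElementsByTagName('item')->length;

// Looking from the document root finds them.
$itemsInDocument = $doc->getElementsByTagName('item')->length;

echo "items inside channel: $itemsInChannel\n"; // 0
echo "items in document: $itemsInDocument\n";   // 1
```

A parser that only walks the channel element would therefore report an empty items array for such a feed, which matches the symptom in the description.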

Hi Matthew, I took a look at the patch you submitted a few days ago.

The following line doesn't really make sense to me.

{quote} $this->assertTrue($feed->count() > 0); {quote}

The _importRssValid method is a utility method, and we cannot assume in advance that the file it is going to fetch is not a valid empty feed. I would create some valid RSS 1.0 unit tests instead.

The other part of the patch, the code fragment that should introduce RSS 1.0 compatibility, is fine, but I think it's incomplete. Zend_Feed doesn't only handle feed import; it is able to create and edit feeds as well.

Did you think about how an imported RSS 1.0 feed will be printed out? I assume it would be handled by the Zend_Feed_Rss class, but this library, as underlined by ZF-44, always returns an RSS 2.0 instance. That means an RSS 1.0 feed comes in and an RSS 2.0 feed comes out... I suppose this is not a good workflow.

What do you propose to fix this consequential issue?

For the sake of completeness, I'd like to share an additional thought. http://www.feedparser.org/ is, so far, the best feed parser written in Python and probably one of the best feed parsers in the world. Zend_Feed should probably learn something from this library! :)

Any news on this feature? I would suggest changing the status to unassigned if work is not in progress.

I am resolving and then reopening this bug, since it has been assigned for over a year now.

Matthew, please speak up if you object to this :-)

Reopened issue

Any news on this bug? I think it's just a matter of including the patch?

As far as I'm aware, no conflicting changes have been made to Zend_Feed_Rss since this patch was suggested, so the patch should work. Note that only the portion of the patch for library/Zend/Feed/Rss.php is really needed.

In terms of the portion that patches tests, it may be a better design decision to create an additional supporting method that first calls _importRssValid and then applies a non-empty check, and have all tests with non-empty test data files call that instead of _importRssValid, so that cases where data is expected to be empty can continue to function as normal.

Thoughts anyone?

Matthew, could you please evaluate the proposed solution and determine what we need to do to get this fixed? According to the votes, there seems to be a lot of interest in this issue.

I've considered Simone's point and have updated my patch accordingly. _importRssValid no longer checks the feed item count in this new patch. Instead, it modifies _importRssValid to return the $feed object it creates to be used by the calling method and modifies the two existing RSS 1.0 test methods to check their respective feed item counts.

I've applied my patch to Zend_Feed_Rss in a current SVN checkout to confirm that it still works. If I run the modified unit tests on the unpatched version of this class file, I get this output:

$ phpunit Zend_Feed_ImportTest tests/Zend/Feed/ImportTest.php 
PHPUnit 3.3.8 by Sebastian Bergmann.

..............FF..........

Time: 3 seconds

There were 2 failures:

1) testRss100Sample1(Zend_Feed_ImportTest)
Failed asserting that 

Resolved issue. I have verified and applied Matthew's test cases and bug fixes. Thanks! Two very old bugs gone now :-)

Sorry, not in 1.7.4. I think it may be released in the next minor release.

The problem is reproducible with some feeds like http://ranking.goo.ne.jp/rss/keyword/…

The source of that feed begins with

    <?xml version="1.0" encoding="utf-8" ?>
    <?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>
    <rdf:RDF … xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="ja">

Zend_Feed_Rss#__wakeup() checks whether the feed is RDF with the following code, but the firstChild of that feed is "xml-stylesheet", so it is not treated as RDF. Please improve the check routine.

    if ($this->_element->firstChild->nodeName == 'rdf:RDF') {
        $this->_element = $this->_element->firstChild;
    } else {
        $this->_element = $this->_element->getElementsByTagName('channel')->item(0);
    }

Quick fix for the client user: replace

    $feed = Zend_Feed::import($url);

with something like

    $string = file_get_contents($url);
    // or whatever appears between the XML declaration and <rdf:RDF
    $string = str_replace('<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>', '', $string);
    $feed = Zend_Feed::importString($string);
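A slightly more general variant of this workaround, as a sketch only: instead of matching one exact stylesheet line, strip any xml-stylesheet processing instruction before handing the string to the importer. The helper name stripStylesheetPi is hypothetical and not part of Zend_Feed, and the sample string is made up.

```php
<?php
/**
 * Hypothetical helper: remove every xml-stylesheet processing
 * instruction (and the whitespace after it) from an XML string.
 */
function stripStylesheetPi(string $xml): string
{
    $result = preg_replace('/<\?xml-stylesheet\b.*?\?>\s*/s', '', $xml);

    // preg_replace returns null on a regex engine error; fall back to the input.
    return $result === null ? $xml : $result;
}

$raw = '<?xml version="1.0" encoding="utf-8" ?>' . "\n"
     . '<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>' . "\n"
     . '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>';

$clean = stripStylesheetPi($raw);
echo $clean, "\n";
```

The cleaned string could then be passed to Zend_Feed::importString() as in the comment above. This only works around the check on the client side; it does not fix the check itself.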

To fix the problem, replace the following in Zend_Feed_Rss#__wakeup()

    // Find the base channel element and create an alias to it.
    if ($this->_element->firstChild->nodeName == 'rdf:RDF') {
        $this->_element = $this->_element->firstChild;
    } else {

with

    // Find the base channel element and create an alias to it.
    $rdf = $this->_element->getElementsByTagNameNS('http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'RDF')->item(0);
    if ($rdf) {
        $this->_element = $rdf;
    } else {
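The difference between the two checks can be reproduced outside Zend_Feed with plain DOMDocument. This is only a sketch: the feed below is made up, and it assumes that `$this->_element` holds the parsed document at that point in __wakeup(), so `$doc` stands in for it here.

```php
<?php
// Hypothetical feed with a stylesheet processing instruction before the root.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/"><title>Example</title></channel>
</rdf:RDF>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Old check: the first child is the processing instruction, not rdf:RDF.
$first = $doc->firstChild;
echo $first->nodeName, "\n"; // xml-stylesheet

// Proposed check: look the element up by namespace URI and local name.
$rdf = $doc->getElementsByTagNameNS(
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'RDF'
)->item(0);
echo $rdf ? $rdf->nodeName : 'not found', "\n"; // rdf:RDF
```

Looking up by namespace URI also keeps working if a feed binds the RDF namespace to a prefix other than "rdf", which a comparison against the literal string 'rdf:RDF' would miss.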

Assigning to Alex.

Fixed.

Was this added to 1.8.1? I don't see a Zend_Feed_Rdf class...

I don't see this resolved :( A feed which was linked in ZF-6516 is not accessible, and neither is this one from a German computer magazine: http://www.heise.de/newsticker/heise.rdf

Sorry, please forget my last comment - I used an old version of ZF... shame on me...