Issues

ZF-405: Empty items array when parsing rss1.0/RDF feed

Issue Type: Bug Created: 2006-09-29T04:07:26.000+0000 Last Updated: 2010-03-21T04:51:37.000+0000 Status: Resolved Fix version(s): - 1.8.1 (12/May/09)

Reporter: T.Lechat (zecat) Assignee: Alexander Veremyev (alexander) Tags: - Zend_Feed

Related issues: - ZF-26

Attachments: - patch.diff

Description

When trying to parse an RSS1.0 / RDF feed (rdf namespace), items array is empty.

Example : http://www.php.net/news.rss (-;

Produce : [title] => PHP: Hypertext Preprocessor [link] => http://www.php.net/ [description] => The PHP scripting language web site [items] => Array ( )

the XML dump of zend_feed is :

www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:about="http://"> PHP: Hypertext Preprocessor The PHP scripting language web sitewww.zendcon.com"/> .../...
        <rdf:li rdf:resource="http://<a rel="nofollow" href="www.php.net/archive/index.php"/">www.php.net/archive/index.php"/</a>>
    </rdf:Seq>
</items>

I did a quick review of Zend_Feed, finding that there is a namespace registration which seems to be in trouble, but not sure, and it need probaly to switch the item tag of entryRSS class, or add a new entryRDF class.

It's not a matter, but as this the <www.php.net> feed, it's humoristic (-;

Thanks for all you job.

Thierry

Comments

Posted by Daniel Bezruchkin (visualimpakt) on 2006-10-13T18:43:13.000+0000

Extracting this into your zend directory will get RDF feeds to work.

Posted by Dave Liefbroer (dliefbroer) on 2006-10-30T01:16:16.000+0000

The attached zip doesn't work. It makes a major error (blank page). Will investigate on the error.

Posted by Dave Liefbroer (dliefbroer) on 2006-10-30T01:52:01.000+0000

It's because of: $success = @$doc->loadXML(Zend_Feed::utf8ToUnicodeEntities($string)); in Feed.php

The utf8ToUnicodeEntities function doesn't exist (wrong code version?)

In the previous version it was: $success = @$doc->loadXML($string); That works!

Posted by Bill Karwin (bkarwin) on 2006-11-13T15:26:52.000+0000

Changing fix version to 0.6.0.

Posted by Ronnie Schwartz (rustybrick) on 2006-11-17T11:40:51.000+0000

I have the exact same issue. Any idea when this will be resolved? I've used PEAR's RSS class, no good. I've used Magpie/simplepie, no good. This one was able to parse all of the new feeds but cannot parse the 1.0 rdf feeds. So it's the best so far!

Posted by Simone Carletti (weppos) on 2007-12-01T12:41:20.000+0000

This bug depends on ZF-26. RSS 1.0 lists items outside channel node and Zend_Feed actually can't handle this situation.

Rather than fixing the behavior, I would suggest to add a new RDF class, as proposed in the description of this issue. RSS 1.0 is completely an other branch compared with RSS 2.0.

The main difference between RSS 0.91 branch (created by Dave Winer) and RSS 1.0 branch (managed by RSS-DEV Working Group) is that the latter is RDF based while RDF architecture has been completely removed in RSS 0.91, RSS 0.92, RSS 2.0.

Additionally, I would suggest to add a new class property to return feed type/version. The following seems to be a list of formats currently supported by Zend feed: * Atom 0.3 * Atom 0.5 * Atom 1.0 * RSS 0.91 * RSS 0.92 * RSS 2.0 The following formats should be supported but they are not, right now: * RSS 1.0 Perhaps a new ticket is the better solution for a new proposal, rather than a comment.

Posted by Simone Carletti (weppos) on 2007-12-01T12:42:19.000+0000

I forgot to say that my previous comment has been inspired by http://nabble.com/zend-feed-issue--tf4928553s16154…

Posted by Matthew Turland (elazar) on 2008-02-03T08:24:38.000+0000

The only difference between RSS 1.0 and other versions that is related to this issue is that item elements are not contained within the channel element. The attached file patch.diff modifies Zend_Feed_Rss to check for this and also patches the appropriate test in the test suite so that, without the patch to Zend_Feed_Rss, RSS 1.0 feed tests will fail.

Posted by Simone Carletti (weppos) on 2008-02-07T15:19:01.000+0000

Hi Matthew, I gave a look at the patch you submitted a few days ago.

The following line doesn't really makes sense to me.

{quote} $this->assertTrue($feed->count() > 0); {quote}

_importRssValid method is an utility method and we cannot assume in advance the file he's going to fetch is not a valid empty feed. I would create some valid RSS 1.0 unit tests instead.

The other part of the patch, the code fragment that should introduce RSS 1.0 compatibility it's fine, but I think it's incomplete. Zend_Feed doesn't handle only feed import but it's able to create and edit a feed as well.

Did you think about how an imported RSS 1.0 feed will be printed out? I assume it would be handled by Zend_Feed_Rss class but this library, as underlined by ZF-44, always returns an RSS 2.0 instance. It means, an RSS 1.0 come in and an RSS 2.0 come out... I suppose this is not a good workflow.

What do you propose to fix this consequential issue?

For the sake of completeness, I'd like to share an additional though. http://www.feedparser.org/ is, so far, the best feed parser written in python and probably one of the best feed parsers in the world. Zend_Feed should probably learn something from this library! :)

Posted by Simone Carletti (weppos) on 2008-06-02T05:05:34.000+0000

Any news on this feature? I would suggest to change status to unassigned if work is not in progress.

Posted by Benjamin Eberlei (beberlei) on 2008-11-08T00:53:58.000+0000

I am resolving then reoping this bug, since its occupied over a year now.

Please raise your voice Matthew if this a no go by me :-)

Posted by Benjamin Eberlei (beberlei) on 2008-11-08T00:54:14.000+0000

Reopened issue

Posted by Matthias Sch. (matthias-sch) on 2008-11-19T01:10:10.000+0000

any news on this bug? i think its just including the patch?

Posted by Matthew Turland (elazar) on 2008-11-19T09:21:20.000+0000

As far as I'm aware, no conflicting changes have been made to Zend_Feed_Rss since this patch was suggested, so the patch should work. Note that only the portion of the patch for library/Zend/Feed/Rss.php is really needed.

In terms of the portion that patches tests, it may be a better design decision to create an additional supporting method that first calls _importRssValid and then applies a non-empty check, and have all tests with non-empty test data files call that instead of _importRssValid, so that cases where data is expected to be empty can continue to function as normal.

Thoughts anyone?

Posted by Wil Sinclair (wil) on 2008-12-19T15:05:12.000+0000

Matthew, could you please evaluate the proposed solution and determine what we need to do to get this fixed? According to the votes, there seems to be a lot of interest in this issue.

Posted by Matthew Turland (elazar) on 2008-12-19T17:37:59.000+0000

I've considered Simone's point and have updated my patch accordingly. _importRssValid no longer checks the feed item count in this new patch. Instead, it modifies _importRssValid to return the $feed object it creates to be used by the calling method and modifies the two existing RSS 1.0 test methods to check their respective feed item counts.

I've applied my patch to Zend_Feed_Rss in a current SVN checkout to confirm that it still works. If I run the modified unit tests on the unpatched version of this class file, I get this output:

<pre class="literal">
$ phpunit Zend_Feed_ImportTest tests/Zend/Feed/ImportTest.php 
PHPUnit 3.3.8 by Sebastian Bergmann.

..............FF..........

Time: 3 seconds

There were 2 failures:

1) testRss100Sample1(Zend_Feed_ImportTest)
Failed asserting that 


<pre class="literal">

Posted by Benjamin Eberlei (beberlei) on 2009-01-08T04:38:51.000+0000

Resolved issue, i have verified and applied Matthews Testcases and Bugfixes. Thanks! Two very old bugs gone now :-)

Posted by old of Satoru Yoshida (yoshida@zend.co.jp) on 2009-02-02T18:02:32.000+0000

Sorry, not in 1.7.4. I think it may be released in next minor.

Posted by twk (twk) on 2009-05-04T23:06:46.000+0000

The problem is reproducable with some feeds like http://ranking.goo.ne.jp/rss/keyword/…

The source of that feed begins with www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="ja">

Zend_Feed_Rss#__wakeup() checks if the feed is rdf or not with the following code but the firstChild of that feed is "xml-stylesheet" and so it is not treated as rdf. Please improve the check routine. if ($this->_element->firstChild->nodeName == 'rdf:RDF') { $this->_element = $this->_element->firstChild; } else { $this->_element = $this->_element->getElementsByTagName('channel')->item(0); }

Quick fix for the client user: Replace $feed = Zend_Feed::import($url); with something like $string = file_get_contents($url); $string = str_replace('', '', $string); // or whatever between and <rdf:RDF $feed = Zend_Feed::importString($string);

Posted by twk (twk) on 2009-05-05T00:56:07.000+0000

To fix the problem, replace the following in Zend_Feed_Rss#__wakeup() // Find the base channel element and create an alias to it. if ($this->_element->firstChild->nodeName == 'rdf:RDF') { $this->_element = $this->_element->firstChild; } else { with // Find the base channel element and create an alias to it. $rdf = $this->_element->getElementsByTagNameNS('http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'RDF')->item(0); if ($rdf) { $this->_element = $rdf; } else {

Posted by Matthew Weier O'Phinney (matthew) on 2009-05-05T06:04:21.000+0000

Assigning to Alex.

Posted by Alexander Veremyev (alexander) on 2009-05-06T07:23:40.000+0000

Fixed.

Posted by Matt Steele (orphum) on 2009-05-20T19:10:15.000+0000

Was this added to 1.8.1? I don't see a Zend_Feed_Rdf class...

Posted by Nico Haase (osterlaus) on 2010-03-18T15:12:18.000+0000

I don't see this resolved :( A feed which was linked in ZF-6516 is not accessible, neither this one from a german computer-magazine: http://www.heise.de/newsticker/heise.rdf

Posted by Nico Haase (osterlaus) on 2010-03-21T04:51:37.000+0000

Sorry, please forget my last comment - I used an old version of ZF... shame on me...

Have you found an issue?

See the Overview section for more details.

Copyright

© 2006-2016 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.

Contacts