
Key: ZF-405
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Minor
Assignee: Alexander Veremyev
Reporter: T.Lechat
Votes: 16
Watchers: 15
Zend Framework

Empty items array when parsing rss1.0/RDF feed

Created: 29/Sep/06 04:07 AM   Updated: 20/May/09 07:10 PM
Component/s: Zend_Feed
Affects Version/s: 0.1.5, 1.7.3
Fix Version/s: 1.8.1

Time Tracking:
Not Specified

File Attachments:
1. patch.diff (2 kb)
2. patch.diff (1 kb)
3. ZendRDF.zip (5 kb)


Participants: Alexander Veremyev, Benjamin Eberlei, Bill Karwin, Daniel Bezruchkin, Dave Liefbroer, Matt Steele, Matthew Turland, Matthew Weier O'Phinney, Matthias Sch., Ronnie Schwartz, Satoru Yoshida, Simone Carletti, T.Lechat, twk and Wil Sinclair
Fix Version Priority: Should Have


Description
When parsing an RSS 1.0 / RDF feed (rdf namespace), the items array is empty.

Example: http://www.php.net/news.rss (-;

This produces:
[title] => PHP: Hypertext Preprocessor
[link] => http://www.php.net/
[description] => The PHP scripting language web site
[items] => Array
(
)

The XML dump from Zend_Feed is:

<?xml version="1.0" encoding="utf-8"?>
<channel xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:about="http://www.php.net/">
<title>PHP: Hypertext Preprocessor</title>
<link>http://www.php.net/</link>
<description>The PHP scripting language web site</description>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://www.zendcon.com"/>
.../...

<rdf:li rdf:resource="http://www.php.net/archive/index.php"/>
</rdf:Seq>
</items>
</channel>

I did a quick review of Zend_Feed and found that the namespace registration seems to be in trouble, though I'm not sure. It probably needs to switch the item tag of the entryRSS class, or add a new entryRDF class.

It's not a big deal, but since this is the www.php.net feed, it's rather humorous (-;

Thanks for all your work.

Thierry



Comments
Daniel Bezruchkin - 13/Oct/06 06:43 PM
Extracting the attached archive into your Zend directory will get RDF feeds working.

Dave Liefbroer - 30/Oct/06 01:16 AM
The attached zip doesn't work. It causes a fatal error (blank page). I will investigate the error.

Dave Liefbroer - 30/Oct/06 01:52 AM
It's because of:
$success = @$doc->loadXML(Zend_Feed::utf8ToUnicodeEntities($string));
in Feed.php

The utf8ToUnicodeEntities function doesn't exist (wrong code version?)

In the previous version it was:
$success = @$doc->loadXML($string);
That works!


Bill Karwin - 13/Nov/06 03:26 PM
Changing fix version to 0.6.0.

Ronnie Schwartz - 17/Nov/06 11:40 AM
I have the exact same issue. Any idea when this will be resolved? I've tried PEAR's RSS class: no good. I've tried Magpie/SimplePie: no good. This one was able to parse all of the newer feeds but cannot parse RSS 1.0 (RDF) feeds. So it's the best so far!

Simone Carletti - 01/Dec/07 12:41 PM
This bug depends on ZF-26.
RSS 1.0 lists items outside the channel node, and Zend_Feed currently can't handle this situation.

Rather than patching the existing behavior, I would suggest adding a new RDF class, as proposed in the description of this issue.
RSS 1.0 is a completely different branch from RSS 2.0.

The main difference between the RSS 0.91 branch (created by Dave Winer) and the RSS 1.0 branch (managed by the RSS-DEV Working Group) is that the latter is RDF-based, while the RDF architecture was removed entirely in RSS 0.91, RSS 0.92 and RSS 2.0.

Additionally, I would suggest adding a new class property that returns the feed type/version.
The following seems to be the list of formats currently supported by Zend_Feed:

  • Atom 0.3
  • Atom 0.5
  • Atom 1.0
  • RSS 0.91
  • RSS 0.92
  • RSS 2.0

The following formats should be supported but currently are not:

  • RSS 1.0

Perhaps a new ticket, rather than a comment, would be the better place for a new proposal.
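For illustration, here is a minimal PHP sketch of the kind of format detection suggested above, assuming only the DOM extension. detectFeedFormat() is a hypothetical helper, not a Zend_Feed method or property; it inspects the root element and covers only RSS 1.0, RSS 0.9x/2.0, Atom 0.3 and Atom 1.0 (Atom 0.5 is left out).

```php
<?php
// Hypothetical helper (not part of Zend_Feed): guesses a feed's format
// from its root element. Only a subset of the formats listed above is handled.
function detectFeedFormat(string $xml): string
{
    $doc = new DOMDocument();
    if (!@$doc->loadXML($xml)) {
        return 'unknown';
    }
    // documentElement skips the XML declaration and any leading
    // processing instructions, so stylesheet hints do not get in the way.
    $root = $doc->documentElement;
    $ns = $root->namespaceURI ?? '';

    // RSS 1.0: an rdf:RDF root holding a channel in the RSS 1.0 namespace.
    if ($root->localName === 'RDF' && $ns === 'http://www.w3.org/1999/02/22-rdf-syntax-ns#') {
        $channels = $root->getElementsByTagNameNS('http://purl.org/rss/1.0/', 'channel');
        return $channels->length > 0 ? 'rss-1.0' : 'unknown';
    }

    // RSS 0.91, 0.92 and 2.0: an un-namespaced <rss> root with a version attribute.
    if ($root->nodeName === 'rss' && $ns === '') {
        return 'rss-' . $root->getAttribute('version');
    }

    // Atom 1.0 and Atom 0.3 share the <feed> root but use different namespaces.
    if ($root->localName === 'feed') {
        if ($ns === 'http://www.w3.org/2005/Atom') {
            return 'atom-1.0';
        }
        if ($ns === 'http://purl.org/atom/ns#') {
            return 'atom-0.3';
        }
    }
    return 'unknown';
}
```

Exposing the result as a read-only property on the feed object, as proposed, would be a small step on top of a check like this.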

Simone Carletti - 01/Dec/07 12:42 PM
I forgot to mention that my previous comment was inspired by http://www.nabble.com/zend-feed-issue--tf4928553s16154.html#a14108105

Matthew Turland - 03/Feb/08 08:24 AM
The only difference between RSS 1.0 and other versions that is related to this issue is that item elements are not contained within the channel element. The attached file patch.diff modifies Zend_Feed_Rss to check for this and also patches the appropriate test in the test suite so that, without the patch to Zend_Feed_Rss, RSS 1.0 feed tests will fail.
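The contents of patch.diff are not reproduced in this issue, so the following is only a standalone DOM sketch of the idea described above, not the patch itself. The function name findRssItems() is hypothetical. For an RSS 1.0 document it collects the item elements that sit beside channel under rdf:RDF; for the other RSS versions it reads them from inside channel.

```php
<?php
// Standalone sketch, not the contents of patch.diff: collect RSS <item>
// elements whether they sit inside <channel> (RSS 0.9x/2.0) or beside it
// under <rdf:RDF> (RSS 1.0).
const RDF_NS_URI   = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
const RSS10_NS_URI = 'http://purl.org/rss/1.0/';

/** @return DOMElement[] */
function findRssItems(DOMDocument $doc): array
{
    $root  = $doc->documentElement;
    $items = [];

    if ($root->localName === 'RDF' && $root->namespaceURI === RDF_NS_URI) {
        // RSS 1.0: items are direct children of rdf:RDF, in the RSS 1.0 namespace.
        foreach ($root->childNodes as $child) {
            if ($child instanceof DOMElement
                && $child->localName === 'item'
                && $child->namespaceURI === RSS10_NS_URI) {
                $items[] = $child;
            }
        }
        return $items;
    }

    // RSS 0.9x / 2.0: items are direct children of the first <channel>.
    $channel = $doc->getElementsByTagName('channel')->item(0);
    if ($channel === null) {
        return $items;
    }
    foreach ($channel->childNodes as $child) {
        if ($child instanceof DOMElement && $child->nodeName === 'item') {
            $items[] = $child;
        }
    }
    return $items;
}
```

This also explains the dump in the description: in RSS 1.0 the channel only carries the <items><rdf:Seq> list of item URIs, while the item elements themselves live outside it.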

Simone Carletti - 07/Feb/08 03:19 PM
Hi Matthew,
I had a look at the patch you submitted a few days ago.

The following line doesn't really make sense to me.

$this->assertTrue($feed->count() > 0);

_importRssValid is a utility method, and we cannot assume in advance that the file it fetches is not a valid empty feed.
I would create some unit tests with valid RSS 1.0 feeds instead.

The other part of the patch, the code fragment that should introduce RSS 1.0 compatibility, is fine, but I think it's incomplete.
Zend_Feed doesn't only import feeds; it can create and edit them as well.

Have you thought about how an imported RSS 1.0 feed will be printed out?
I assume it would be handled by the Zend_Feed_Rss class, but, as pointed out in ZF-44, that class always returns an RSS 2.0 instance.
That means an RSS 1.0 feed comes in and an RSS 2.0 feed comes out... I don't think this is a good workflow.

What do you propose to fix this consequential issue?

For the sake of completeness, I'd like to share an additional thought.
http://www.feedparser.org/ is, so far, the best feed parser written in Python and probably one of the best feed parsers in the world.
Zend_Feed could probably learn something from this library!


Simone Carletti - 02/Jun/08 05:05 AM
Any news on this feature?
I would suggest changing the status to unassigned if work is not in progress.

Benjamin Eberlei - 08/Nov/08 12:53 AM
I am resolving and then reopening this bug, since it has been assigned for over a year now.

Matthew, please speak up if you object to this.


Benjamin Eberlei - 08/Nov/08 12:54 AM
Reopened issue

Matthias Sch. - 19/Nov/08 01:10 AM
Any news on this bug?
I think it's just a matter of applying the patch?

Matthew Turland - 19/Nov/08 09:21 AM
As far as I'm aware, no conflicting changes have been made to Zend_Feed_Rss since this patch was suggested, so the patch should work. Note that only the portion of the patch for library/Zend/Feed/Rss.php is really needed.

In terms of the portion that patches tests, it may be a better design decision to create an additional supporting method that first calls _importRssValid and then applies a non-empty check, and have all tests with non-empty test data files call that instead of _importRssValid, so that cases where data is expected to be empty can continue to function as normal.

Thoughts anyone?


Wil Sinclair - 19/Dec/08 03:05 PM
Matthew, could you please evaluate the proposed solution and determine what we need to do to get this fixed? According to the votes, there seems to be a lot of interest in this issue.

Matthew Turland - 19/Dec/08 05:37 PM
I've considered Simone's point and have updated my patch accordingly. _importRssValid no longer checks the feed item count in this new patch. Instead, it modifies _importRssValid to return the $feed object it creates to be used by the calling method and modifies the two existing RSS 1.0 test methods to check their respective feed item counts.

I've applied my patch to Zend_Feed_Rss in a current SVN checkout to confirm that it still works. If I run the modified unit tests on the unpatched version of this class file, I get this output:

$ phpunit Zend_Feed_ImportTest tests/Zend/Feed/ImportTest.php 
PHPUnit 3.3.8 by Sebastian Bergmann.

..............FF..........

Time: 3 seconds

There were 2 failures:

1) testRss100Sample1(Zend_Feed_ImportTest)
Failed asserting that <integer:2> matches expected value <integer:0>.

2) testRss100Sample2(Zend_Feed_ImportTest)
Failed asserting that <integer:1> matches expected value <integer:0>.

FAILURES!
Tests: 26, Assertions: 30, Failures: 2.

If I apply the patch and run the modified unit tests again, I get this output:

$ phpunit Zend_Feed_ImportTest tests/Zend/Feed/ImportTest.php 
PHPUnit 3.3.8 by Sebastian Bergmann.

..........................

Time: 2 seconds

OK (26 tests, 30 assertions)

Is this an acceptable solution?


Benjamin Eberlei - 08/Jan/09 04:38 AM
Resolved issue. I have verified and applied Matthew's test cases and bug fixes. Thanks! Two very old bugs gone now.

Satoru Yoshida - 02/Feb/09 06:02 PM
Sorry, not in 1.7.4. I think it may be released in the next minor release.

twk - 04/May/09 11:06 PM - edited
The problem can be reproduced with some feeds, such as
http://ranking.goo.ne.jp/rss/keyword/keyrank_all1/index.rdf

The source of that feed begins with
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xml:lang="ja">
<channel rdf:about="http://ranking.goo.ne.jp/service/001/">

Zend_Feed_Rss#__wakeup() checks whether the feed is RDF with the following code,
but the firstChild of that feed is the "xml-stylesheet" processing instruction, so the feed is not treated as RDF.
Please improve the check routine.

if ($this->_element->firstChild->nodeName == 'rdf:RDF') {
    $this->_element = $this->_element->firstChild;
} else {
    $this->_element = $this->_element->getElementsByTagName('channel')->item(0);
}
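A minimal reproduction of the behaviour described above, using a trimmed-down document shaped like the goo.ne.jp feed (the channel URL is a placeholder). Only the DOM extension is assumed, and the check is applied directly to a DOMDocument rather than inside Zend_Feed_Rss.

```php
<?php
// Trimmed-down document: a stylesheet processing instruction precedes rdf:RDF.
$xml = '<?xml version="1.0" encoding="utf-8"?>' . "\n"
     . '<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen"?>' . "\n"
     . '<rdf:RDF xmlns="http://purl.org/rss/1.0/"'
     . ' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
     . '<channel rdf:about="http://example.org/"/>'
     . '</rdf:RDF>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// The XML declaration is not a DOM node, but the stylesheet instruction is,
// so it becomes the document's first child.
$first = $doc->firstChild;

// The same comparison the __wakeup() check performs: it is false for this feed.
$treatedAsRdf = ($first->nodeName == 'rdf:RDF');

// documentElement, by contrast, always points at the root element.
$root = $doc->documentElement;

printf("firstChild: %s (%s)\n", $first->nodeName, get_class($first));
printf("documentElement: %s\n", $root->nodeName);
var_dump($treatedAsRdf);
```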

Quick fix for the client user:
Replace
$feed = Zend_Feed::import($url);
with something like
$string = file_get_contents($url);
$string = str_replace('<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>', '', $string); // or whatever between <?xml ?> and <rdf:RDF
$feed = Zend_Feed::importString($string);
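The str_replace() in the quick fix only removes that exact stylesheet line. A slightly more general variant, sketched here under the assumption that the only thing between the XML declaration and <rdf:RDF> is one or more xml-stylesheet instructions, strips them with a regular expression. stripStylesheetPIs() is a hypothetical helper name.

```php
<?php
// Hypothetical client-side helper: removes every xml-stylesheet processing
// instruction (plus trailing whitespace) so that rdf:RDF becomes the
// document's first child again.
function stripStylesheetPIs(string $xml): string
{
    // Non-greedy match from the instruction's start to its terminator;
    // the /s flag lets it span line breaks.
    $result = preg_replace('/<\?xml-stylesheet\b.*?\?>\s*/s', '', $xml);
    // preg_replace() returns null on a regex error; keep the input in that case.
    return $result ?? $xml;
}
```

In the quick fix it would take the place of the str_replace() call: $feed = Zend_Feed::importString(stripStylesheetPIs(file_get_contents($url)));. Editing XML with a regular expression is fragile in general, so this is only a stopgap until the check itself is fixed.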


twk - 05/May/09 12:56 AM
To fix the problem, replace the following in Zend_Feed_Rss#__wakeup()
// Find the base channel element and create an alias to it.
if ($this->_element->firstChild->nodeName == 'rdf:RDF') {
    $this->_element = $this->_element->firstChild;
} else {
with
// Find the base channel element and create an alias to it.
$rdf = $this->_element->getElementsByTagNameNS('http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'RDF')->item(0);
if ($rdf) {
    $this->_element = $rdf;
} else {
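Outside Zend_Feed_Rss, the proposed lookup can be exercised on a plain DOMDocument. The sketch below (findBaseElement() is a hypothetical name) mirrors the replacement: take the first RDF element in the RDF namespace if there is one, otherwise the first channel, as the original else branch does. Because it matches on the namespace URI rather than the rdf: prefix, it also works when a feed binds the RDF namespace to a different prefix.

```php
<?php
// Standalone sketch of the lookup proposed above; findBaseElement() is a
// hypothetical name, not a Zend_Feed_Rss method.
const RDF_NAMESPACE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

function findBaseElement(DOMDocument $doc): ?DOMElement
{
    // Match rdf:RDF by namespace URI and local name, so neither a leading
    // processing instruction nor a different prefix breaks the check.
    $rdf = $doc->getElementsByTagNameNS(RDF_NAMESPACE, 'RDF')->item(0);
    if ($rdf !== null) {
        return $rdf;
    }
    // Otherwise fall back to the first <channel>, as the original code does.
    return $doc->getElementsByTagName('channel')->item(0);
}
```

One behavioural difference from the firstChild test: getElementsByTagNameNS() searches the whole document, so an RDF element nested deeper would also match. RSS 1.0 places it at the root, so this should not matter for real feeds.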

Matthew Weier O'Phinney - 05/May/09 06:04 AM
Assigning to Alex.

Alexander Veremyev - 06/May/09 07:23 AM
Fixed.

Matt Steele - 20/May/09 07:10 PM
Was this added to 1.8.1? I don't see a Zend_Feed_Rdf class...