The Problem
I've been trying to write a generalized feed reader using Zend_Feed and I just stumbled upon a usability issue.
IMHO, it should be a design goal for Zend_Feed to be able to easily consume any common feed. However, I have seen quite a few feeds that currently can't be handled in a generalized way - that is, without writing tons of if/else statements to filter out special cases.
Example 1
Here's an example feed that doesn't work too well with Zend_Feed:
http://toyflish.de/service/feed.php
When you iterate through the feed items and access $item->title() or $item->description(), you get an array with two DOMElement objects instead of the expected string. The reason is that each item contains two of these tags: one with the namespace prefix "media" and one without.
The problem is: if you don't target a specific feed, you don't know which of these nodes holds the relevant information. You would have to check each node to see whether it is empty. If both have content, you would have to check which one is the media node and which is the standard one.
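To make the situation concrete, here is a minimal sketch using PHP's DOM extension directly rather than Zend_Feed. The sample XML is invented for illustration and is not taken from the feed above; it only assumes that an item carries both a plain and a "media"-prefixed title.

```php
<?php
// Invented sample item: one plain <title> and one <media:title>,
// mirroring the duplicate-tag situation described above.
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Plain title</title>
      <media:title>Media title</media:title>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$item = $doc->getElementsByTagName('item')->item(0);

// Collect every child element whose local name is "title",
// regardless of its namespace prefix.
$titles = [];
foreach ($item->childNodes as $child) {
    if ($child instanceof DOMElement && $child->localName === 'title') {
        $titles[] = $child;
    }
}

foreach ($titles as $node) {
    // namespaceURI is null for the element that has no namespace at all.
    $kind = $node->namespaceURI === null
        ? 'standard'
        : 'namespaced (' . $node->prefix . ')';
    echo $kind, ': ', $node->textContent, PHP_EOL;
}
```

Both elements share the local name "title", so any accessor that looks nodes up by name alone has two candidates and no built-in way to pick one.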
Example 2
The above feed is not the only one causing problems. Try http://feeds.feedburner.com/Techcrunch, for example, and try to get the feed link.
Again, what you will get is an array containing two DOMElement objects.
Example 3
Another one that causes problems: http://www.planet-php.net/atom
The problem here is that when you try to get the link for an entry, an empty string is returned.
The items do have links, but the link tag itself does not contain any text.
Instead, the link is in the href attribute. The feed reader doesn't seem to check for that, which is yet another thing that I would expect a feed reader class to do.
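The following sketch shows the kind of check I have in mind, again with plain DOM calls instead of Zend_Feed. The entry and its URL are invented placeholders; the only assumption is an Atom-style link element that carries its target in href and has no text.

```php
<?php
// Invented sample entry; the URL is a placeholder.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Sample entry</title>
    <link rel="alternate" href="http://example.com/post/1"/>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$link = $doc->getElementsByTagName('link')->item(0);

// The element has no text content, which is why a reader that only
// looks at the node text ends up with an empty string.
$text = trim($link->textContent);

// Fall back to the href attribute when the text is empty.
$url = $text !== '' ? $text : $link->getAttribute('href');
echo $url, PHP_EOL;
```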
Solution Proposal
Personally, I would expect Zend_Feed to handle these cases: if there's more than one node, always return the first non-empty standard node (read: not namespaced); if there is none, return the first non-empty namespaced node (like "media:description"). If the node is an empty link node, the href attribute should be checked.
If you're targeting a specific feed, you could work around the auto-detection by giving the desired namespace as a parameter, and probably another parameter to toggle the empty-node auto-skipping.
When following the above API proposal, I would opt for making the first parameter 'true' by default, so you'll have an easy, intuitive API that should work in at least 90% of all cases.
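The selection rule described in this section could be sketched roughly as follows. Note that pickNodeValue() is a hypothetical helper of my own, not an existing or planned Zend_Feed method, and its parameter names and order are only an illustration. It assumes the caller already holds the DOMElement candidates for one tag name.

```php
<?php
/**
 * Hypothetical helper, not part of Zend_Feed.
 *
 * @param DOMElement[] $nodes           candidate nodes sharing one local name
 * @param string|null  $namespacePrefix only consider this prefix; null = auto-detect
 * @param bool         $skipEmpty       skip nodes that yield an empty string
 */
function pickNodeValue(array $nodes, ?string $namespacePrefix = null, bool $skipEmpty = true): string
{
    if ($namespacePrefix !== null) {
        // Explicit namespace: auto-detection is bypassed.
        $candidates = [];
        foreach ($nodes as $node) {
            if ((string) $node->prefix === $namespacePrefix) {
                $candidates[] = $node;
            }
        }
    } else {
        // Auto-detection: un-prefixed nodes first, prefixed ones afterwards.
        $standard = [];
        $prefixed = [];
        foreach ($nodes as $node) {
            if ((string) $node->prefix === '') {
                $standard[] = $node;
            } else {
                $prefixed[] = $node;
            }
        }
        $candidates = array_merge($standard, $prefixed);
    }

    foreach ($candidates as $node) {
        $text = trim($node->textContent);
        // An empty link node may carry its target in the href attribute.
        if ($text === '' && $node->localName === 'link') {
            $text = trim($node->getAttribute('href'));
        }
        if ($text !== '' || !$skipEmpty) {
            return $text;
        }
    }
    return '';
}

// Invented sample data for a quick check.
$doc = new DOMDocument();
$doc->loadXML(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    . '<description></description>'
    . '<media:description>Media text</media:description>'
    . '<link href="http://example.com/a"/>'
    . '</item>'
);

$descriptions = [];
$links = [];
foreach ($doc->documentElement->childNodes as $child) {
    if (!$child instanceof DOMElement) {
        continue;
    }
    if ($child->localName === 'description') {
        $descriptions[] = $child;
    } elseif ($child->localName === 'link') {
        $links[] = $child;
    }
}

echo pickNodeValue($descriptions), PHP_EOL;
echo pickNodeValue($links), PHP_EOL;
```

With skipping enabled, the empty standard description is passed over in favour of the media one. With skipping disabled, the first candidate is returned even when it is empty.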
Current workaround
Using a helper function for outputting Zend_Feed results, I can work around the most common issues.
This successfully works around the problems in examples 1 and 2 but is, of course, far from elegant. The problem described in example 3, however, cannot be worked around without extending or altering Zend_Feed itself.
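A workaround along these lines might look like the sketch below. feedValueToString() is a hypothetical name chosen for illustration. It assumes only what is described above: an accessor may hand back either a string or an array of DOMElement objects.

```php
<?php
/**
 * Hypothetical workaround helper: accepts whatever an accessor returned
 * (a string, or an array of DOM nodes) and always returns a string.
 */
function feedValueToString($value): string
{
    if (is_array($value)) {
        // Several nodes matched: use the first one that has any text.
        foreach ($value as $node) {
            $text = $node instanceof DOMNode
                ? trim($node->textContent)
                : trim((string) $node);
            if ($text !== '') {
                return $text;
            }
        }
        return '';
    }
    return trim((string) $value);
}

// Invented sample nodes standing in for a duplicated <title>.
$doc = new DOMDocument();
$plain = $doc->createElement('title');
$media = $doc->createElementNS(
    'http://search.yahoo.com/mrss/',
    'media:title',
    'From media node'
);

echo feedValueToString([$plain, $media]), PHP_EOL;
echo feedValueToString('Already a string'), PHP_EOL;
```

Because such a helper only sees what the accessor returned, it has no access to attributes of a node that came back as an empty string, which is why example 3 stays out of reach.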
Other resources
For user discussion, look at the thread "Beginner demo - feed reader" in the ZF-General mailing list.
For an example of how simple the API could be to use, you may want to look at:
http://pear.php.net/manual/en/package.xml.xml-feed-parser.intro.php
Further suggestions
As seen in the feed reader class from PEAR, it may also be a good idea to be able to turn off strict XML validation and/or offer the option to repair the feed using tidy before parsing. I have also come across feeds that are perfectly usable with most feed readers, but contain some oddities that libxml2 considers invalid XML. Currently, feeds like these are not consumable by Zend_Feed. Example feed:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fhi.baidu.com%2Fxfxnet2007%2Frss
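As a rough idea of what lenient parsing could look like, the sketch below compares a strict parse with DOMDocument's recover mode on a deliberately broken, invented snippet. This is my own assumption about one possible approach; it does not show how Zend_Feed or the PEAR class work internally, it does not cover tidy, and how well libxml2 recovers depends on the kind of damage.

```php
<?php
// Invented sample: the bare "&" in the title makes the document
// not well-formed.
$xml = '<rss version="2.0"><channel><item>'
     . '<title>Tom & Jerry</title>'
     . '</item></channel></rss>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Strict parse: loadXML() returns false for malformed input.
$strict = new DOMDocument();
$strictOk = $strict->loadXML($xml);

// Lenient parse: ask libxml2 to recover where it can.
$lenient = new DOMDocument();
$lenient->recover = true;
$lenientOk = $lenient->loadXML($xml);

// Keep the collected parser errors around for logging or inspection.
$errors = libxml_get_errors();
libxml_clear_errors();

echo 'strict parse succeeded: ', var_export($strictOk, true), PHP_EOL;
echo 'lenient parse succeeded: ', var_export($lenientOk, true), PHP_EOL;
echo 'parser errors collected: ', count($errors), PHP_EOL;
```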
Even perfectly valid feeds seem to cause problems sometimes. One feed that does not throw an exception, but seemingly doesn't contain any items when parsed with Zend_Feed, is:
http://www.bundestag.de/aktuell/RSS/Bundestag_Presse.rss