ZF-2956: Consider the rel-attribute in getLinks
Description
It would be nice if Zend_Search_Lucene_Document_Html took the rel attribute of links into account. The getLinks method currently fetches all links of a document, including those marked rel="nofollow".
Patch:
Index: Search/Lucene/Document/Html.php
--- Search/Lucene/Document/Html.php (revision 9039)
+++ Search/Lucene/Document/Html.php (working copy)
@@ -105,7 +105,7 @@
 $linkNodes = $this->_doc->getElementsByTagName('a');
 foreach ($linkNodes as $linkNode) {
-    if (($href = $linkNode->getAttribute('href')) != '') {
+    if (($href = $linkNode->getAttribute('href')) != '' && $linkNode->getAttribute('rel') != 'nofollow' ) {
         $this->_links[] = $href;
     }
 }
Comments
Posted by Wil Sinclair (wil) on 2008-03-25T20:20:58.000+0000
Please categorize/fix as needed.
Posted by Alexander Veremyev (alexander) on 2008-07-26T04:24:41.000+0000
Done.
I don't think it's a good idea to have this behavior as the default, since the 'nofollow' initiative is not a W3C standard. But it's really useful to have such an option.
Links with the 'nofollow' rel attribute can now be excluded using the following code:
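The snippet that followed this sentence is not preserved in this export. In the framework itself the option is, as far as the ZF 1.6 manual describes it, switched on with the static call Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks(true); treat that name as an assumption here. As a stand-in, the sketch below shows the same opt-in filtering on plain DOMDocument, without Zend Framework. The function extractLinks() and its $excludeNoFollow parameter are hypothetical names chosen for illustration, not part of the library.

```php
<?php
// Hypothetical helper (NOT part of Zend Framework): collects the href values
// of all <a> elements, optionally skipping links marked rel="nofollow".
function extractLinks($html, $excludeNoFollow = false)
{
    $doc = new DOMDocument();
    // Real-world markup is often imperfect; suppress parser warnings.
    @$doc->loadHTML($html);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $linkNode) {
        $href = $linkNode->getAttribute('href');
        if ($href == '') {
            continue; // anchor without a target
        }
        // Same exact-match test as the patch above, but only when opted in.
        if ($excludeNoFollow && $linkNode->getAttribute('rel') == 'nofollow') {
            continue;
        }
        $links[] = $href;
    }
    return $links;
}

$html = '<html><body>'
      . '<a href="/a">A</a>'
      . '<a href="/b" rel="nofollow">B</a>'
      . '</body></html>';

$allLinks      = extractLinks($html);        // default: every link is kept
$followedLinks = extractLinks($html, true);  // opt-in: nofollow links skipped
```

Keeping the filter off by default matches the reasoning in this comment: existing callers of getLinks see no change, and only code that asks for the exclusion gets it.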
This functionality has been merged into the 1.5 and 1.6 release branches, so it will be included in ZF 1.5.3 and ZF 1.6. The documentation states that it is only available starting from 1.6, so it is an "undocumented" feature in ZF 1.5.3.
Posted by Wil Sinclair (wil) on 2008-09-02T10:39:47.000+0000
Updating for the 1.6.0 release.