Zend Framework

Consider the rel-attribute in getLinks

Details

  • Type: Improvement
  • Status: Resolved
  • Priority: Trivial
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 1.6.0
  • Component/s: Zend_Search_Lucene
  • Labels:
    None
  • Fix Version Priority:
    Must Have

Description

It would be nice if Zend_Search_Lucene_Document_Html took the rel attribute of links into account. The getLinks method currently fetches all links of a document, including those marked rel="nofollow".

Patch:

Index: Search/Lucene/Document/Html.php
===================================================================
--- Search/Lucene/Document/Html.php (revision 9039)
+++ Search/Lucene/Document/Html.php (working copy)
@@ -105,7 +105,7 @@

 $linkNodes = $this->_doc->getElementsByTagName('a');
 foreach ($linkNodes as $linkNode) {
-    if (($href = $linkNode->getAttribute('href')) != '') {
+    if (($href = $linkNode->getAttribute('href')) != '' && $linkNode->getAttribute('rel') != 'nofollow' ) {
         $this->_links[] = $href;
     }
 }

Activity

Wil Sinclair added a comment -

Please categorize/fix as needed.

Alexander Veremyev added a comment -

Done.

I don't think it's a good idea to make this behavior the default, since the 'nofollow' initiative is not a W3C standard.
But it is really useful to have as an option.

Links with the 'nofollow' rel attribute can now be excluded using the following code:

Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks(true);
$doc = Zend_Search_Lucene_Document_Html::loadHTML($html);

This functionality has been merged into the 1.5 and 1.6 release branches, so it will be included in ZF 1.5.3 and ZF 1.6. (The documentation says it is only available starting from 1.6, so it is an "undocumented" feature in ZF 1.5.3.)
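
To illustrate what excluding 'nofollow' links amounts to, here is a minimal sketch using only PHP's DOM extension. It is not the framework's actual implementation: the function name extractLinks and its $excludeNoFollow parameter are hypothetical, chosen for this example. Unlike the patch in the description, which compares the whole rel value to 'nofollow', the sketch assumes rel may hold several space-separated tokens and matches case-insensitively.

```php
<?php
/**
 * Hypothetical helper (not part of Zend Framework): collect the href
 * values of all <a> elements, optionally skipping rel="nofollow" links.
 */
function extractLinks($html, $excludeNoFollow = false)
{
    $doc = new DOMDocument();

    // Collect libxml parse errors internally instead of emitting
    // warnings for malformed real-world markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $node) {
        $href = $node->getAttribute('href');
        if ($href === '') {
            continue; // anchors without a target are not links
        }

        // Assumption: rel is treated as a space-separated token list,
        // so "external nofollow" also counts as nofollow.
        $relTokens = preg_split(
            '/\s+/',
            strtolower(trim($node->getAttribute('rel'))),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        if ($excludeNoFollow && in_array('nofollow', $relTokens, true)) {
            continue;
        }

        $links[] = $href;
    }

    return $links;
}
```

With the second argument left at its default, every link with a non-empty href is returned, which mirrors the non-default nature of the option described above; passing true drops the links marked nofollow.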

Wil Sinclair added a comment -

Updating for the 1.6.0 release.


Time Tracking

  • Estimated: 15m
  • Remaining: 15m
  • Logged: Not Specified