ZF-2956: Consider the rel-attribute in getLinks

Issue Type: Improvement Created: 2008-03-24T14:11:33.000+0000 Last Updated: 2008-09-02T10:39:47.000+0000 Status: Resolved Fix version(s): - 1.6.0 (02/Sep/08)

Reporter: Daniel Hartmann (danielmitd) Assignee: Alexander Veremyev (alexander) Tags: - Zend_Search_Lucene

Related issues: Attachments: - link-rel.patch


It would be nice if the Zend_Search_Lucene_Document_Html would use the rel-attribute of links. The getLinks method no fetches all links of a document.


Index: Search/Lucene/Document/Html.php

--- Search/Lucene/Document/Html.php (revision 9039) +++ Search/Lucene/Document/Html.php (working copy) @@ -105,7 +105,7 @@

     $linkNodes = $this->_doc->getElementsByTagName('a');
     foreach ($linkNodes as $linkNode) {
  • if (($href = $linkNode->getAttribute('href')) != '') { + if (($href = $linkNode->getAttribute('href')) != '' && $linkNode->getAttribute('rel') != 'nofollow' ) { $this->_links[] = $href; } }


Posted by Wil Sinclair (wil) on 2008-03-25T20:20:58.000+0000

Please categorize/fix as needed.

Posted by Alexander Veremyev (alexander) on 2008-07-26T04:24:41.000+0000


I don't think it's good idea to have this behavior as default since 'nofollow' initiative is not a W3C standard. But it's really useful to have such option.

links with 'nofollow' rel attribute can be excluded now using the following code:

<pre class="highlight">
$doc = Zend_Search_Lucene_Document_Html::loadHTML($html);

This functionality is merged into 1.5 and 1.6 release branches. So it will be included into ZF 1.5.3 and ZF 1.6 (documentation mentions it's only available starting from 1.6, so it's "undocumented" feature for the ZF 1.5.3)

Posted by Wil Sinclair (wil) on 2008-09-02T10:39:47.000+0000

Updating for the 1.6.0 release.

Have you found an issue?

See the Overview section for more details.


© 2006-2018 by Zend, a Rogue Wave Company. Made with by awesome contributors.

This website is built using zend-expressive and it runs on PHP 7.