<br/> tag handled incorrectly by Zend Search Lucene

Issue Type: Bug
Created: 2011-09-02T14:32:07.000+0000
Last Updated: 2011-09-03T06:14:55.000+0000
Status: Open
Fix version(s):
Reporter: Bruno Baketaric (laphroaig)
Assignee: Alexander Veremyev (alexander)
Tags:
- Zend_Search_Lucene

Related issues: Attachments:


An HTML fragment like "Foo<br/>Bar" results in "FooBar" (as one word) in the Lucene index. This should normally be indexed as two words.

Patching the method Zend_Search_Lucene_Document_Html::_retrieveNodeText() as follows solves the issue:

<pre class="highlight">
    private function _retrieveNodeText(DOMNode $node, &$text)
    {
        if ($node->nodeType == XML_TEXT_NODE) {
            $text .= $node->nodeValue;
            // Text inside a non-inline (block-level) parent is followed by a space
            if (!in_array($node->parentNode->tagName, $this->_inlineTags)) {
                $text .= ' ';
            }
        } else if ($node->nodeType == XML_ELEMENT_NODE  &&  $node->nodeName != 'script') {
            foreach ($node->childNodes as $childNode) {
                $text .= ' '; // patch
                $this->_retrieveNodeText($childNode, $text);
            }
        }
    }
</pre>
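
To illustrate the effect of the patch outside of Zend Framework, here is a minimal standalone sketch using only PHP's DOM extension (PHP 7.1+ assumed). It is not the actual Zend_Search_Lucene_Document_Html class: the free functions `retrieveNodeText()` and `extractText()`, the `$patched` flag, and the shortened `INLINE_TAGS` list are hypothetical stand-ins introduced for this example. The sample wraps the fragment in a `span`, on the assumption that the words are only joined when the parent element is in the inline tag list.

```php
<?php
// Standalone sketch: NOT the Zend_Search_Lucene_Document_Html class itself.
// INLINE_TAGS is a hypothetical, shortened stand-in for $this->_inlineTags.
const INLINE_TAGS = ['a', 'b', 'em', 'i', 'span', 'strong'];

/**
 * Collects the text below $node into $text.
 * With $patched = true, a space is appended before each child node,
 * mirroring the "// patch" line from the report.
 */
function retrieveNodeText(DOMNode $node, string &$text, bool $patched): void
{
    if ($node->nodeType == XML_TEXT_NODE) {
        $text .= $node->nodeValue;
        // Text whose parent is not an inline tag is followed by a space.
        if (!in_array($node->parentNode->nodeName, INLINE_TAGS)) {
            $text .= ' ';
        }
    } elseif ($node->nodeType == XML_ELEMENT_NODE && $node->nodeName != 'script') {
        foreach ($node->childNodes as $childNode) {
            if ($patched) {
                $text .= ' ';
            }
            retrieveNodeText($childNode, $text, $patched);
        }
    }
}

/** Parses an HTML string and returns the text extracted from it. */
function extractText(string $html, bool $patched): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $text = '';
    retrieveNodeText($dom->documentElement, $text, $patched);
    return $text;
}

$html = '<html><body><span>Foo<br/>Bar</span></body></html>';

// Split on whitespace to see which words a tokenizer would receive.
$before = preg_split('/\s+/', extractText($html, false), -1, PREG_SPLIT_NO_EMPTY);
$after  = preg_split('/\s+/', extractText($html, true), -1, PREG_SPLIT_NO_EMPTY);

echo implode('|', $before), PHP_EOL; // unpatched traversal
echo implode('|', $after), PHP_EOL;  // patched traversal
```

Run as a script, this should print `FooBar` for the unpatched traversal and `Foo|Bar` for the patched one. Note that, in this sketch, the extra space is added before every child node, so text split by an inline element (for example `Foo<b>Bar</b>` inside a `span`) would also be separated into two words.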


Posted by Kim Blomqvist (kblomqvist) on 2011-09-03T06:14:55.000+0000

Hello Bruno,

Thanks for contributing a patch to ZF. However, you don't appear to have a CLA on file. If you do have one, please get in touch with Ralph Schindler and ask him to assign you the correct groups so that you can attach patches as attachments rather than inline. Otherwise, you should sign the CLA (…) and return it before contributing code, or your contributions may go unused!
