ZF-4252: Zend_Search_Lucene_Document_Html add spaces for html tags

Issue Type: Bug
Created: 2008-09-11T09:30:17.000+0000
Last Updated: 2010-04-18T12:10:24.000+0000
Status: Resolved
Fix version(s): 1.10.4 (28/Apr/10)

Reporter: Nicolas Huguet (nicolas.huguet)
Assignee: Alexander Veremyev (alexander)
Tags: Zend_Search_Lucene

Related issues:
Attachments: ZF-4252.patch


When converting an HTML document to text, the class Zend_Search_Lucene_Document_Html adds a space after each DOM node, and therefore after each HTML closing tag.

For example, HTML that renders as "ZendFramework" (no space, but with a tag boundary inside the word) is returned as "Zend Framework". As a result, the search query "ZendFramework" (no space) does not find this document.
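
The behavior described above can be reproduced with a minimal sketch built on PHP's DOM extension. This is not the Zend_Search_Lucene_Document_Html source: the function name `extractTextNaive` and the sample markup `<b>Zend</b>Framework` are assumptions chosen for illustration, since the report does not show the exact markup.

```php
<?php
// Hypothetical helper (not the Zend implementation): collects the text of a
// DOM subtree and appends a space after every child node, as the report describes.
function extractTextNaive(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }
    $text = '';
    foreach ($node->childNodes as $child) {
        $text .= extractTextNaive($child) . ' ';
    }
    return $text;
}

$doc = new DOMDocument();
// Assumed sample markup: an inline tag ends in the middle of a word.
$doc->loadHTML('<html><body><p><b>Zend</b>Framework</p></body></html>');

// Collapse the repeated spaces so only the word boundaries remain.
$text = trim(preg_replace('/\s+/', ' ', extractTextNaive($doc->documentElement)));

echo $text, "\n"; // Zend Framework
```

The extracted text contains the two tokens "Zend" and "Framework", so an index built from it has no term "ZendFramework" to match.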


Posted by Christopher Thomas (cwt137) on 2010-01-21T06:09:28.000+0000

I have confirmed this issue. Attached are a patch and unit tests.

Posted by Alexander Veremyev (alexander) on 2010-01-21T08:36:51.000+0000

Processing tags without the additional space merges several words into one in some cases (e.g. within table markup).

Posted by Christopher Thomas (cwt137) on 2010-03-16T21:13:58.000+0000

This new patch removes the space only when the tag is an inline tag.
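
The idea behind the revised patch can be sketched as follows. This is an illustration under stated assumptions, not the contents of ZF-4252.patch: the `INLINE_TAGS` list is a partial, assumed set, and `extractText` and `htmlToText` are hypothetical helper names.

```php
<?php
// Assumed, partial list of inline elements; the actual patch may use a different set.
const INLINE_TAGS = ['a', 'abbr', 'b', 'em', 'i', 'span', 'strong', 'sub', 'sup'];

// Hypothetical helper: concatenates text nodes and adds a separating space
// only after elements that are not in the inline list.
function extractText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $text .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $text .= extractText($child);
            if (!in_array(strtolower($child->tagName), INLINE_TAGS, true)) {
                $text .= ' ';
            }
        }
    }
    return $text;
}

// Hypothetical wrapper: parses the HTML and normalizes whitespace in the result.
function htmlToText(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    return trim(preg_replace('/\s+/', ' ', extractText($doc->documentElement)));
}

$inline = htmlToText('<html><body><p><b>Zend</b>Framework</p></body></html>');
$table  = htmlToText('<html><body><table><tr><td>Zend</td><td>Framework</td></tr></table></body></html>');

echo $inline, "\n"; // ZendFramework
echo $table, "\n";  // Zend Framework
```

The inline case keeps "ZendFramework" as one word, while the table cells still come out as separate words, which addresses the concern raised about table markup.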

Posted by Alexander Veremyev (alexander) on 2010-04-18T12:10:23.000+0000


Great patch, Christopher! [ZF-8740] is still on my plate :)
