Issue Type: Bug Created: 2008-09-11T09:30:17.000+0000 Last Updated: 2010-04-18T12:10:24.000+0000 Status: Resolved Fix version(s): - 1.10.4 (28/Apr/10)
Reporter: Nicolas Huguet (nicolas.huguet) Assignee: Alexander Veremyev (alexander) Tags: - Zend_Search_Lucene
Related issues: Attachments: - ZF-4252.patch
When converting an html document to text, the class Zend_Search_Lucene_Document_Html add spaces after each dom node, and so after each html closing tag.
For example, html "ZendFramework" (without space) will be returned as "Zend Framework". Then the search query "ZendFramework" (no space) won't find this document.
Posted by Christopher Thomas (cwt137) on 2010-01-21T06:09:28.000+0000
I have confirmed this issue. Attached is a patch and unit tests.
Posted by Alexander Veremyev (alexander) on 2010-01-21T08:36:51.000+0000
Processing tags without additional space merges several words into one in some cases (e.g. within table markup)
Posted by Christopher Thomas (cwt137) on 2010-03-16T21:13:58.000+0000
This new patch only gets rid of the space if it is an inline tag.
Posted by Alexander Veremyev (alexander) on 2010-04-18T12:10:23.000+0000
Great patch, Christopher! [ZF-8740] is still on my plate :)
Have you found an issue?
See the Overview section for more details.